I'm surprised there wasn't a check for clustering in the data (unless I overlooked it). One set of readings should clump tightly, with 24/7 consistency, around normal indoor temperature, while another should clump, with greater spread and time-of-day variation, around the actual outdoor temperature. Split out the discernibly HVAC-skewed values and you'll get a better grasp of the actual outdoor temperature. You could then sub-split the remainder by warm-body proximity (phones in a pocket vs. otherwise).
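To make the idea concrete, here is a rough sketch of the split using a plain one-dimensional two-means pass, then treating the tighter cluster as the indoor (HVAC-skewed) one. The function names and the sample readings are made up for illustration, and this is only one way to do it, not what the original analysis did. It also assumes the indoor and outdoor temperatures are well separated; on a mild day when they overlap, you'd probably need a mixture model that also looks at time of day.

```python
from statistics import mean, pstdev


def split_two_clusters(readings, max_iter=100):
    """Split 1-D readings into two clusters with a basic two-means loop."""
    lo, hi = min(readings), max(readings)
    if lo == hi:
        raise ValueError("need at least two distinct readings")
    c0, c1 = lo, hi  # start the two centers at the extremes
    for _ in range(max_iter):
        # Assign each reading to the nearer center.
        a = [r for r in readings if abs(r - c0) <= abs(r - c1)]
        b = [r for r in readings if abs(r - c0) > abs(r - c1)]
        n0, n1 = mean(a), mean(b)
        if (n0, n1) == (c0, c1):  # centers stopped moving
            break
        c0, c1 = n0, n1
    return a, b


def label_indoor_outdoor(readings):
    """Assumption: the tighter cluster is the HVAC-controlled indoor one."""
    a, b = split_two_clusters(readings)
    indoor, outdoor = sorted((a, b), key=pstdev)
    return indoor, outdoor


# Hypothetical readings in degrees C: a tight indoor clump plus
# a wider outdoor spread.
sample = [21.0, 3.0, 21.5, 5.5, 22.0, 8.0, 21.2, 11.0, 21.8, 6.5, 4.0]
indoor, outdoor = label_indoor_outdoor(sample)
print("indoor:", sorted(indoor))
print("outdoor estimate:", round(mean(outdoor), 1))
```

The same two-way split could be run again on the outdoor cluster to look for the warm-body (phone in pocket) group, with the same caveat about overlap.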