All you really need to do is estimate the range of possible error at each stage of the procedure. With respect to the initial measurements you have, most importantly, the question of which gauge was used. Then you have the range of temperatures the locker room may have been at when Anderson gauged the balls. Then you have the fact that his memory of the exact pressures was imprecise; I would say they could easily have varied by 0.2 psi without his having taken any specific note of it. Perhaps more. Between those three factors, I think you have a potential variation of up to 1 psi, given that the temperature could reasonably have been as low as 68 degrees or as high as, say, 76 degrees without anyone paying much attention or taking note.
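To put a rough number on the temperature factor alone, here's a minimal sketch using the ideal gas law at constant volume (Gay-Lussac's law). The 12.5 psi starting pressure and the 68-76 degree range are from the discussion above; the sea-level atmospheric pressure of 14.7 psi is an assumption on my part.

```python
# Pressure change from temperature alone, constant volume (ideal gas law).
# Gauge pressure must be converted to absolute pressure, and Fahrenheit
# to the absolute Rankine scale, before applying P1/T1 = P2/T2.

ATM_PSI = 14.7  # assumed sea-level atmospheric pressure, psi

def f_to_rankine(temp_f):
    """Convert Fahrenheit to the absolute Rankine scale."""
    return temp_f + 459.67

def gauge_after_temp_change(gauge_psi, t_initial_f, t_final_f):
    """Gauge pressure after a temperature change at constant volume."""
    p_abs = gauge_psi + ATM_PSI
    p_new = p_abs * f_to_rankine(t_final_f) / f_to_rankine(t_initial_f)
    return p_new - ATM_PSI

# A ball set to 12.5 psi at 68 F, read again at 76 F:
spread = gauge_after_temp_change(12.5, 68.0, 76.0) - 12.5
print(f"Spread from temperature alone: {spread:.2f} psi")
```

That 8-degree uncertainty in room temperature alone is worth about 0.4 psi, which is why stacking it with gauge choice and imprecise memory plausibly gets you to 1 psi.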
Then you have the Patriots' 12 balls, which are unlikely to have been used to the same degree by the end of the first half. Some would have been used more and gotten wetter (and perhaps colder), while others remained in the bag and stayed dry. No observations were made in this respect, but it would be very surprising if the balls had been used to a uniform degree.
Then you have the inherent variability between different footballs, as highlighted by Belichick in his press conference on this subject, where he took a bunch of footballs, subjected them to essentially the same treatment, and got significant variation. This has recently been reinforced by one of the officials, who commented on the potential for slow leaks from footballs in his experience and observed how variable footballs can be.
Then you have the intercepted football, measured three times with, I think, the same gauge, yielding three different results, where the largest difference between the measurements is greater than the discrepancy the Wells Report found in the Patriots' footballs. This would indicate either a) a lack of repeatability in the gauge used; b) a change in the conditions of the measurement (temperature, moisture); or c) the competence, or lack thereof, of the tester and/or his memory. I think the lack of repeatability in the gauge is the most likely culprit. These are not lab-grade pieces of equipment, as indicated by the more-or-less consistent 0.4 psi difference between the two gauges used at halftime.
With all that potential for error in the measurement of pressure change, it would have been much more surprising if the footballs had followed a predictable rise in pressure with time. The entire "experiment" is just too filled with the potential for error. Of course, the Wells Report's solution was to discard the entire timing factor as irrelevant, which is one of its stupidest conclusions, as pointed out by Brady's statistical expert.
An atmospheric physicist, or even a high school science teacher with a background in experimental science, could have pointed out these areas of potential error and imprecision, and perhaps many more. I would expect such a person to conclude that the only difference in pressure change between two sets of footballs detectable by this procedure would have had to be on the order of 2.5 to 3.0 psi. Perhaps more. Given the reality of conditions on a rainy football field, and the people doing the testing, it would be difficult to devise an experimental protocol that WOULD reveal small differences between two groups of footballs.
It's disappointing that the NFL didn't just say, "We can't tell what happened here. We think the Patriots' footballs decreased in pressure more than the Colts' footballs, but we can't really say why, and we're not really sure that our conditions were rigorous enough to even come to that conclusion. We just didn't have the procedures in place to identify the cause of pressure variations this small."
Too much to hope for.