What is the importance of replication? Why should an experiment be repeatable by an independent investigator? Why do results vary even when you try to keep everything the same? What factors did you vary (different pennies, side of penny, height of dropper, time of day)? What factors seemed to cause the most variation? What could you do to make your results more repeatable? Which of the variables you considered did not contribute much to the variation? How many drops can a penny hold? Why don't we all agree? What does it mean that our results vary? Why should the results be repeatable?
In this lab, you found differences between investigators, and those differences were often large. For the number of water drops a penny could hold, some people got results in the 50s, while others were usually in the 20s. You also found differences within replicates of the same experiment, when you did the same thing more than once, though those differences were often smaller if you were good at using the same methods from one trial to the next. The variation there was often less than 5 drops but sometimes as much as 10 or more!
We do multiple trials (this process is called replication) to check that our results are consistent from one trial to the next. This helps us tell a real pattern apart from a result that is just due to chance. While your results from one trial to the next might not be identical, you should be able to get an average and put error bars around it, such as 23 drops plus or minus 4 drops, when done 20 times.
If our internal process is not repeatable, that's not good. It should be. Why? Because we assume that if we hold everything important the same from one trial to the next, the results should be the same. Why? Because we assume that the natural laws that govern the universe are consistent from one moment to another. If we drop a ball, it should fall down today and do exactly the same thing tomorrow. If you can only do 20 push-ups today, you shouldn't be able to do 100 tomorrow! If the laws of the universe were constantly changing, science wouldn't work. It would be like playing a sport where the rulebook is constantly being revised.
Sure, you would get different results in outer space, but that's why we try to control for things as much as possible. Location matters. Some things will matter, and others won't. Time of day probably doesn't matter much, but who knows? Temperature or hardness (amount of dissolved minerals) of the water coming out of the tap might matter.
What factor causes the most variation? Within one experimenter's results, it's probably the height of the dropper and how consistently you can squeeze the bulb. With a mechanical pipette mounted to a stand at a pre-set height, we could do better. Between experimenters, the size of the opening on the dropper is likely a factor. How could we control for that? We would make sure different experimenters used the same brand of dropper. Wear on the surface of the penny is probably a small factor, but the dropper is probably the most significant source of "error."
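Here is one way the "average plus or minus error bars" idea could be worked out. This is only a sketch: the 20 drop counts are made-up numbers, not data from the lab, and it assumes that "plus or minus" means the sample standard deviation (the function name `summarize` is just for illustration). The standard error of the mean is shown too, since some people use that for error bars instead.

```python
import statistics

# Hypothetical drop counts from 20 trials by one experimenter.
# These numbers are invented for illustration only.
trials = [23, 19, 27, 25, 21, 24, 18, 28, 22, 26,
          20, 23, 29, 17, 24, 22, 25, 21, 26, 20]


def summarize(counts):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)      # sample SD (divides by n - 1)
    sem = sd / len(counts) ** 0.5      # how uncertain the mean itself is
    return mean, sd, sem


mean, sd, sem = summarize(trials)
print(f"{mean:.1f} drops plus or minus {sd:.1f} (standard deviation, n={len(trials)})")
print(f"standard error of the mean: {sem:.2f}")
```

The standard deviation describes how much single trials scatter, while the standard error describes how well the average is pinned down, so it matters which one an error bar is reporting.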
What was our "null hypothesis"? That every trial should produce the same result, plus or minus a small error factor that we can't control. That everyone should get similar results. Clearly, that didn't happen. Before we could definitively answer the question of how many drops of water a penny can hold, we would need to get better at being consistent, both across trials by one experimenter and between experimenters.
Could we answer the question exactly? Probably not. But we might be able to say that with a mechanical pipette that dispenses 0.05 milliliters of water per drop, held 5 cm above the penny, at 7000 ft. elevation, on this day of the year at this time of day, in Flagstaff, Arizona, with distilled water and a 2010 penny, head side up, the penny can hold an average of 27 drops, plus or minus 3 drops, and that the result is repeatable across 20 experimenters doing 25 trials each. I just made that answer up, but you get the idea.
OK, so once we have our "baseline data," we can start testing. Does the heads side hold more than the tails side? Does the year of the penny matter? Do the results vary at sea level? What if we find that a 1955 penny holds 23 drops, plus or minus 4, and a 1975 penny holds 24 drops, plus or minus 3, and a 2010 penny holds 29 drops, plus or minus 4? If each of those averages comes from enough trials, that would indicate that a 1955 penny and a 1975 penny are not significantly different from one another, but both are significantly different from a 2010 penny. (With only a few trials, a 6-drop gap with that much scatter could still be chance.) Why the difference? Who knows! But that's the next question to try to answer.
Maybe it's because modern pennies contain more zinc and less copper? We could test that by researching the year that the makeup of the penny changed. Would we know for certain that it was the metal makeup of the penny that caused it to hold more drops? No, but that factor is "correlated" with the difference. If we find that pennies with more zinc hold more water than pennies with less zinc, that's interesting.
Maybe that's the cause, or maybe something else changed at the same time? Perhaps the ridge around the edge of the penny is higher on the newer zinc pennies? So although the zinc is correlated, it might not be the cause of the change. Can you see how this dumb little experiment allows us to discuss a lot of details about how science works?
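To make the 1955 / 1975 / 2010 comparison concrete, here is a rough sketch of how "significantly different" could be checked. It uses Welch's t-test computed from summary numbers, which is one common choice and not necessarily what a statistician would pick for this lab. The assumptions are mine: the "plus or minus" values are treated as standard deviations, every penny gets the same number of trials (25 here, borrowed from the made-up baseline), and a t value bigger than about 2 in size is used as a rule of thumb for "probably not chance" when there are plenty of trials. The names `welch_t`, `compare_all`, and `pennies` are just for illustration.

```python
import math

# Made-up summary results from the discussion: year -> (mean drops, plus-or-minus).
# Assumption: the plus-or-minus value is a standard deviation.
pennies = {"1955": (23, 4), "1975": (24, 3), "2010": (29, 4)}


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary stats."""
    v1 = sd1 ** 2 / n1                  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2                  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def compare_all(n):
    """Compare every pair of penny years, assuming n trials for each penny."""
    years = list(pennies)
    results = {}
    for i, a in enumerate(years):
        for b in years[i + 1:]:
            results[(a, b)] = welch_t(*pennies[a], n, *pennies[b], n)
    return results


for n in (25, 3):
    print(f"--- {n} trials per penny ---")
    for (a, b), (t, df) in compare_all(n).items():
        verdict = "looks different" if abs(t) > 2 else "could be chance"
        print(f"{a} vs {b}: t = {t:.2f}, df = {df:.1f} -> {verdict}")
```

Under these assumptions, 25 trials per penny separates the 2010 penny from the other two, while 3 trials per penny does not. That is the reason replication matters: the same averages and the same scatter can support a conclusion or not, depending on how many times the experiment was repeated.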