Measurement and its ambiguities
Measuring blood pressure
Many elements may be involved in a sick person getting better, and it's not
always clear what they are. Consider an elaborate Australian study of the
treatment of moderate hypertension: tens of thousands of Australians
were screened for high blood pressure. Only those with systolic blood
pressure (SBP) greater than 200 or diastolic blood pressure (DBP) greater
than 90 millimeters of mercury (mm Hg) were entered in the study. Various
drugs or placebos were given to different sub-groups, and the blood
pressure in the participants generally dropped, as shown in measurements
made every four months. The study wisely included a relatively
small group of 237 untreated people with moderate hypertension - people
who received no medication and no placebos of any kind - whose blood
pressure was also measured every four months. Their blood pressure
dropped, too. The mean DBP in this group dropped from 101.5 mm Hg
(mildly elevated) to about 80 mm Hg (normal) in thirty-two months, and
then stabilized at that level (plus or minus 1 mm Hg) for the next two
years (MCATT 1982).
Why did the blood pressure of these people go down? For those treated
with various blood pressure reducing drugs, we might be tempted to
explain the decline as a result of the specific medical effectiveness of
these drugs. But patients who got placebos also showed blood pressure
declines. And, most interesting of all, patients who got no placebos and
no medications also showed this decline. One explanation for the return of
blood pressure to normal in this group is "regression to the mean," which
we discussed briefly earlier. This principle states that, if you select a group
of people based on the fact that they share an extreme characteristic
(high blood pressure, for example), they will in time revert to a more
normal condition as the result of ordinary human homeostatic processes.
Similarly, one can show that tall men tend to have tall sons, but not sons
as tall as they are; and short men tend to have short sons, but not as
short as they are. Stature tends to "regress to the mean." Note that if this
were all that were going on, in some number of generations, there would
no longer be any tall, or short, men. That's clearly nonsense, since it is
also possible for men of average stature to have sons appreciably taller,
or shorter, than they are. And that may be analogous to what happened
to the people in the blood pressure study.
There are, however, other explanations. It may be that these individuals
were, when first enrolled in the study, responding nervously to
having their blood pressure taken, which gave them higher blood pressure;
"white coat hypertension" is a well-recognized phenomenon (Landray
and Lip 1999; O'Brien 1999). Having their blood pressure taken every
four months may have gradually desensitized them to this event, and
their blood pressure then no longer increased with the approach of the
cuff. This would be a case of a distinct "measurement effect," where the
measurement created the object of study, the elevated blood pressure, at
least for a while.
There is another possibility: this may be an example of the meaning
response. There is ample evidence to indicate that the use of various
medical instruments and machines can have significant healing effects.
We will look more closely at this possibility later. For the moment, we can
imagine a modification to the study as I have described it which might let
us make a more informed judgment about what was going on here. Suppose
that the experimenters had included another group of people with
high blood pressure, but the people in this group did not have their blood
pressure taken every four months. Instead, their blood pressure was taken
only at the end of the study, after three years. It's unlikely that this group
would have gotten used to the blood pressure experience as the repeat measurement
group might have. So, if at the end of three years we found
that this ignored (no treatment at all!) group also now had normal blood
pressure, we could probably attribute the change to "regression to the
mean." If that group still had high blood pressure, we could attribute
the change in the multiple measurement group to the placebo effect
of the blood pressure cuff. Unfortunately, the researchers didn't do this,
and so we will probably never know.
The point is that it is very difficult to know, even under the most stringent
conditions and in the simplest, most clear-cut cases, just why a particular group of people
"got better." Autonomous (or homeostatic) responses, drug responses,
and meaningful responses are much easier to keep separated conceptually
than they are in practice.
Diagnosis is treatment
Let's pursue this thought experiment - our addition to the Australian
blood pressure study - a little bit further. Suppose we were going to
do as I suggested and identified several hundred people with moderate
hypertension, and then did nothing to them for three years. What might
we tell them about this experiment? Unless we could measure their blood
pressure without them knowing it, we would have to tell them
something.
And, we have found that their blood pressure is higher than normal.
Suppose we tell a little lie; we say that they are fine, there is nothing to
worry about, nothing to be done, go home and don't think about it any
more. Earlier, I said that this was to be a group that got no treatment
at all. But our little lie here doesn't seem to me to be "no treatment
at all." Consider another possibility: suppose we told these people the
"truth," that they had high blood pressure, that this was a risk factor
for stroke and heart attack, and that they had a medical condition for
which several thousand other people in this study would be treated with
powerful medications. But they wouldn't get any. Of course it's unlikely
that we would do either of these things; and indeed, the researchers didn't
do either. But it seems quite plausible to me that if we did one or the other
of these things, the outcome might have been quite different. The group
to which we told lies might have been quite comforted by our charade and,
as a result, their blood pressure might have gone down. The group with
which we were brutally frank, but not very caring, might be expected to
be scared and disturbed, and we might imagine that their blood pressure
could go up, or at least not go down as it might in the group to which we
lied.
What this means is that the very fact of diagnosing a person with some
sort of medical condition is a form of medical treatment which can be expected
to have an effect. This process was noticed many years ago by
Howard Brody and David Waters and described in a fascinating article
titled "Diagnosis is Treatment." They describe several interesting cases
where it is quite clear that the shaping of the diagnosis can make a significant
difference in the outcome of the illness. In one case, a 52-year-old
man with long-term hypertension and recent symptoms of ulcers was
quite testy with his doctor when asked about changes in his family situation.
The doctor was trying to find areas of increased stress and anxiety;
the patient was unwilling to describe any. But, with some persistence,
the doctor learned that the man's wife had recently returned to work and
was enjoying her "new life"; the patient, however, explained that he was
feeling abandoned, and was very unhappy about it. The doctor suggested
he discuss this with his wife. "The physician asks if he had felt more tense
or sad since she returned to work. The man considers the idea and says
he could not say but would think about it. He returns two weeks later to
say he had discussed the conversation with his wife, who had not realized
how deserted he was feeling. He is feeling much closer to her and more
relaxed. He also reports a decrease in gastric pain." The authors continue
by saying that in this (and another) case, "the diagnosis in itself exercised
a therapeutic effect for the patient inasmuch as it provided an understandable,
acceptable explanation for his behavior" (Brody and Waters
1980).
Untreated control groups
This leads us to another curious and complex issue. Occasionally, studies
are designed to have a "no-treatment group," sometimes called a "natural
history group," as did the Australian blood pressure study. The idea is
that, in this group which is getting no treatment, we can see the "natural
course of the disease." I would counter that, except under the most
extraordinary circumstances, it is logically and conceptually impossible
to have a no-treatment group. In order to do a trial, people have to be
recruited and diagnosed for the condition under study; they receive some
sort of examination, maybe an invasive and dramatic one. They give informed
consent, perhaps after reading a long and complex document
describing the study, the various treatments under review, and so on.
They are then randomly assigned to (in this case) three conditions: drug
treatment, placebo treatment, or no treatment. It's not clear what one will
tell the group getting "no treatment." Certainly, their participation can't
be "blind" to them; they know they aren't getting any drugs or placebos;
a reasonable inference might be that they are healthy enough not to need
any. And there has to be a follow up, an assessment of the condition of
the subjects after some period of time, or a diary of symptoms has to be
kept, or something similar. While these people have not had pills, they
have had a good deal more than "nothing."
The only way to proceed would be to diagnose illness surreptitiously,
secretly, so that the individuals didn't know they were being observed; the
follow-up would also have to be secret. No lab tests would be possible.
Think "medicine by binoculars."
I know of only one experiment which approximates a genuine no treatment
group, the Tuskegee Syphilis Study, in which the US Public Health
Service enrolled 399 poor, rural, African-American men with syphilis
into a forty-year-long observational study (Jones and Tuskegee Institute
1981). The idea was, in part, to see what happened to people who had
syphilis which was not treated at all. Begun in 1932, the study went on
until 1972; these men were deprived of all treatment. There were moderately
effective treatments with salvarsan and other drugs in the 1930s.
Treatment with penicillin, which was available for treatment of syphilis
in the mid-1940s, was also denied to them. In 1997, President Clinton
apologized to the few survivors of the experiment and their families, and
to the nation, for this egregious ethical and moral catastrophe. It is, then,
possible to have an untreated group, but only if you are prepared to go to
incredibly extreme lengths.
Clinical trials
Given this complexity, how does one ever figure anything out about medical
treatment? One of the great benefits of science is that the essential
methodology is simply to forge ahead anyway, regardless of the complexities,
by simplifying (this, of course, is also one of the great problems of
science!). The main procedure that researchers use in clinical research4
is the "randomized controlled trial" or "RCT." An RCT is designed to
determine the efficacy of a drug for people with a particular medical
condition. A simple study design would go like this.
Researchers first accumulate a number of people with a certain condition.
If they have lots of patients with this condition coming to their
own clinic, they might just ask their own patients if they were interested
in participating in the study. Or they might ask other physicians to refer
appropriate patients to them. They might advertise (residents of communities
with university medical centers are accustomed to seeing ads in
the local newspaper saying things like "Are you a man between 25 and
40 years of age and suffering hair loss [or psoriasis, or migraine headache,
or any of hundreds of other such conditions]? Call 1-888-234-5678
to participate in a study of ... "). Recently, large research organizations like
the various Institutes of the National Institutes of Health and others have
been seeking participants for studies over the World Wide Web. 5 Today,
many large trials are carried out simultaneously at a number of sites,
from two or three to a hundred or more. Whenever such studies have any
federal funding, and most other times, too, they have been approved by a
research committee - often called an Institutional Review Board (IRB) -
to see that they are ethically acceptable, that the rights of the patients
who volunteer for the trials are protected, and, particularly, to see that
all patients are appropriately asked for their "informed consent."
Patients are then matched up against the entrance requirements for
the study. This is a very important point in the process. Sometimes, it's
relatively simple: "patients with active ulceration of the duodenum as
seen on endoscopy" is pretty straightforward; "patients with significant
late-luteal phase dysphoric disorder (i.e., PMS)," or, remembering our
earlier discussion, "patients with mild to moderate hypertension" are
more problematic. Let's take the simpler case: ulcers.
After being selected for the study, patients are given some sort of treatment
for the condition. In some studies, there will be three or four treatment
groups with different amounts of medication (10 mg, 20 mg, 30 mg,
etc.). In the technical argot of medicine, these are sometimes called
"verum" groups; verum means "true" or "truly." And then there is also
a "control group." The control group may be given an existing standard
drug or it may be given an inert treatment, a "placebo." The central necessities
at this stage are that patients must be allocated to the different
treatment groups at random, and that neither the researchers nor the patients
can know who is getting which treatment; hence they are called
"Randomized Controlled Trials."
The patients are treated with their medications or placebos for an appropriate
length of time - in the case of ulcers, it might be for about
a month - and then they are checked again to see what has happened;
they might receive a second endoscopic examination, for example. At this
point, the study is "unblinded," and the outcome in the treatment groups
is compared. For the sake of simplicity, let's assume that there were two
groups, one receiving active drug treatment and one receiving placebo
treatment. And suppose we find that, in the drug treatment group, 60%
of the patients are better, while in the placebo treatment group, 40% are
better. Sounds pretty good.
But suppose that we had only 10 people in each group. In the drug
group, 6 of 10 were better, while in the placebo group, 4 of 10 were
better. There are only 2 more in the drug group that got better than in the
placebo group. It seems pretty likely that this could have occurred simply
by chance; we might very well have had this outcome if we had given
placebos (or active drugs) to both groups. Indeed, this example is rather
like the outcome of an experiment of flipping coins. Flip a coin 10 times,
and you have an excellent chance of getting 6 heads one time, and 4
the next. If, however, you flip the same coin 1,000 times, it is extremely
unlikely that you will get 600 heads the first time round and 400 heads
the second. If you had a crooked coin, you might get 606 heads once and
594 the next, but not 600 and then 400. So, sample size is important.
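The coin-flip reasoning above is easy to check with a short simulation. This sketch (plain Python; the fair coin and the 6-of-10 and 600-of-1,000 splits come from the text, everything else is illustrative) estimates how often chance alone produces each lopsided result.

```python
import random

random.seed(42)  # fixed seed so repeated runs give the same estimates

def count_heads(n_flips: int) -> int:
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

# Repeat each "experiment" many times and see how often chance alone
# yields a split at least as lopsided as the one observed.
trials = 5_000
lopsided_small = sum(count_heads(10) >= 6 for _ in range(trials)) / trials
lopsided_large = sum(count_heads(1_000) >= 600 for _ in range(trials)) / trials

print(f"10 flips, 6+ heads:      {lopsided_small:.1%}")  # a bit under 40%
print(f"1,000 flips, 600+ heads: {lopsided_large:.1%}")  # essentially never
```

With 10 flips, a 6-4 split is unremarkable; with 1,000 flips, a 600-400 split virtually never happens by chance, which is exactly why sample size matters.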
Suppose in our ulcer treatment study we had enrolled 2,000 patients,
and we had 60% healed in the drug group (600 of 1000), and 40% in
the placebo group (400 of 1,000). It seems extremely likely that, if we
repeated this experiment, we would not get reversed results the second
time, just like the coin. It is quite clear that we can conclude now that the
drug is an effective one for healing ulcer disease. But we have had to do
an awful lot of work to prove it, studying 2,000 patients. This is why the
use of statistics is essential in doing clinical research. Using fairly straightforward
statistics, one can determine what the probability is that the outcome
of a particular experiment is due to chance. For example, when
you flip a coin 10 times, you have about a 37% probability of getting 6 or
more heads; such an outcome occurs roughly 1 time in 3. No one is likely to conclude from
this experiment that we have a biased coin (or an effective drug).
If, however, we have 50 patients in each of two groups, one getting an
active drug and one getting a placebo, and 60% of the drug patients are
better (30 of 50) at the end of the trial, and 40% of the placebo patients
are better (20 of 50), there is only a 4.5% chance that this "biasing" of the
outcome is due to chance. In such a case, it is common to say that there
is less than one chance in 20 (5%) that the outcome is due to chance;
people also say that the result is "statistically significant at the .05 level."
Now notice that just because something is "statistically significant"
doesn't necessarily mean that it is particularly important (or "significant").
If we have a new drug that we test on 10,000 people (two groups
of 5,000 each), and people in the drug group are better at the end of the
trial 51% of the time (2,550 of 5,000), and the placebo group patients
are better 49% of the time (2,450 of 5,000), this is a statistically significant
difference which is exactly the same as in the previous case (this
outcome could happen simply due to chance only 4.5% of the time). But
even though the difference is statistically significant, it doesn't seem very
significant (unless it were really really cheap to do!).
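The probabilities quoted in these examples can be checked with the standard library alone. The pooled two-proportion z-test below is one common way (among several valid tests) to compute such p-values; note that both trial outcomes give the same z and essentially the same p, despite their very different practical importance.

```python
from math import comb, erf, sqrt

# Exact chance of 6 or more heads in 10 flips of a fair coin.
p_six_or_more = sum(comb(10, k) for k in range(6, 11)) / 2**10
print(f"P(6+ heads in 10 flips) = {p_six_or_more:.1%}")  # about 37.7%

def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # 2 * (1 - Phi(z))

# 30 of 50 better on drug vs. 20 of 50 on placebo: p is about 0.045.
print(two_proportion_p(30, 50, 20, 50))
# 2,550 of 5,000 vs. 2,450 of 5,000: the same z, so the same p of about 0.045.
print(two_proportion_p(2550, 5000, 2450, 5000))
```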
Another way to look at this is to use the concept of the Number Needed
to Treat (NNT). The NNT tells you how many people have to receive
some treatment in order for one person to benefit from it. To calculate
the NNT, you determine the proportion of benefit the treatment gives,
and divide it into 100. In our case with 50 patients in each group, where
60% of drug patients got better, and 40% of control patients got better,
the proportion of benefit is 60% - 40% which equals 20%, which we
divide into 100% giving an NNT of 5. We need to treat 5 patients with
the new drug in order for one to benefit. All 5 have to pay for the drug,
and all 5 have to tolerate its undesirable effects, and one will benefit. In
our case with 10,000 patients, the proportion of benefit is 51% - 49%,
or 2%, which divided into 100% gives an NNT of 50. We have to treat
50 people with this new drug to have one person benefit. Just because a
difference is "statistically significant" doesn't mean it is "significant" for
real medical practice.
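The NNT arithmetic is a one-line formula: divide the absolute difference in benefit into 1 (equivalently, into 100%). This sketch simply restates the two worked examples from the text.

```python
def number_needed_to_treat(p_treated: float, p_control: float) -> float:
    """NNT = 1 / (proportion better on treatment - proportion better on control)."""
    benefit = p_treated - p_control
    if benefit <= 0:
        raise ValueError("treatment shows no benefit over control")
    return 1 / benefit

# 60% vs. 40% better: treat 5 patients for one extra recovery.
print(round(number_needed_to_treat(0.60, 0.40)))  # 5
# 51% vs. 49% better: treat 50 patients for one extra recovery.
print(round(number_needed_to_treat(0.51, 0.49)))  # 50
```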
Note that I have been assuming that the differences between the drug
group and the control group in these studies was due to the fact that one
group got the drug and the other didn't. Are there any other possibilities?
One of the biggest problems in doing RCTs is being certain that
the individuals were assigned to the different groups in a truly random
fashion. Suppose, for example, that the researchers decided to simplify,
and arranged for all the men to be in one group and all the women to be
in the other. This would hardly be a random distribution. Indeed, it is
common enough for researchers to restrict study subjects to one gender
or the other (it has traditionally been men) just so this couldn't arise. At
the end of the study, when the results are "unblinded," the researchers
compare the two groups on a variety of demographic measures, hoping
that there are no differences - in gender, age, illness severity, economic
status, etc. - between them. If the groups are the same on these measures,
it is taken as evidence that the two groups are "the same," and therefore
any differences between them are due to the presence or absence of the
drug being tested.6
There is another factor to consider. It is often alleged that, for a variety
of reasons, RCTs aren't really "blind." In particular, it is said that people
can figure out who is taking the drug and who is taking the placebo by
noticing "side effects," or whatever. This may be the case. Insofar as it is,
and insofar as doctors or researchers convince the people they believe to be
taking the drug that they will do better than the others, the results
of the trial will probably show a deflated "placebo effectiveness" rate and
an inflated "verum effectiveness" rate. Later, if the drug is approved for
use, practicing physicians, convinced by these biased studies that the drug
is highly effective, will convey that enthusiasm (or bias) to their patients
and may heal a lot of patients with the meaning response. There's nothing
wrong with this, of course, but it's likely that, if some skeptic comes
along later with a better research design and tests the drug again, it will
disappear from the pharmacies fairly quickly.
That conventional medicine relies so strongly on the randomized controlled
trial, often referred to as the "gold standard" of medicine, rests on the
fact that people get better when they take inert medications.
4 There is a longstanding distinction in conventional medicine between two types of research:
laboratory research is work on chemistry or biology which might involve testing
various substances on tissues grown in petri dishes, or on animals, and perhaps on the
occasional "human guinea pig," or the like. Clinical research involves testing drugs or
procedures on human beings in hospitals, doctors' offices, or clinics.
6 There is a down side to this. If you pick your patients so they are all white men between 40
and 45 with "stage II illness," who each make between $37,000 and $40,000 per year in
middle management, and don't wear glasses, and then randomly assign them to different
groups, at the end you will be able to show that the two groups were "the same." But you
won't have any real idea of whether your new drug will be of any value for rich 20-year-old
black women, or 70-year-old nearsighted Hispanics. This is a very common problem in
medicine; in particular, most drugs have, over the years, not been tested in women or
children.
Excerpted and adapted from: Moerman, Daniel E. Meaning, Medicine and the 'Placebo Effect'.
West Nyack, NY, USA: Cambridge University Press, 2002. Chapter 3.