Statistics

Populations & samples

Before proceeding with a discussion of basic statistics, two terms must be defined. A population consists of a complete set (either finite or infinite) of measurements of the variable of interest. For example, measurements of the heights of all the students in a class, or their grades, would constitute a population of measurements. In contrast, a sample consists of a set of random measurements representative of the population; each observation must have an equal chance of being chosen. Heights of randomly selected people in a busy airport terminal would constitute a sample of the population. The number of X-rays reaching a detector in a second similarly represents a sample, because not every X-ray produced in the material is counted. The distinction between sample and population vanishes as the number of measurements grows large.


Accuracy & precision

Accuracy is defined as how close a measured value is to the “true” value; consequently, it is an absolute quantity. It is very difficult to establish “true” values: determining the accuracy of a measurement requires calibration of the analytical method with known standard material. Accuracy suffers if the compositions of the reference standards are not well known. The geostandards community constantly works to provide well-characterized reference materials. In addition to having well-known compositions, microprobe standards must also be homogeneous, having the same composition throughout. A large number of minerals, glasses, and metals are available as microprobe standards.

Instrumentation_Accuracy_Precision

Image source: http://elchem.kaist.ac.kr/vt/chem-ed/data/acc-prec.htm.

Precision refers to how well a given measurement or result can be reproduced. Usually this takes the form of a standard deviation around a mean value. Values can be very precisely determined and still be very inaccurate; conversely, imprecise analyses may average to very accurate values (albeit with a large standard deviation). In X-ray analysis, precision is effectively limited by counting statistics, as described below.

Many factors introduce errors that affect the accuracy and precision of analyses obtained by electron microprobe analysis. It is useful to subdivide these errors into two categories: systematic and random. Systematic errors introduce a constant bias into the results. Unlike random errors, which can be reduced by repeated measurements, systematic errors cannot be detected by statistical means; their general effect is to shift the measured quantity away from the “true” value. Factors that can affect probe accuracy and precision include the following (S = systematic error, A = accuracy, R = random error, P = precision, → = affects):

  • The random nature of X-ray generation and emission (R → P)
  • Long-term instrumental drift (S → A, P)
  • Short-term filament instability (R → P)
  • Specimen surface irregularities (R → P)
  • Focusing inconsistency (R or S → A, P)
  • Interaction volume intersecting two phases or secondary fluorescence from phases below surface (R → P)
  • Sample damage such as sodium- or volatile-loss (S → A, P)
  • Incorrect standard composition values (S → A)
  • Errors in matrix factors (S → A)
  • Incorrect system parameters such as take-off angle (S → A)
  • Sample tilt (S → A)
  • Variations in C-coat thickness relative to standard materials (S → A)
  • Sample charging (R → P)
  • Incorrectly located background measurements (S → A)
  • Peak wavelength shifts (S → A)

As can be seen, in general, systematic errors affect accuracy, whereas random errors affect precision.


Basic statistics

Central Tendency

When successive measurements of the same quantity are repeated, a distribution of values is obtained. There are several measures of “central tendency,” including the median, mode and mean. We will use the mean, which for a sample is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where xi = value of the ith measurement and n = total number of measurements. The formula here uses “sigma” notation to indicate that a series of numbers are added together:

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

Note that the symbol μ is used to denote the mean of a population.

Variability

We also will wish to characterize populations of measurements in terms of their variability. Measures of variability include: range, standard deviation, the coefficient of variation, variance, and standard error. The most useful for our work is the standard deviation, which for a sample is:

$$s = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n-1}}$$

Note that because we are dealing with a sample of the total population (the usual situation), the weighting factor is (n-1). If we had data for the entire population, this factor would be n. For a population, the appropriate equation is:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \mu\right)^2}{n}}$$

Although not technically correct, in the discussion below we shall use the symbol σ to represent the standard deviation.

Results are usually reported as mean ± standard deviation. In other words, we are saying that, assuming a Gaussian distribution of values, there is a probability of 68% that the measured value is within the range defined by the standard deviation. The standard deviation is related to the other descriptions of variation: the variance is σ² and the coefficient of variation, ε, is:

$$\varepsilon = \frac{\sigma}{\bar{x}}$$

The coefficient of variation is also called the relative error or relative standard deviation. This is often expressed as a percentage.
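
These definitions translate directly into Python; a minimal sketch with hypothetical data values:

```python
import math

def mean(xs):
    # Sample mean: sum of the values divided by n
    return sum(xs) / len(xs)

def sample_std(xs):
    # Sample standard deviation: note the (n - 1) weighting factor
    xbar = mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

def coeff_variation(xs):
    # Coefficient of variation (relative standard deviation)
    return sample_std(xs) / mean(xs)

counts = [100, 102, 98, 101, 99]           # hypothetical repeated measurements
print(mean(counts))                        # 100.0
print(round(sample_std(counts), 3))        # 1.581
print(round(coeff_variation(counts), 4))   # 0.0158
```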


Poisson & Gaussian distributions

The Poisson distribution applies to a wide range of phenomena in the sciences. It describes the probabilities inherent when an event occurs with a constant probability per unit time. Geological examples include: radioactivity (number of unstable nuclei that decay within a given period of time) and X-ray production (number of X-rays counted in a given period of time).

In 1837, Siméon Denis Poisson (1781-1840) determined the expected distribution of the number of events detected in various statistically independent time intervals. This distribution assumes that:

  1. the events occur independently (their occurrence does not depend upon previous or following events),
  2. the probability of an event occurring in a short time interval is proportional to the length of that interval, and
  3. two events cannot occur simultaneously.

The resulting probability, P, of observing a events during a time interval is:

$$P(a) = \frac{m^{a}\,e^{-m}}{a!}$$

where m = average number of random occurrences per interval. As m increases, the probability distribution moves to the right and broadens. For each value of m, the sum of all probabilities is 1.0. It should be emphasized that the Poisson distribution is not a continuous function; rather, it gives the probabilities for discrete values of a.

Stats_Poisson_Distribution

Poisson distribution. Notice how the distribution for m = 10 resembles a “bell-shaped curve.” The connecting lines are for convenience; probabilities apply only at the integer values shown by the symbols.
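
The probabilities plotted above can be computed directly from the Poisson formula; a minimal Python sketch for m = 10:

```python
import math

def poisson_p(a, m):
    # Probability of observing exactly a events when the mean count is m
    return m ** a * math.exp(-m) / math.factorial(a)

m = 10
print(round(poisson_p(10, m), 4))   # 0.1251 -> the peak lies near a = m
total = sum(poisson_p(a, m) for a in range(100))
print(round(total, 6))              # 1.0 -> the probabilities sum to unity
```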

For larger values of m, usually 30 or more, the Poisson distribution approaches the normal or Gaussian distribution.

Stats_Poisson_Distribution10

Probability distributions. The Poisson distribution for m = 10 is almost identical to the Gaussian distribution. Note that the curve width scales with the square root of m.

For populations of measurements that have a Gaussian distribution, we expect that about 68.3% of the data will be within 1 standard deviation of the mean (i.e., in the range Xavg ± σ). A more complete description of the distribution of data in a Gaussian distribution is given in the table.

Details of the Gaussian distribution:

  • Xavg ± 1σ contains 68.3% of the data
  • Xavg ± 1.96σ contains 95.0% of the data
  • Xavg ± 2σ contains 95.4% of the data
  • Xavg ± 2.58σ contains 99.0% of the data
  • Xavg ± 3σ contains 99.7% of the data

Thus we should expect that 95% of the data would be within 1.96 standard deviations of the mean (i.e., in the range Xavg ± 1.96 σ). This is called a 95% confidence interval for the sample.

Stats_Gaussian_Distribution

 

Gaussian distribution. Image from Figure 3.13 in Till, 1974, Statistical Methods for the Earth Scientist: an introduction.
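
The tabulated Gaussian fractions can be checked with the error function: the fraction of a Gaussian population lying within z standard deviations of the mean is erf(z/√2). A quick Python check:

```python
import math

def within(z):
    # Fraction of a Gaussian population within z standard deviations of the mean
    return math.erf(z / math.sqrt(2))

print(round(within(1.0), 3))    # 0.683
print(round(within(1.96), 3))   # 0.95
print(round(within(3.0), 4))    # 0.9973
```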


Propagation of errors

Methodology

It is occasionally necessary to propagate errors to determine the total error that results from performing mathematical operations on several numbers that have associated errors. The discussion below is purely practical; a very complete discussion of error propagation is located at: http://www.itl.nist.gov/div898/handbook/mpc/section5/mpc55.htm. At high count rates, X-ray production follows a Gaussian distribution. The combination of two Gaussian distributions with standard deviations σ1 and σ2 is another Gaussian distribution with standard deviation σ:

$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$$

This extremely useful property allows us to “propagate” the errors on individual measurements to determine a total error. The appropriate formulae are given in the table.

Error propagation equations:

  • Addition or subtraction (z = x + y or z = x − y): σz = √(σx² + σy²)
  • Multiplication or division (z = xy or z = x/y): εz = √(εx² + εy²), where ε = σ/value
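
These propagation rules (absolute errors in quadrature for sums and differences; relative errors in quadrature for products and quotients) can be sketched as two small helper functions; the demonstration values are hypothetical:

```python
import math

def add_sub_err(*sigmas):
    # Addition/subtraction: absolute errors add in quadrature
    return math.sqrt(sum(s ** 2 for s in sigmas))

def mul_div_err(value, *rel_errors):
    # Multiplication/division: relative errors add in quadrature;
    # returns the absolute error on the result
    return abs(value) * math.sqrt(sum(e ** 2 for e in rel_errors))

# Hypothetical example: (10.0 +/- 0.5) + (4.0 +/- 0.3) = 14.0 +/- 0.583
sigma_num = add_sub_err(0.5, 0.3)
# then divide by (2.0 +/- 0.2): 14.0 / 2.0 = 7.0 +/- 0.758
err = mul_div_err(7.0, sigma_num / 14.0, 0.2 / 2.0)
print(round(sigma_num, 3), round(err, 3))   # 0.583 0.758
```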


Example

It is often necessary to combine error expressions to calculate the total error. Consider the following expression with the errors given as 1σ values:

Stats_Error1

We can ignore the errors and determine the result:

Stats_Error2

Now let’s find the error on this value. First combine the errors in the addition portion of the expression:

Stats_Error3

Next we must combine this error on the numerator with the error associated with the denominator. Because multiplication and division are involved, we must combine coefficients of variation:

Stats_Error5

This is the coefficient of variation of the result (948); we must convert it back into a standard deviation:

Stats_Error7

So the final answer is 948 ± 258.5 (1σ). Note that although the absolute error on the numerator is larger (8.6 vs. 3), most of the error on the result comes from the large relative error on the denominator (25%).


Statistics of X-ray counting

Error on count ratio

The production of X-rays is a Poisson process and can be analyzed using statistical methods. As noted above, at sufficiently high values of m (count rates), the Poisson distribution is identical to the normal distribution and

$$\sigma_{theory} = \sqrt{C}$$

where σtheory = theoretical population standard deviation and C = total number of counts. In quantitative analysis, one determines the total number of counts on the peak of interest and corrects them for the background of the continuum. Corrected peak counts on the unknown are divided by corrected counts on a known standard material to determine the “K” ratio, which is used as input for the data reduction routine. The combined uncertainty on the K ratio due solely to X-ray counting statistics is:

$$\varepsilon_K = \sqrt{\frac{R_p^u/T_p^u + R_b^u/T_b^u}{\left(R_p^u - R_b^u\right)^2} + \frac{R_p^s/T_p^s + R_b^s/T_b^s}{\left(R_p^s - R_b^s\right)^2}}$$

(the superscripts u and s denote the unknown and the standard, respectively)

where Rb = background rate (cps), Rp = peak rate (cps), Tb = background counting time, and Tp = peak counting time.

Consider a case where the standard peak is 100 cps, the backgrounds on standard and unknown are 2 cps, the unknown peak is 10 cps, and the count times are the same (Tb = Tp = 15 sec). The error on the ratio is 11.5%. If we count twice as long (Tb = Tp = 30 sec), ε decreases to 8.2%. Most of the error comes from the relatively low number of counts on the unknown; if the unknown peak rate were increased to Rp = 50 cps (keeping Tb = Tp = 30 sec), ε would be only 3.3%. Note also that decreasing Tb for the standard to 1 second (assuming all other initial assumed values above) would only increase the total error on the K ratio to 11.6%. This is because at high values of Rp, the background becomes relatively unimportant.
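
The 11.5% figure can be reproduced by adding, in quadrature, the relative errors of the background-corrected count rates for unknown and standard (the variance of each net rate is Rp/Tp + Rb/Tb); a Python sketch with my own variable names:

```python
import math

def k_ratio_error(Rp_u, Rb_u, Rp_s, Rb_s, Tp, Tb):
    # Relative (1-sigma) error on the K ratio from counting statistics alone.
    # Variance of each net (peak minus background) rate is Rp/Tp + Rb/Tb;
    # relative variances of unknown (u) and standard (s) add in quadrature.
    var_u = (Rp_u / Tp + Rb_u / Tb) / (Rp_u - Rb_u) ** 2
    var_s = (Rp_s / Tp + Rb_s / Tb) / (Rp_s - Rb_s) ** 2
    return math.sqrt(var_u + var_s)

# Standard peak 100 cps, backgrounds 2 cps, unknown peak 10 cps, T = 15 s
print(round(k_ratio_error(10, 2, 100, 2, 15, 15), 3))   # 0.115
```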


Optimum counting times

To minimize the uncertainties due to counting statistics, the times spent measuring the peak and the associated background should satisfy the following relationship:

$$\frac{T_p}{T_b} = \sqrt{\frac{R_p}{R_b}}$$
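
This optimum split sets Tp/Tb equal to the square root of Rp/Rb; for example, with a peak rate of 100 cps and a background rate of 4 cps (hypothetical values):

```python
import math

Rp, Rb = 100.0, 4.0          # hypothetical peak and background rates (cps)
ratio = math.sqrt(Rp / Rb)   # optimum Tp / Tb
print(ratio)                 # 5.0 -> count the peak five times longer
```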


Detection limit

If peak counts are much greater than background counts, the background measurement hardly matters. By choosing standard materials with high concentrations of the element of interest, it is not necessary to count its background very long. However, if peak count rates are similar to background count rates (as is the case with trace elements), it is essential to spend as much time establishing the background rate as counting on the peak.

Derivation

When considering trace elements, we need to determine the “detection limit.” In essence, we are confronted with the signal-to-noise problem we have encountered before. Detection limits are a function of the background counts on the unknown (controlling what is considered detected) and the peak counts on the standard (allowing conversion of counts into concentration). Recall that the corrected peak counts, Ccorr, and the standard deviations associated with them are:

$$C_{corr} = C_p - C_b, \qquad \sigma_{C_p} = \sqrt{C_p}, \qquad \sigma_{C_b} = \sqrt{C_b}$$

Using these relationships, the combined standard deviation associated with Ccorr is:

$$\sigma_{C_{corr}} = \sqrt{\sigma_{C_p}^2 + \sigma_{C_b}^2} = \sqrt{C_p + C_b}$$

For detection, the corrected X-ray peak height must be greater than the error associated with it by some factor, z:

$$C_p - C_b > z\,\sigma_{C_{corr}} = z\sqrt{C_p + C_b}$$

If the population standard deviation σ is known, confidence limits about a single result may be calculated. The coefficient z (more strictly, tp,∞) is the limiting value of the t-distribution function for ν = ∞ at confidence level 1 − α.

Confidence Intervals (one-tailed):

  • 90% (α = 0.10): z = 1.282
  • 95% (α = 0.05): z = 1.645
  • 99% (α = 0.01): z = 2.326
  • 99.9% (α = 0.001): z = 3.090

For example, for a 99% confidence interval (α = 0.01), the value of z is 2.326. The user should select a confidence interval that fits the situation; in electron microprobe analysis, 99% confidence is more than sufficient. Near the detection limit, Cp ≈ Cb, so we may specify the detection limit (D.L.) as

$$C_p - C_b > z\sqrt{2 C_b}$$

Recognizing that counting for a longer time improves our counting statistics, we may write:

$$D.L.\ (\text{counts}) = z\sqrt{2 C_b} = z\sqrt{2 R_b T_b}$$

So far, we have defined the detection limit in terms of counts (or time and count rate), but we are interested in absolute concentrations. At such small concentrations, it is not necessary to make matrix corrections, so we may use a simple conversion factor, K, between count rate and concentration. The factor K has units of counts per time per concentration (cps/wt. % element or cps/wt. % oxide). However, we are working with absolute numbers of counts, so the K value (measured on the standard) must in turn be converted into counts per wt. % by multiplying by time. Rewriting the detection limit expression to include K yields:

$$D.L.\ (\text{wt. \%}) = \frac{z\sqrt{2 R_b T_b}}{K\,T_b} = \frac{z}{K}\sqrt{\frac{2 R_b}{T_b}}$$

All values except the time, Tb, are constants (with z selected by the analyst). The detection limit decreases as the square root of time; thus, in order to halve it, Tb must be increased by a factor of 4!


Example

As an example, consider the detection of Ni in olivine. The peak count rate on a standard that contains 30 wt. % Ni is 1460 cps, and the background rate on the olivine is 3 cps. The value of K is 48.67 cps/wt. % Ni. If we assume a count time of 30 seconds and a confidence level of 95% (z = 1.645), the detection limit (in wt. % Ni) is

$$D.L. = \frac{z}{K}\sqrt{\frac{2 R_b}{T_b}} = \frac{1.645}{48.67}\sqrt{\frac{2 \times 3}{30}} = 0.015\ \text{wt. \% Ni}$$

To convert this number into parts per million (ppm), we simply recall that 100 wt. % = 1,000,000 ppm; thus, 1 wt. % = 10,000 ppm. Converting, 0.015 wt. % Ni = 150 ppm Ni.
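
The arithmetic of this Ni example, D.L. = (z/K)·√(2Rb/Tb), can be checked in a couple of lines; the function name is mine:

```python
import math

def detection_limit(z, K, Rb, Tb):
    # D.L. (wt. %) = (z / K) * sqrt(2 * Rb / Tb)
    return (z / K) * math.sqrt(2.0 * Rb / Tb)

# Ni in olivine: z = 1.645 (95 %), K = 48.67 cps/wt. %, Rb = 3 cps, Tb = 30 s
dl = detection_limit(1.645, 48.67, 3.0, 30.0)
print(round(dl, 3))      # 0.015 (wt. % Ni)
print(round(dl * 1e4))   # 151 ppm Ni, i.e. about 150 ppm
```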


Caveats

Measuring trace elements requires careful determination of the background count rate on the unknown. It is usually wisest to slowly scan the spectrometer over the range of interest to characterize the shape and slope of the continuum. Analysts should be wary of non-linear backgrounds, peak interference(s) at the background locations, and absorption edges, which put a step in the background.

Stats_Background_Complications

Background complications. Correct determination of the background when measuring the concentrations of trace elements is critical. Usually it is assumed that the true background can be determined by measuring the background away from the peak and interpolating (case a). However, potential complications arise if there are interferences from high-order peaks (case b), nearby absorption edges (case c), or non-linear backgrounds (case d).

One particularly good way to determine the background count rate is to use a standard that does not contain any of the element of interest and actually count X-rays at the peak position. Detection limits may be improved by increasing counting times and beam current to yield higher total counts. In addition, changing the accelerating voltage to optimize the overvoltage for the line of interest will improve detection limits.


Sample homogeneity

The precision due to counting statistics is the same for one long count as for several shorter ones done for the same total time. Collecting a large number of shorter counts is useful for determining whether counting statistics are the limit to attainable precision. If the effect of counting statistics is the principal cause of X-ray count variation for a homogeneous sample, then comparisons of observed and calculated parameters can be used to evaluate sample homogeneity. We consider analyses of a number of different grains (or different points on one grain), and ask the question “Are the observed variations the result of counting statistics or do they demonstrate true compositional variations?” To answer this, we will define the “homogeneity index,” H.I., as the ratio of the two standard deviations (in microprobe analysis this ratio is sometimes called the “sigma ratio”):

$$H.I. = \frac{\sigma_{obs}}{\sigma_{theory}} = \frac{\sigma_{obs}}{\sqrt{\bar{C}}}$$

We can use a statistical procedure called the F-test, which assumes that all data were sampled on a random basis and are normally distributed, to test for homogeneity. We will use the F-test to evaluate the hypothesis that there is no difference between the two variances, σ1² and σ2² (as is the case for homogeneity); however, because we are not stating which variance is larger, we must use a “two-tailed” test and double the probabilities below (5% becomes 10%, etc.). Because F-values are defined as greater than one, we must place the larger value in the numerator.

The F ratio compares variance rather than standard deviations, so we square the homogeneity index to get the F ratio. Recall that in a Gaussian distribution, σtheory is the square root of the counts; thus

$$F = (H.I.)^2 = \frac{\sigma_{obs}^2}{\sigma_{theory}^2} = \frac{\sigma_{obs}^2}{\bar{C}}$$

If F = 1, the observed variance is the same as the variation expected just from X-ray counting statistics, and we can argue the analyses are of a homogeneous sample. However, the F-test allows us to assign probabilities to the likelihood of homogeneity. Tables of F-ratios and probabilities of equal variance (homogeneity) are available in many sources, but it is simplest to use the F.DIST.RT function available in EXCEL to determine the probability of homogeneity. This function requires the parameters:

  • F-ratio (homogeneity index, H.I.)
  • number of degrees of freedom for the numerator of the F-ratio (n – 1), and
  • number of degrees of freedom of the denominator (taken to be infinite; enter “1e10”).

As an example, consider a set of three X-ray counts on a standard (degrees of freedom = 2), with a sigma ratio (H.I.) of 1.1. The F.DIST.RT function tells us that there is a 33.3% probability of homogeneity. This is a “single-tailed” value and must be doubled to give the total probability (66.7%). Other example values are given below.
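
Outside of EXCEL, the same right-tail probability can be computed by noting that with an infinite denominator degrees of freedom, ν1·F follows a chi-square distribution with ν1 degrees of freedom. A Python sketch (closed-form chi-square survival function, valid for even numerator degrees of freedom; the function name is mine):

```python
import math

def f_dist_rt(F, df1):
    # Right-tail F probability with df2 = infinity: df1 * F is then
    # chi-square distributed with df1 degrees of freedom. Closed-form
    # survival function, valid for even df1.
    x = df1 * F
    return math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                  for k in range(df1 // 2))

# Three counts (2 degrees of freedom), sigma ratio 1.1 entered as the
# F value, as in the EXCEL example above:
print(round(f_dist_rt(1.1, 2), 3))     # 0.333
# Fcrit = 2.372 with n = 5 (4 degrees of freedom) corresponds to alpha = 0.05:
print(round(f_dist_rt(2.372, 4), 3))   # 0.05
```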

Conversely, we can calculate the critical F value (Fcrit) for a given confidence interval (alpha) and number of analyses (n). In this case we utilize the EXCEL function F.INV.RT to determine the maximum F value (H.I.) permissible to assume homogeneity in the analyses. The function requires:

  • confidence level (alpha, where 0.01 indicates 99%, 0.05 indicates 95%, etc.)
  • degrees of freedom for the numerator of the F-ratio (n – 1), and
  • number of degrees of freedom of the denominator (taken to be infinite; enter “1E10”)

For example, at the 95% confidence interval, with n = 5, the H.I. should be less than 2.372. The value of Fcrit for a given confidence level decreases with larger numbers of analyses, reflecting the fact that the standard deviation should also decrease, as shown in the figure.

Values of Fcrit as a function of number of analyses for the 95% confidence level.

Rigorous statistical tests of homogeneity have not been applied much in the geological literature, but H.I. is sometimes reported. A value of H.I. greater than 3 has been cited as proof of sample inhomogeneity. However, as shown, if counting statistics were the only cause of variation, then much smaller values of H.I. would effectively indicate heterogeneity. Since counting statistics are only one source of X-ray count variability, the F-test must be applied with care. Often, correlated variations (e.g., Ca decreases when Na increases) can reveal inhomogeneity.


Significant digits and rounding

Results should only be reported to the proper number of significant digits, because the number of significant digits and associated error are indications of the precision of the analytical results. Correct handling of significant digits (and error) and retention of the available precision requires an understanding of the propagation of significance in calculations.

Generally, if not specified, the precision may be assumed to be ±1 in the last reported digit, which is termed the least significant digit. However, some values effectively have an infinite number of significant digits. For instance, 1 inch is defined as exactly 2.54 centimeters (2.54 with an infinite number of zeros following) and each value is infinitely precise for purposes of conversion. In addition, for practical purposes, many constants (speed of light, Planck’s constant, etc.) are comparably precise and do not limit the precision of the results of calculations involving them.

The number of significant figures is the count of digits in a number excluding leading zeros. Trailing zeros to the right of the decimal point are significant, but trailing zeros in an integer are ambiguous. For example, 3.142 has 4 significant figures; 23,459,000 has at least 5 significant figures (the trailing zeros are ambiguous); 0.31910 has 5 significant figures (the final zero counts because it follows the decimal point); and 0.0004086 has 4 significant figures (the zero between 4 and 8 is not a leading zero and so is counted). Trailing zeros are a main source of confusion, but use of scientific notation allows the writer to indicate the precision by showing only significant figures. Consider the number 2000 (which, written this way, has an ambiguous number of significant digits). The best way to indicate the number of significant digits is to use scientific notation:

2 × 10³, 1 significant digit
2.0 × 10³, 2 significant digits
2.00 × 10³, 3 significant digits
One should retain all digits when performing calculations and, only when finished, round the result to the appropriate number of significant digits. For addition and subtraction, the result should be rounded to the same number of decimal places as the least precise number in the calculation. For example:

14.72 + 1.4331 + 0.00235 = 16.16.
In contrast, theoretically the only way to determine the correct number of significant digits for the results of calculations involving multiplication and division is to propagate significance as one would propagate error. Thus, the precision of the result cannot be better than the square root of the sum of the squares of the relative errors. For example, a measurement of 52.3 has an implied error of ±0.1, corresponding to a relative error of 0.0019. If we wished to square this value, the relative error of the result is

e = √(0.0019² + 0.0019²) = 0.00268.
Now, 52.3² = 2735.29, so the relative error corresponds to an absolute error of 2735.29 × 0.00268 = 7.3. The limit on precision is thus 7.3 (rounding to 1 in the tens place), and the result should be presented as 2.74 × 10³. In practice this procedure is cumbersome and usually unnecessary. Note that the result has the same number of significant digits as the two numbers that were multiplied.
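
This propagation can be verified in a few lines of Python, using the same rounded intermediate value (0.0019) as above:

```python
import math

value = 52.3
rel_err = 0.0019                   # implied relative error of 52.3 (0.1 / 52.3)
e = math.sqrt(2 * rel_err ** 2)    # relative error of value squared
abs_err = value ** 2 * e           # absolute error on 2735.29
print(round(e, 5))                 # 0.00269
print(round(abs_err, 1))           # 7.3
```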

Generally, one can simply follow the rule of rounding the result to the same number of significant figures as the least precise quantity used in the calculation.

Finally, when it is necessary to reduce the number of digits in a result, this should be accomplished by rounding. If the digit after the last significant digit is greater than 5, round the final digit up; if less, round down. If the digit is exactly 5, round so that the last retained digit is even (up if the preceding digit is odd, down if it is even) to average out the effects of rounding.
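
This is the “round half to even” rule, which Python's built-in round() also implements, so it makes a convenient check (note that most decimal fractions are not exactly representable in binary, so only exact ties behave this way):

```python
# Round-half-to-even ("banker's rounding"), as implemented by Python's round()
print(round(2.5))        # 2  (preceding digit even -> round down)
print(round(3.5))        # 4  (preceding digit odd -> round up)
print(round(0.125, 2))   # 0.12 (0.125 is exactly representable in binary)
```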
