
Are q-q plots appropriate for paired data?
Rupert (Moderator (EN)) 8 years ago
in Geology
updated by fbilki (Moderator / Admin (AUS)) 8 years ago
I have seen a few examples of people using q-q plots to display paired data such as duplicate sample analyses. These examples include posts on this forum, consultancy technical reports and this month’s Micromine service and support newsletter.
q-q plots are great for investigating biases in unpaired data sets such as assays from different drilling campaigns or different laboratories. To my understanding, in many cases where you are lucky enough to have paired data such as duplicates or nearby samples (e.g. good twin holes), the data pairs are not independent and q-q plots are therefore inappropriate.
q-q plots “compare the distributions of two datasets (or a dataset and a theoretical distribution) by plotting their quantiles against each other”. The average grade for each quantile interval is calculated for each of the datasets independently and plotted against each other. This effectively un-pairs the data.
For an extreme example, imagine I have three data sets with the same distribution: one ascending, one random and one descending, e.g. [1,2,3,4,5], [2,3,5,1,4] and [5,4,3,2,1]. If I plotted any one of these data sets against another on a q-q plot I would get a perfect positive correlation, because each data set is sorted independently before plotting. This seems unreasonable for duplicate analysis.
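The example above can be checked with a rough sketch in plain Python. The helper names `qq_points` and `pearson` are made up for illustration and are not Micromine functions; the q-q points are approximated simply by pairing the sorted values of two equal-length sets.

```python
import math

def qq_points(a, b):
    """Empirical q-q points for two equal-length samples: sort each
    sample on its own, then pair the sorted values rank by rank."""
    return list(zip(sorted(a), sorted(b)))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# The three data sets from the post.
ascending = [1, 2, 3, 4, 5]
shuffled = [2, 3, 5, 1, 4]
descending = [5, 4, 3, 2, 1]

# Correlation of the data as paired (what a scatter plot would show).
r_paired_desc = pearson(ascending, descending)   # -1.0
r_paired_shuf = pearson(ascending, shuffled)     # about 0.2

# Correlation of the q-q points: the pairing has been discarded.
qq_desc = qq_points(ascending, descending)       # [(1, 1), (2, 2), ..., (5, 5)]
qq_x, qq_y = zip(*qq_desc)
r_qq_desc = pearson(qq_x, qq_y)                  # 1.0

print(r_paired_desc, r_paired_shuf, r_qq_desc)
```

The paired data range from perfectly negatively correlated to nearly uncorrelated, yet every q-q comparison lands on the 1:1 line.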
It appears that q-q plots are commonly being used to simplify scatter plots of paired data to investigate conditional biases. When this is the case I would have thought it more suitable to plot a conditional expectation plot. To me, this is very similar to a q-q plot except that the average grade of the samples in each quantile of the first set is plotted against the average grade of the equivalent samples in the second set. This preserves the pairing. A version of conditional expectation curves is described in Isaaks and Srivastava (although this plots ranges of X rather than the average grade, thereby producing a stepped curve).
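A minimal sketch of the conditional expectation idea as I have described it, in plain Python. The function name `conditional_expectation`, the equal-count binning and the sample grades are all assumptions for illustration; this is not the Isaaks and Srivastava version or an existing Micromine tool.

```python
def conditional_expectation(first, second, n_bins):
    """Sort the PAIRS by the first value, split them into n_bins groups of
    (nearly) equal count, and return the mean of each set within each group.
    The pairing is kept because both means come from the same samples."""
    pairs = sorted(zip(first, second), key=lambda p: p[0])
    n = len(pairs)
    points = []
    for k in range(n_bins):
        group = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        if not group:  # more bins than samples: skip empty groups
            continue
        mean_first = sum(p[0] for p in group) / len(group)
        mean_second = sum(p[1] for p in group) / len(group)
        points.append((mean_first, mean_second))
    return points

# Hypothetical grades: the "duplicates" are deliberately reversed.
originals = [1, 2, 3, 4, 5, 6]
duplicates = [6, 5, 4, 3, 2, 1]

ce_points = conditional_expectation(originals, duplicates, n_bins=3)
print(ce_points)  # [(1.5, 5.5), (3.5, 3.5), (5.5, 1.5)]
```

Unlike a q-q plot of the same two sets, which would sit on the 1:1 line, the summarised points still show that high originals go with low duplicates.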
I am a geologist, not a statistician, so please let me know if my logic is unsound. If people agree with this point of view I suggest that Micromine add conditional expectation plots as an option to the stats menu or modify the q-q plot form to deal with paired data.
In my experience the best way to display paired data is to use a scatter plot and/or a percentage half difference plot. Both of these types of graph can be summarised using the conditional expectation approach if there are too many data to display clearly.
A percentage half difference graph plots the mean of each pair on the X axis and half the difference between the pairs as a percentage of the pair mean on the Y axis. It would be great to see this type of plot added to the Micromine Stats menu.
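As a sketch of the calculation behind such a graph, in plain Python: the function name `half_difference_points`, the sign convention (original minus duplicate) and the sample values are my assumptions, and pairs with a zero mean are simply skipped.

```python
def half_difference_points(originals, duplicates):
    """Return (pair mean, percentage half difference) for each pair.
    Y = 100 * ((original - duplicate) / 2) / pair mean."""
    points = []
    for a, b in zip(originals, duplicates):
        pair_mean = (a + b) / 2
        if pair_mean == 0:  # percentage is undefined; skip the pair
            continue
        half_diff_pct = 100 * ((a - b) / 2) / pair_mean
        points.append((pair_mean, half_diff_pct))
    return points

# Hypothetical original/duplicate assays.
originals = [10.0, 2.0, 1.0]
duplicates = [8.0, 2.0, 3.0]

hd_points = half_difference_points(originals, duplicates)
for pair_mean, pct in hd_points:
    print(f"mean={pair_mean:.2f}  half difference={pct:.1f}%")
```

With this sign convention, points above zero mean the original assay is higher than its duplicate, and a cloud centred away from zero suggests a bias.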
I'm sure the purists would argue that a Q-Q plot is not valid when the two data sets are dependent. With that said, I think the pragmatic answer is to use whichever plot produces the clearest view of the phenomenon under investigation. It may well be that in Don's case the scatterplot was not able to highlight the bias as clearly as the Q-Q plot did.