At a high level, a QQ-plot (Quantile-Quantile plot) is a scatter plot that is often used to check whether a variable follows the normal distribution (or any other distribution). If all points on the QQ-plot form (or almost form) a straight line, there is a good chance that the variable under examination is normally distributed. On the other hand, if the points do not form a straight line, the variable probably does not come from the normal distribution.

Consider 2 datasets: data1 is generated by sampling a normal distribution, while data2 comes from a uniform distribution. I will draw the QQ-plots of these 2 datasets so you can compare the results.

    import numpy as np
    from numpy import random
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    import seaborn as sns

    # Define some characteristics of plots
    plt.style.use('seaborn-darkgrid')  # named 'seaborn-v0_8-darkgrid' in Matplotlib 3.6+

    # data1 is sampled from a normal distribution (mean and standard deviation assumed here)
    data1 = random.normal(0, 1, 1000)
    # data2 is sampled from a uniform distribution
    data2 = random.uniform(1, 10, 1000)

    fig, ax = plt.subplots(2, 1, figsize=(8, 12))
    sm.qqplot(data1, line='r', ax=ax[0])
    sm.qqplot(data2, line='r', ax=ax[1])

In the resulting plots, data2, which does not follow the normal distribution, does not form a straight line. The red lines on the plots are the best-fit regression lines of the points. The tick labels on the x-axis and the y-axis can be safely ignored.

I don't think we can really understand how the QQ-plot works, and why it helps us see the normality of our data, without knowing exactly how to draw it. In fact, drawing a QQ-plot is quite simple. Let's assume we have data of n samples (a list of size n). Then, for each i-th smallest value in our data (i goes from 1 to n), let y = the value you expect when you sample n values from a normal distribution, sort them, and take the i-th value (in other words, the i-th order statistic). Each data value is then plotted against its y.

A difficulty here is: how do we find the y values? That is, how do we find the expected i-th smallest number of a sample of size n from a normal distribution? In the statsmodels function we used above, the y values are computed using an involved estimation. To simplify this step, we can instead draw an actual sample of size n from a normal distribution, and then take its i-th smallest number as an estimate of the expected value. Let me show my code:

    def draw_QQ_plot(data, ax):
        # One sorted normal sample estimates the order statistics (standard normal assumed)
        norm = np.sort(random.normal(0, 1, len(data)))
        data = np.sort(data)
        best_fit_line = np.polyfit(norm, data, 1)
        ax.plot(norm, data, 'o', norm, np.polyval(best_fit_line, norm), 'r')

    fig, ax = plt.subplots(2, 1, figsize=(8, 12))
    draw_QQ_plot(data1, ax[0])
    draw_QQ_plot(data2, ax[1])

As you can see, our function gives plots that are very similar to the ones given by statsmodels. The minor difference comes from the fact that we use a different estimation function (ours is simpler). But that doesn't matter much: we still see that data1 fits a straight line very well, while data2 does not.

A final note on this: above, we showed how the QQ-plot can be used to test the normality of a dataset. QQ-plots are most often used for that purpose, but that's not all that QQ-plots can do. In the next section, we will say more about this.

The confusion of Probability plot, QQ-plot, and PP-plot

There is confusion about the Probability plot, the QQ-plot, and the PP-plot. In this blog, I will differentiate these 3 definitions.

First, I have to say, the confusion makes sense. While the QQ-plot and the PP-plot are well separated, the meaning behind the probability plot is sometimes obscure.

According to Wikipedia and most sources on the Internet, a probability plot is a graphical technique used to compare 2 datasets (each may be empirical or just theoretical), checking whether they follow the same distribution. The QQ-plot and the PP-plot are just 2 different types of probability plots.

It is clear, isn't it? But, actually, it is not.
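
On the question of estimating the expected i-th smallest value of a normal sample, here is a minimal sketch under stated assumptions. It is not the method statsmodels uses. It compares the single-sample estimate described in this post with Blom's closed-form approximation, which takes the standard normal quantile at (i - 0.375) / (n + 0.25), and with an average over many sorted samples. All function names are hypothetical, and only the Python standard library is assumed.

```python
import random
from statistics import NormalDist, mean


def sampled_order_stats(n, rng):
    """One sorted standard-normal sample of size n (the simple estimate)."""
    return sorted(rng.gauss(0.0, 1.0) for _ in range(n))


def blom_order_stats(n):
    """Blom's approximation to the expected i-th smallest of n standard normals."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def averaged_order_stats(n, repeats, rng):
    """Average many sorted samples to approximate the expectations directly."""
    samples = [sampled_order_stats(n, rng) for _ in range(repeats)]
    # zip(*samples) groups the i-th smallest values of all samples together
    return [mean(column) for column in zip(*samples)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n = 10
    single = sampled_order_stats(n, rng)
    blom = blom_order_stats(n)
    averaged = averaged_order_stats(n, 2000, rng)
    print(" i   single     blom  averaged")
    for i, (s, b, a) in enumerate(zip(single, blom, averaged), start=1):
        print(f"{i:2d} {s:8.3f} {b:8.3f} {a:8.3f}")
```

Averaging many sorted samples should land close to Blom's values, while a single sorted sample is a noisier estimate of the same quantities. That noise is one plausible source of the minor differences between the hand-drawn plots and the statsmodels ones.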