Saturday, March 30, 2013

Working with plots with a large number of data points

This is an important issue when there is a very large number of data points: the points overlap, making it hard to see how many data points are plotted at a particular location.

The idea here is to use a small point size, a different point type (such as "-" or "."), or transparent points.
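A little arithmetic shows why transparency helps. This is a minimal sketch that assumes standard "over" alpha compositing of identically colored points (the post does not say how Excel blends them). If each point has opacity alpha, n stacked points reach an opacity of 1 - (1 - alpha)^n. Fully opaque points look the same whether one or a hundred are stacked, whereas transparent points get darker as more of them overlap. The function name `stacked_opacity` is hypothetical.

```python
def stacked_opacity(alpha: float, n: int) -> float:
    """Opacity of n identical points drawn on top of each other, each with
    opacity alpha, assuming standard "over" alpha compositing."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Each layer lets (1 - alpha) of the background show through,
    # so n layers let (1 - alpha) ** n of it show through.
    return 1.0 - (1.0 - alpha) ** n


# Opaque points (alpha = 1) versus 80% transparent points (alpha = 0.2).
for n in (1, 2, 5, 10):
    print(n, round(stacked_opacity(1.0, n), 3), round(stacked_opacity(0.2, n), 3))
```

With alpha = 0.2, a single point is 20% opaque and ten stacked points are about 89% opaque. Opaque points stay at 100% regardless of how many overlap.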



Two Y series:

Recent versions of Excel allow transparency in symbols.

You can overlay a summary plot, such as a density plot or a histogram, to show the density of the data points. Here is the trick:

The trick is to overlay points showing the frequencies produced by the Histogram tool of the Data Analysis package (Analysis ToolPak). The frequencies are made negative so that they plot on the negative side of the Y axis. Of course they could be plotted on the positive side too, but there they would overlap the scatter plot's data points. The density is calculated by dividing each frequency by the total number of observations. The density is then rescaled so that its maximum equals a plotting value that suits the axis, for example -15 on the Y axis: divide -15 by the maximum density and multiply by the density:
 plotted value = (-15 / maximum density) * density
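The frequency-to-plot-value arithmetic can be sketched in Python. This is a minimal illustration rather than the author's spreadsheet. The bin rule is an assumption that mimics the upper-bound convention of Excel's FREQUENCY function and Histogram tool: a value goes into the first bin whose upper bound is greater than or equal to it, with an extra "More" count for values above the last bound. The function names and sample data are hypothetical.

```python
from bisect import bisect_left


def frequencies(values, bin_upper_bounds):
    """Count values per bin using upper-bound bins: bin i holds values greater
    than bound i-1 and less than or equal to bound i. The extra last count
    holds values above the highest bound (a "More" bin)."""
    counts = [0] * (len(bin_upper_bounds) + 1)
    for v in values:
        # bisect_left finds the first bound >= v; len(bounds) means "above all".
        counts[bisect_left(bin_upper_bounds, v)] += 1
    return counts


def scaled_density(counts, n_obs, axis_value=-15.0):
    """Density = frequency / total observations, then rescaled with
    (axis_value / maximum density) * density so the tallest bin sits at axis_value."""
    density = [c / n_obs for c in counts]
    max_density = max(density)
    if max_density == 0:
        raise ValueError("no observations fall in any bin")
    return [(axis_value / max_density) * d for d in density]


x_values = [1.2, 2.5, 2.7, 3.1, 3.3, 3.4, 4.8, 5.0, 6.2]  # made-up example data
bins = [2, 3, 4, 5, 6]                                   # sorted upper bounds
counts = frequencies(x_values, bins)
plotted = scaled_density(counts, len(x_values))
# Pair each bin bound with its scaled value to get the extra scatter series
# (the "More" count has no bin bound, so it is left out here).
extra_series = list(zip(bins, plotted[:-1]))
print(counts)
print(extra_series)
```

For this sample the counts are [1, 2, 3, 2, 0, 1]. The bin holding three observations lands at -15, and the others scale proportionally (two observations at -10, one at -5).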


The density can be plotted as a line or as bars (a histogram).


 With a similar trick we can plot the frequency points along the Y axis too. Note that here the X and Y data are reversed when generating the scatter plot.
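Swapping the roles of X and Y can be sketched the same way. This sketch assumes upper-bound bins for the Y values, simply skips values above the last bound, and places the profile on the negative side of the X axis with its maximum at -15. The function name and data are hypothetical.

```python
from bisect import bisect_left


def y_side_histogram_points(y_values, bin_upper_bounds, axis_value=-15.0):
    """Return (x, y) points for a frequency profile drawn beside the Y axis.
    The Y data are binned as usual, but the roles are swapped when building
    the scatter series: the rescaled density becomes the X coordinate and the
    bin bound becomes the Y coordinate."""
    counts = [0] * len(bin_upper_bounds)
    for v in y_values:
        i = bisect_left(bin_upper_bounds, v)  # first bound >= v
        if i < len(bin_upper_bounds):         # values above the last bound are skipped
            counts[i] += 1
    n = len(y_values)
    density = [c / n for c in counts]
    scale = axis_value / max(density)
    # Swap X and Y: x = scaled density, y = bin bound.
    return [(scale * d, b) for d, b in zip(density, bin_upper_bounds)]


y_values = [10, 12, 12, 15, 18, 18, 18, 25]  # made-up example data
print(y_side_histogram_points(y_values, [10, 15, 20, 25]))
```

Here the counts per bin are 1, 3, 3, 1, so the points come out at (-5, 10), (-15, 15), (-15, 20) and (-5, 25), forming a profile to the left of the Y axis.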


The final product may look like the following:



We can also fit a theoretical distribution to the plot.
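The post does not say which distribution was fitted. As one hedged example, a normal distribution can be fitted from the sample mean and standard deviation using Python's `statistics.NormalDist.from_samples`. Its expected share of observations in each upper-bound bin is then multiplied by the same factor as the observed density (-15 / maximum observed density), so the fitted curve sits on the same axis as the frequency points. The choice of a normal distribution, the function name and the data are all assumptions.

```python
from statistics import NormalDist


def fitted_normal_points(values, bin_upper_bounds, scale):
    """Expected fraction of observations in each upper-bound bin under a normal
    distribution fitted by sample mean and standard deviation, multiplied by
    `scale` (e.g. -15 / maximum observed density) so it shares the axis with
    the observed frequency points."""
    dist = NormalDist.from_samples(values)  # needs at least two values, not all equal
    expected = []
    prev_cdf = 0.0  # the first bin covers everything up to the first bound
    for upper in bin_upper_bounds:
        cdf = dist.cdf(upper)
        expected.append(cdf - prev_cdf)
        prev_cdf = cdf
    return [scale * p for p in expected]


values = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up example data
bounds = [1.5, 2.5, 3.5, 4.5]
observed_max_density = 3 / len(values)  # tallest observed bin holds 3 of 9 values
print(fitted_normal_points(values, bounds, -15 / observed_max_density))
```

Plotted as a line against the bin bounds, these values overlay the observed frequency points, so the fit can be judged by eye.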


Recent versions of Excel (e.g. 2013) allow you to set transparency on the points so that overlapping points are easier to see.



XY plots (tricks and modifications)

XY scatter plot


An XY plot (scatter, line, or a combination of both) has quantitative values on both the X and Y axes. The points can be connected directly by straight lines or by smoothed lines that follow the trend of the data.
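The difference between straight and smoothed connections can be sketched by filling in extra points between the data points in both ways. The post does not describe Excel's own smoothing algorithm. As a stand-in, this sketch uses a uniform Catmull-Rom spline, a common curve that passes through every data point. All function names and the data are hypothetical.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Point at parameter t (0..1) on the uniform Catmull-Rom segment from p1 to
    p2; p0 and p3 are the neighbouring data points that shape the curve."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (c - a) * t + (2 * a - 5 * b + 4 * c - d) * t2
               + (3 * b - a - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def straight_line(points, steps=4):
    """Connect consecutive points directly, sampling `steps` points per segment."""
    out = [tuple(map(float, points[0]))]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for k in range(1, steps + 1):
            t = k / steps
            out.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return out


def smoothed_line(points, steps=4):
    """Connect consecutive points with Catmull-Rom segments; the curve still
    passes through every data point."""
    # Repeat the end points so the first and last segments have neighbours.
    padded = [points[0], *points, points[-1]]
    out = [tuple(map(float, points[0]))]
    for i in range(1, len(padded) - 2):
        for k in range(1, steps + 1):
            out.append(catmull_rom(padded[i - 1], padded[i], padded[i + 1], padded[i + 2], k / steps))
    return out


data = [(0, 0), (1, 2), (2, 1), (3, 3)]  # made-up example data
print(straight_line(data)[2])  # midpoint of the first straight segment
print(smoothed_line(data)[2])  # the smoothed curve bends away from it
```

Both versions pass through the original data points. Between them, the straight version gives (0.5, 1.0) halfway along the first segment, while the smoothed version gives (0.4375, 1.0625).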



We can combine all of these variations within a single plot simply by selecting a series and changing its chart type. In the following case, only Y2 is changed to a smoothed line:


We can add error bars to the plot, either as a fixed amount or with values specified from a series.
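Computing where each error bar starts and ends is simple arithmetic. The sketch below covers the two options mentioned: a fixed amount for every point, or one amount per point taken from another series. It also allows separate plus and minus amounts. The function name and data are hypothetical.

```python
def error_bar_ends(y_values, plus, minus=None):
    """Return (low, high) for each point.
    `plus` and `minus` may each be a single number (a fixed amount for every
    point) or a list with one amount per point (values from another series).
    If `minus` is omitted, the bars are symmetric."""
    n = len(y_values)

    def expand(amount):
        if isinstance(amount, (int, float)):
            return [float(amount)] * n
        if len(amount) != n:
            raise ValueError("need one error amount per data point")
        return [float(a) for a in amount]

    up = expand(plus)
    down = up if minus is None else expand(minus)
    return [(y - d, y + u) for y, u, d in zip(y_values, up, down)]


print(error_bar_ends([5, 7], 1))                   # fixed amount
print(error_bar_ends([5, 7], [0.5, 2], [1, 1]))    # amounts from series
```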


The points can also be connected by lines in whatever way is desired.