PlotAll – Distribution
Data distribution can be viewed in PlotAll by checking the ‘Show Distribution’ box on the far right of the window. This opens a dropdown of plotting options: histogram, frequency polygon, density, box plot, violin, jitter and raster. An example of each plot type follows.
Histograms are used to view the distribution of a single continuous variable. They divide the x axis into bins and count the number of observations in each bin. Let’s start by downloading the dataset ‘returns.csv’ to your local drive: right-click the link below and select ‘download linked file’.
Now load the data from returns.csv into the app: click ‘Browse’ and select the file from your local drive.
Below is a preview of the data. It contains the daily returns of Microsoft (MSFT), Amazon (AMZN), Apple (AAPL) and Google (GOOG) stock since December 2014. The returns were calculated from adjusted closing prices obtained from Yahoo Finance using the following formula:
Return on day t = (Adjusted Close on day t - Adjusted Close on day t-1) / Adjusted Close on day t-1
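The formula can be applied to a chronologically ordered list of closing prices, as in the minimal Python sketch below. This is not the script that produced returns.csv. The prices are hypothetical and the function name `daily_returns` is illustrative.

```python
def daily_returns(prices):
    """Simple daily returns from closing prices ordered oldest first.

    Each return is (P[t] - P[t-1]) / P[t-1]; the first day has no previous
    price, so the result is one element shorter than the input.
    """
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

# Hypothetical adjusted closing prices (not taken from returns.csv).
prices = [100.0, 102.0, 99.96]
print([round(r, 4) for r in daily_returns(prices)])  # [0.02, -0.02]
```

Rounding to three decimals, as in the preview table, is a display choice. The function itself keeps full precision.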
Date | MSFT | AMZN | AAPL | GOOG |
12/10/14 | -0.014 | -0.021 | -0.019 | -0.014 |
12/11/14 | 0.006 | 0.005 | -0.003 | 0.004 |
12/12/14 | -0.005 | 0 | -0.017 | -0.018 |
12/15/14 | -0.006 | -0.004 | -0.014 | -0.009 |
12/16/14 | -0.032 | -0.036 | -0.014 | -0.036 |
12/17/14 | 0.013 | 0.013 | 0.025 | 0.019 |
12/18/14 | 0.039 | -0.004 | 0.03 | 0.012 |
12/19/14 | 0.003 | 0.007 | -0.008 | 0.01 |
12/22/14 | 0.007 | 0.022 | 0.01 | 0.017 |
12/23/14 | 0.01 | -0.001 | -0.004 | 0.011 |
12/24/14 | -0.006 | -0.011 | -0.005 | -0.003 |
12/26/14 | -0.005 | 0.02 | 0.018 | 0.01 |
12/29/14 | -0.009 | 0.01 | -0.001 | -0.007 |
12/30/14 | -0.009 | -0.006 | -0.012 | 0 |
12/31/14 | -0.012 | 0 | -0.019 | -0.008 |
1/2/15 | 0.007 | -0.006 | -0.01 | -0.003 |
1/5/15 | -0.009 | -0.021 | -0.028 | -0.021 |
To visualize all columns together, we need to reshape the data from wide to long format. Check ‘Reshape Data’, choose ‘Date’ under ‘select fixed column’ and click ‘Submit’. This transforms the data into three columns:
- the fixed ‘Date’ column;
- a ‘variable’ column containing the former column names MSFT, AMZN, AAPL and GOOG;
- a ‘value’ column containing all daily return values.
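The wide-to-long reshape that ‘Reshape Data’ performs can be sketched in plain Python as below. This illustrates the idea and is not PlotAll's code. It assumes the CSV header is `Date,MSFT,AMZN,AAPL,GOOG`. The output here is ordered by date, then by stock, which may differ from the row order the app produces.

```python
import csv
import io

def melt(rows, id_col):
    """Turn wide rows (one column per stock) into long rows of (id, variable, value)."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue  # the fixed column is repeated on every long row instead
            long_rows.append({id_col: row[id_col], "variable": col, "value": float(val)})
    return long_rows

# Two rows from the preview, written as CSV text with an assumed header.
sample = """Date,MSFT,AMZN,AAPL,GOOG
12/10/14,-0.014,-0.021,-0.019,-0.014
12/11/14,0.006,0.005,-0.003,0.004
"""
wide = list(csv.DictReader(io.StringIO(sample)))
long_data = melt(wide, "Date")
print(len(long_data))  # 8
print(long_data[0])    # {'Date': '12/10/14', 'variable': 'MSFT', 'value': -0.014}
```

Two wide rows with four stock columns each become eight long rows. This is why the reshaped data can be plotted with a single ‘y variable’.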
Under ‘Plot Variables’, select ‘Date’ in ‘x variable’, check ‘date’ and select the appropriate ‘date format’; in this case it is ‘month/date/year’. Select ‘value’ in ‘y variable’. To edit the axis titles, check ‘Titles’ and enter ‘Date’ in ‘title x axis’ and ‘Daily Returns’ in ‘title y axis’. You can also edit the plot title and caption here. Click ‘Submit’. The plot shows the returns of all stocks by date.
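Checking ‘date’ matters because dates stored as text sort alphabetically rather than chronologically. Below is a minimal Python sketch of parsing the ‘month/date/year’ format with two-digit years, as in the preview. It illustrates the idea; PlotAll's own date parser is not described here, and `parse_mdy` is a hypothetical helper.

```python
from datetime import datetime

def parse_mdy(text):
    """Parse a month/day/two-digit-year string such as '12/10/14' into a date."""
    return datetime.strptime(text, "%m/%d/%y").date()

raw = ["12/31/14", "1/2/15", "1/5/15"]

# Sorted as text, December 2014 wrongly lands after January 2015.
print(sorted(raw))  # ['1/2/15', '1/5/15', '12/31/14']

# Sorted as parsed dates, the order is chronological.
print([d.isoformat() for d in sorted(parse_mdy(s) for s in raw)])
# ['2014-12-31', '2015-01-02', '2015-01-05']
```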
To distinguish the stocks by color, select ‘variable’ in ‘color variable’ under ‘Variable Objects’. Check ‘Legends’ and click ‘Submit’. The points are scattered around zero.
To view each stock’s returns in its own plot, check ‘Create Subplots’, select ‘variable’ in ‘1st subplot variable’, check off ‘Legends’ and click ‘Submit’. This creates a separate plot of daily returns for each stock.
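Both the ‘color variable’ and ‘Create Subplots’ options rely on the long format: each distinct value in ‘variable’ becomes one color or one panel. The sketch below shows that grouping step in Python. It uses a hypothetical helper, `split_by`, and values from the preview for 12/16/14 and 12/17/14. It is not how PlotAll implements the feature.

```python
from collections import defaultdict

def split_by(long_rows, key):
    """Group long-format rows by the value in `key`, collecting their 'value' entries."""
    groups = defaultdict(list)
    for row in long_rows:
        groups[row[key]].append(row["value"])
    return dict(groups)

long_rows = [
    {"Date": "12/16/14", "variable": "MSFT", "value": -0.032},
    {"Date": "12/16/14", "variable": "AMZN", "value": -0.036},
    {"Date": "12/17/14", "variable": "MSFT", "value": 0.013},
    {"Date": "12/17/14", "variable": "AMZN", "value": 0.013},
]
panels = split_by(long_rows, "variable")
print(panels)  # {'MSFT': [-0.032, 0.013], 'AMZN': [-0.036, 0.013]}
```

Each key of `panels` corresponds to one subplot, or to one color when all groups share a single plot.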
Now let’s plot the distribution of daily returns as histograms. Make sure the column whose distribution you want to see is a continuous variable and is selected in the ‘y variable’ dropdown. Check ‘Show Distribution’, select ‘Histogram’ in ‘plot type’, edit the plot titles and click ‘Submit’.
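The binning and counting that a histogram performs can be sketched in Python as follows. This is a simplified illustration: the bins are equal-width and run from the minimum to the maximum. Plotting libraries usually choose the number of bins and the placement of bin edges differently, and PlotAll’s defaults are not described here. The data are the MSFT returns from the 17-row preview.

```python
def histogram(values, bins):
    """Divide the range of `values` into `bins` equal-width bins and count the observations in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum lies on the right edge of the last bin, so clamp its index.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# MSFT daily returns from the preview table.
msft = [-0.014, 0.006, -0.005, -0.006, -0.032, 0.013, 0.039, 0.003, 0.007,
        0.01, -0.006, -0.005, -0.009, -0.009, -0.012, 0.007, -0.009]
edges, counts = histogram(msft, bins=5)
print(counts)  # [1, 9, 5, 1, 1]
```

Most of these returns fall in the two bins around zero. The histogram in PlotAll shows the same pattern for the full dataset.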
A frequency polygon shows the same bin counts as a histogram but connects them with lines instead of drawing bars. To switch to it, select ‘Polygon’ under ‘plot type’ and click ‘Submit’.
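A frequency polygon can be sketched as the points (bin midpoint, bin count), joined by lines. The Python sketch below also pads one empty bin on each side so the line starts and ends at zero. That padding is a common convention, but it is an assumption here rather than documented PlotAll behavior. The input values are toy numbers.

```python
def freq_polygon(values, bins):
    """Return the (x, y) vertices of a frequency polygon: bin midpoints paired with bin counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    mids = [lo + (i + 0.5) * width for i in range(bins)]
    # Add an empty bin on each side so the line starts and ends at zero.
    xs = [lo - 0.5 * width] + mids + [lo + (bins + 0.5) * width]
    ys = [0] + counts + [0]
    return list(zip(xs, ys))

print(freq_polygon([1, 2, 2, 3, 3, 3, 4], bins=3))
# [(0.5, 0), (1.5, 1), (2.5, 2), (3.5, 4), (4.5, 0)]
```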
Another way to view this distribution is a density plot, which computes and plots a kernel density estimate. Kernel density estimation is a non-parametric method for estimating the probability density function of a continuous random variable. Select ‘Density’ in ‘plot type’, enter titles as you like and click ‘Submit’.
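A kernel density estimate places a small smooth bump on every observation and adds the bumps together. The Python sketch below uses a Gaussian kernel and Silverman’s normal-reference rule for the bandwidth. Both are assumptions for illustration; the kernel and bandwidth PlotAll uses are not stated here. A crude Riemann sum then checks that the estimate integrates to about one, as a probability density should.

```python
import math
import statistics

def gaussian_kde(values, bandwidth=None):
    """Return f(x), the Gaussian kernel density estimate built from `values`."""
    n = len(values)
    if bandwidth is None:
        # Silverman's normal-reference rule of thumb (an assumed choice).
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of normal densities centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

# MSFT daily returns from the preview table.
msft = [-0.014, 0.006, -0.005, -0.006, -0.032, 0.013, 0.039, 0.003, 0.007,
        0.01, -0.006, -0.005, -0.009, -0.009, -0.012, 0.007, -0.009]
f = gaussian_kde(msft)

step = 0.0005
grid = [-0.1 + i * step for i in range(401)]  # evaluation points from -0.1 to 0.1
area = sum(f(x) for x in grid) * step
print(round(area, 3))  # 1.0
```

The bandwidth controls smoothness. A smaller value follows individual observations more closely, while a larger one blurs them into a single broad hump.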
A box plot, or box-and-whisker plot, summarizes a distribution using five values: minimum, first quartile (Q1), median, third quartile (Q3) and maximum. The box spans Q1 to Q3, a range known as the interquartile range (IQR), and the line inside the box marks the median. By the usual (Tukey) convention, points more than 1.5 × IQR above Q3 or below Q1 are treated as outliers and drawn as individual points. The whiskers extend to the most extreme data points that are not outliers, so they reach the minimum and maximum only when there are no outliers. Select ‘boxplot’ in ‘plot type’, choose ‘variable’ in ‘x variable’, check off ‘discrete’ and click ‘Submit’. ‘value’ remains the ‘y variable’ in this case.
The median daily returns of all four stocks are close to zero, and their spreads are very similar.
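The statistics behind a box plot can be computed as in the Python sketch below. It makes two assumptions: quartiles come from linear interpolation (`method="inclusive"`), and outlier fences sit at k × IQR beyond the box. Whether PlotAll computes quartiles exactly this way is not stated here. The input is only the 17 MSFT rows from the preview, not the full dataset. On these rows, the largest return is flagged as an outlier with k = 1.5 but not with k = 3, which shows how much the multiplier matters.

```python
import statistics

def box_stats(values, k=1.5):
    """Quartiles, fences at k * IQR, whisker ends and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers stop at the most extreme observations inside the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

msft = [-0.014, 0.006, -0.005, -0.006, -0.032, 0.013, 0.039, 0.003, 0.007,
        0.01, -0.006, -0.005, -0.009, -0.009, -0.012, 0.007, -0.009]
print(box_stats(msft)["outliers"])       # [0.039]
print(box_stats(msft)["whiskers"])       # (-0.032, 0.013)
print(box_stats(msft, k=3)["outliers"])  # []
```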
A violin plot combines a box plot and a density plot. For each group it draws the kernel density estimate mirrored on both sides of a vertical axis, and the groups are laid out side by side like box plots. Select ‘Violin’ under ‘plot type’ and click ‘Submit’.
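The mirroring can be sketched in Python: estimate the density along the value axis, then draw it to the left and right of the group’s position. The choices below are all assumptions for illustration, not PlotAll’s settings:
- a Gaussian kernel with Silverman’s bandwidth rule;
- trimming to the data range;
- scaling each violin so its widest point has a half-width of 0.4;
- a group position of `center=1`.

```python
import math
import statistics

def violin_outline(values, center, half_width=0.4, points=50):
    """Left and right edges of a violin: a Gaussian KDE mirrored around x = center."""
    n = len(values)
    bw = 1.06 * statistics.stdev(values) * n ** (-1 / 5)  # assumed bandwidth rule

    def density(y):
        return sum(math.exp(-0.5 * ((y - v) / bw) ** 2) for v in values) / (n * bw * math.sqrt(2 * math.pi))

    lo, hi = min(values), max(values)
    ys = [lo + (hi - lo) * i / (points - 1) for i in range(points)]  # trimmed to the data range
    dens = [density(y) for y in ys]
    scale = half_width / max(dens)  # the widest point of the violin gets half_width
    left = [(center - d * scale, y) for d, y in zip(dens, ys)]
    right = [(center + d * scale, y) for d, y in zip(dens, ys)]
    return left, right

msft = [-0.014, 0.006, -0.005, -0.006, -0.032, 0.013, 0.039, 0.003, 0.007,
        0.01, -0.006, -0.005, -0.009, -0.009, -0.012, 0.007, -0.009]
left, right = violin_outline(msft, center=1)
widest = max(r[0] - l[0] for l, r in zip(left, right))
print(round(widest, 6))  # 0.8
```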
Select ‘Jitter’ under ‘plot type’ to display the individual data points. Each point is shifted by a small random amount so that overlapping points become visible.
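Jittering can be sketched in Python by placing each stock at an integer position and adding a small uniform random offset to x. The sketch keeps y unchanged so each return keeps its exact value. The `width` and `seed` parameters and the left-to-right order of stocks are hypothetical choices, not PlotAll settings. The values are taken from the preview for 12/17/14 and 12/15/14.

```python
import random

def jitter(categories, values, width=0.4, seed=1):
    """Place each category at an integer x position and shift every point by uniform noise."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    positions = {c: i + 1 for i, c in enumerate(dict.fromkeys(categories))}
    # Only x is jittered; the y value (the daily return) is kept exactly.
    return [(positions[c] + rng.uniform(-width / 2, width / 2), v)
            for c, v in zip(categories, values)]

cats = ["MSFT", "AMZN", "MSFT", "AMZN"]
vals = [0.013, 0.013, -0.006, -0.004]
points = jitter(cats, vals)
for x, y in points:
    print(f"x={x:.3f}  y={y}")
```

The two returns of 0.013 on 12/17/14 would sit at the same height. Without the horizontal spread, many such points would stack on top of each other in a full dataset.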
Finally, use ‘Raster’ to identify the areas where the data are concentrated. Select ‘Raster’ under ‘plot type’. Choose ‘Date’ in ‘x variable’, check ‘discrete’ and select the date format. Select ‘value’ in ‘y variable’. Check ‘Legends’ and click ‘Submit’.
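A raster plot colors a grid of rectangular cells. One way to see where data are concentrated is to bin the points into a grid and count how many fall in each cell. The dense cells then stand out when filled by count. The Python sketch below shows that binning with toy numbers. It illustrates the general idea only; which quantity PlotAll maps to cell color is not stated here, and `grid_counts` is a hypothetical helper.

```python
def grid_counts(xs, ys, nx, ny):
    """Count points in an nx-by-ny grid of equal cells; row 0 holds the lowest y values."""
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    x_w = (x_hi - x_lo) / nx if x_hi > x_lo else 1.0
    y_w = (y_hi - y_lo) / ny if y_hi > y_lo else 1.0
    grid = [[0] * nx for _ in range(ny)]
    for x, y in zip(xs, ys):
        # Clamp so points on the maximum edge land in the last row or column.
        col = min(int((x - x_lo) / x_w), nx - 1)
        row = min(int((y - y_lo) / y_w), ny - 1)
        grid[row][col] += 1
    return grid

print(grid_counts([0, 0, 1, 1, 1, 3], [0.0, 0.1, 0.0, 0.0, 0.9, 1.0], nx=2, ny=2))
# [[4, 0], [1, 1]]
```

Here the bottom-left cell holds four of the six points, so it would receive the most intense color.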
Now try it yourself: plot the distribution of your own data with PlotAll.