Statistics Tutorial

N.B., WE NEED SOMEONE TO TEST THIS TUTORIAL-- IT IS UNTESTED. IF IT WORKS, REMOVE THIS DISCLAIMER.

This section demonstrates, through some simple examples, how to use statistical tools to process your observations and create graphs for papers or presentations. For now, the tutorial uses SAS and gnuplot. We should expand it to cover Matlab and SPSS, which are also popular and powerful tools for doing statistics.

Example 1. You are interested in two visualization methods and want to compare how quickly a task is completed with each. First, set up your experiment and collect your samples, which might look like this:

technique    time
1            12
2            7
1            13
2            9
1            20
2            10
1            16
2            12
1            13
2            15
1            14
2            6

From here on, we assume this data is stored in a text file named "data.txt".
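
As a quick sanity check on the numbers that any statistics package should report, the per-technique mean and sample standard deviation can be recomputed by hand. The sketch below uses Python, which is not part of the SAS and gnuplot workflow in this tutorial. The names DATA, parse_samples and summarize are made up for illustration, and the sample table is embedded as a string so the block runs without "data.txt".

```python
import statistics
from collections import defaultdict

# Sample observations copied from the table above, header line included.
DATA = """\
technique    time
1            12
2            7
1            13
2            9
1            20
2            10
1            16
2            12
1            13
2            15
1            14
2            6
"""


def parse_samples(text):
    """Group the times by technique, skipping the header line."""
    groups = defaultdict(list)
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue  # ignore blank lines
        technique, time = line.split()
        groups[int(technique)].append(float(time))
    return dict(groups)


def summarize(groups):
    """Return {technique: (n, mean, sample standard deviation)}."""
    return {
        tech: (len(times), statistics.mean(times), statistics.stdev(times))
        for tech, times in sorted(groups.items())
    }


if __name__ == "__main__":
    for tech, (n, mean, sd) in summarize(parse_samples(DATA)).items():
        print(f"technique {tech}: n={n} mean={mean:.2f} sd={sd:.2f}")
```

To run it against the real file instead, pass open("data.txt").read() to parse_samples, assuming the file keeps the header line shown above.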


The SAS script below processes the data and produces new files with statistical information and data suited to plotting with gnuplot. You can cut and paste the text into a file named "analyze_time_data.sas" to try it out.

*options pageno=1 formdlim='-';
options nonumber nocenter  nodate pagesize=8000;

data result;
infile 'data.txt' firstobs=2;  /* firstobs=2 skips the header line; drop it if data.txt has none */
input technique time;
*proc print;
run;

/* sort */
title 'result: just sort it';
proc sort data = result;
by technique;
*proc print;
run;

/* summary statistics */
title 'result: GLM test on time/tech';
proc glm data = result;
class technique;
model time = technique;
means technique / tukey;
run;

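title 'result: mean summary statistics technique (time)';
proc means mean std data=result;
by technique;
var time;
output out=result_means MEAN=Mean STDDEV=StdDev;
run;

/* space-delimited rows, no header line, for gnuplot;
   columns: technique _TYPE_ _FREQ_ Mean StdDev */
proc export data=result_means outfile='graph_data.gp' dbms=dlm replace;
delimiter=' ';
putnames=no;
run;
proc print data=result_means;
run;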
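
With a single class variable, the PROC GLM step above fits a one-way analysis of variance of time by technique. As a rough cross-check of the F statistic it should report, the sums of squares can be computed directly. This is a minimal Python sketch, not part of the SAS workflow. The function name one_way_anova_f is made up for illustration, and the sample times are typed in from the table in Example 1.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the samples around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Times from the sample table, keyed by technique.
times = {
    1: [12, 13, 20, 16, 13, 14],
    2: [7, 9, 10, 12, 15, 6],
}

f_stat, df_b, df_w = one_way_anova_f(times)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The sketch stops at the F statistic. Turning it into a p-value needs the F distribution, which the Python standard library does not provide, so that part is left to SAS.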

A bar graph with error bars showing standard deviation can be produced from the file "graph_data.gp" using the script below. FUTURE WORK: CHANGE THIS TO PLOT 95% CONFIDENCE INTERVALS, WHICH ARE MORE TYPICAL THAN STANDARD DEVIATION.

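set xlabel "Put Your X-Axis Label Here"
set ylabel "Time"
set xrange [0.5:2.5]
set yrange [0:60]
unset key
set boxwidth 0.5 relative
set style fill solid 0.5
# techniques are numbered 1 and 2 in column 1 of graph_data.gp
set xtics ("Tech-1" 1, "Tech-2" 2)
set terminal postscript eps 30
set output "graph.ps"
# columns: 1 = technique, 4 = mean, 5 = standard deviation
plot "graph_data.gp" using 1:4 with boxes, \
     "" using 1:4:($4-$5):($4+$5) with yerrorbars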
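
One possible way to get the 95% confidence intervals mentioned in the future-work note is to replace the standard deviation with a t-based interval around each mean: mean ± t × sd / sqrt(n). The Python sketch below illustrates the arithmetic only and is not part of the SAS workflow. The names T_CRIT_95 and confidence_interval_95 are made up for illustration, and the critical value 2.571 (two-sided 95%, 5 degrees of freedom) is hard-coded from a standard t table on the assumption of six samples per technique.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Assumption: only df = 5 is needed (six samples per technique). Extend the
# table or use a statistics library for other sample sizes.
T_CRIT_95 = {5: 2.571}


def confidence_interval_95(values):
    """Return (mean, lower, upper) of a t-based 95% confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = T_CRIT_95[n - 1] * std_err
    return mean, mean - half_width, mean + half_width


# Times from the sample table, keyed by technique.
times = {
    1: [12, 13, 20, 16, 13, 14],
    2: [7, 9, 10, 12, 15, 6],
}

# One row per technique: technique mean lower upper
for tech, values in sorted(times.items()):
    mean, lower, upper = confidence_interval_95(values)
    print(f"{tech} {mean:.3f} {lower:.3f} {upper:.3f}")
```

If the printed rows are saved to a file, a gnuplot plot clause such as using 1:2:3:4 with yerrorbars would draw the intervals in place of the standard deviation bars.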


Example 2. You surveyed a population and want to run a statistical analysis on the collected data and produce graphs showing the results.

[FILL IN...]