Statistics And Graphs: Difference between revisions

= Statistical Analysis and Producing Graphs =
Many good resources exist to help you analyze your data.  Here are a few:
==Books==
* "Discovering Statistics Using SPSS", available at [https://josiah.brown.edu/search/?searchtype=X&searcharg=discovering+statistics+with+spss&searchscope=07&SORT=D&SUBMIT=Search Brown libraries]. '''NOTE:''' The 2013 edition has been rebranded [http://www.amazon.com/Discovering-Statistics-using-IBM-SPSS/dp/1446249182 "Discovering Statistics with IBM SPSS Statistics"].
* [http://www-stat.stanford.edu/~tibs/ElemStatLearn/ Elements of Statistical Learning] (downloadable pdf), recommended by Rossi Luo and Erik Sudderth
* [http://www.amazon.com/Machine-Learning-Probabilistic-Perspective-Computation/dp/0262018020 Machine Learning: A Probabilistic Perspective], recommended by Erik Sudderth
* [http://www.amazon.com/The-Essence-Multivariate-Thinking-Applications/dp/0805837302/ref=sr_1_fkmr1_1?ie=UTF8&qid=1377805278&sr=8-1-fkmr1&keywords=he+Essence+of+Multivariate+Thinking Essence of Multivariate Thinking], recommended by Stephen Correia
* [http://www.amazon.com/Using-Multivariate-Statistics-Barbara-Tabachnick/dp/0205849571/ref=sr_1_1?s=books&ie=UTF8&qid=1377274046&sr=1-1&keywords=tabachnik+and+fidell Using Multivariate Statistics], recommended by Stephen Correia
* [http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738 Pattern Recognition and Machine Learning], recommended by Ryan
* [http://www.math.dartmouth.edu/~prob/prob/prob.pdf Grinstead and Snell's Introduction to Probability] (downloadable pdf), recommended by Ryan


==Papers==
* [http://dl.acm.org/citation.cfm?id=1553488 Supervised learning from multiple experts: whom to trust when everyone lies a bit], recommended by Rossi Luo
* [http://statistics.berkeley.edu/sites/default/files/tech-reports/790.pdf Measuring Reproducibility of High-throughput Experiments], recommended by Rossi Luo

==Practicals==
* [http://depts.washington.edu/aimgroup/proj/ps4hci/ Practical Statistics for HCI], independent study modules by Jacob Wobbrock at UW
* [[Statistics Tutorial]] written by Andrew Forsberg

== Examples ==
This page demonstrates, through some simple examples, how to use statistical tools to process your observations and create graphs for papers or presentations.

Initially, this tutorial will use SAS and gnuplot to create graphs.  We should expand it to demonstrate how to use Matlab and SPSS, which are also popular and powerful tools for doing statistics.
 
'''N.B., WE NEED SOMEONE TO TEST THIS TUTORIAL-- IT IS UNTESTED.  IF IT WORKS, REMOVE THIS DISCLAIMER.'''
 
Example 1. You are interested in two visualization methods and want to compare how quickly a task is completed with each.  First, set up your experiment and collect your samples, which might take this form:
 
<pre>
technique    time
1            12
2            7
1            13
2            9
1            20
2            10
1            16
2            12
1            13
2            15
1            14
2            6
</pre>
 
From here on, we assume this data is stored in a text file named "data.txt".
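Before running the full analysis, it can help to sanity-check the samples. The following is a minimal Python sketch, not part of the original tutorial, that parses rows in the layout shown above and prints the count, mean, and sample standard deviation per technique. The names <code>SAMPLE</code> and <code>read_samples</code> are hypothetical, and the data is embedded as a string (rather than read from "data.txt") only so the sketch is self-contained.

```python
from collections import defaultdict
from statistics import mean, stdev

# Same rows as the table above, embedded so the sketch runs on its own.
SAMPLE = """\
technique    time
1            12
2            7
1            13
2            9
1            20
2            10
1            16
2            12
1            13
2            15
1            14
2            6
"""

def read_samples(text):
    """Return {technique: [times]} from whitespace-separated rows, skipping the header row."""
    groups = defaultdict(list)
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue  # ignore blank lines
        technique, time = line.split()
        groups[int(technique)].append(float(time))
    return dict(groups)

groups = read_samples(SAMPLE)
for technique, times in sorted(groups.items()):
    # technique, number of samples, mean, sample standard deviation
    print(technique, len(times), round(mean(times), 2), round(stdev(times), 2))
```

To use the real file instead, the text could be obtained with <code>open("data.txt").read()</code> and passed to the same function.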
 
 
The SAS script below processes the data and produces new files with statistical information and data suited to plotting with gnuplot.  You could cut-and-paste the text into a file named "analyze_time_data.sas" to try it out.
 
<pre>
*options pageno=1 formdlim='-';
options nonumber nocenter  nodate pagesize=8000;
 
data result;
/* firstobs=2 skips the "technique time" header row in data.txt */
infile 'data.txt' firstobs=2;
input technique time;
*proc print;
run;
 
/* sort */
title 'result: just sort it';
proc sort data = result;
by technique;
*proc print;
run;
 
/* summary statistics */
title 'result: GLM test on time/tech';
proc glm data = result;
class technique;
model time = technique;
means technique / tukey;
run;
 
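title 'result: mean summary statistics technique (time)';
proc means mean std data=result;
by technique;
var time;
output out=result_means MEAN=Mean STDDEV=StdDev;
run;

/* .gp is not an extension SAS recognizes, so state the format explicitly */
proc export data=result_means outfile='graph_data.gp' dbms=dlm replace;
delimiter=' ';
putnames=no;
run;

proc print data=result_means;
run;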
</pre>
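As a cross-check on the GLM step, the one-way ANOVA F statistic can be computed by hand. The sketch below is an illustration in Python, not the tutorial's method: the function name <code>one_way_anova_f</code> is hypothetical, and the critical value 4.96 is the tabulated upper 5% point of F(1, 10), hardcoded here as an assumption rather than computed.

```python
from statistics import mean

# Sample times from Example 1, grouped by technique.
groups = {
    1: [12, 13, 20, 16, 13, 14],
    2: [7, 9, 10, 12, 15, 6],
}

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Variation of each sample around its own group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_between, df_within = one_way_anova_f(groups)
F_CRIT_1_10 = 4.96  # assumed: tabulated upper 5% point of F(1, 10)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
print("significant at 0.05" if f_stat > F_CRIT_1_10 else "not significant at 0.05")
```

With only two techniques, this F test is equivalent to a two-sample t-test with pooled variance.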
 
A bar graph with error bars showing standard deviation can be produced from the file "graph_data.gp" using the script below.  FUTURE WORK: MAKE A CHANGE SO WE GET 95% CONFIDENCE INTERVALS, WHICH ARE MORE TYPICAL THAN STANDARD DEVIATION.
 
<pre>
set xlabel "Put Your X-Axis Label Here"
set ylabel "Time"
set xrange [0.5:2.5]
set yrange [0:60]
unset key
set boxwidth 0.5
set style fill solid 0.3
# technique is coded 1 and 2 in column 1 of graph_data.gp
set xtics ("Tech-1" 1, "Tech-2" 2)
set terminal postscript eps 30
set output "graph.ps"
# column 4 = mean, column 5 = standard deviation
plot "graph_data.gp" using 1:4 with boxes, "" using 1:4:5 with yerrorbars
</pre>
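For the 95% confidence intervals mentioned as future work, one possible approach is a t-based interval around each mean: mean ± t × SD / sqrt(n). The Python sketch below is an assumption about how that could be done, not a tested part of this tutorial. The function name <code>ci95</code> is hypothetical, and the t value 2.571 (two-sided 95%, 5 degrees of freedom, from a standard t table) is hardcoded for the six samples per technique in Example 1.

```python
from math import sqrt
from statistics import mean, stdev

# Sample times from Example 1, grouped by technique.
groups = {
    1: [12, 13, 20, 16, 13, 14],
    2: [7, 9, 10, 12, 15, 6],
}

# Assumed: tabulated two-sided 95% t value for n - 1 = 5 degrees of freedom.
T_CRIT_DF5 = 2.571

def ci95(values, t_crit):
    """Return (mean, half_width) of the t-based 95% confidence interval."""
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return m, half_width

for technique, values in sorted(groups.items()):
    m, hw = ci95(values, T_CRIT_DF5)
    # One row per technique: technique, mean, lower bound, upper bound.
    print(f"{technique} {m:.3f} {m - hw:.3f} {m + hw:.3f}")
```

If these rows were saved to a file, gnuplot could draw them with <code>using 1:2:3:4 with yerrorbars</code>, which takes x, y, ylow, and yhigh. A different group size needs a different t value.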
 
 
Example 2. You surveyed a population and want to run statistical analysis on the data collected and produce graphs showing the result.
 
[FILL IN...]

Latest revision as of 19:48, 29 August 2013
