Ei Wakamatsu
Intro
studying co-stimulatory (costim) molecules for t-cells
also looking at foxb3, which expresses GFP; sorts the cells into positive or negative (expression?) and analyzes gene patterns
5 conditions, 2 time steps, 2 cell populations - lots of groups to compare!
costim vs. cd3
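To put "lots of groups" in numbers, here is a minimal Python sketch. It assumes the three factors are fully crossed, which these notes don't confirm, and the condition, time-step and population labels are hypothetical placeholders, not the study's real names. It counts the groups and the pairwise comparisons between them.

```python
from itertools import combinations, product

# Assumption: 5 conditions x 2 time steps x 2 cell populations, fully crossed.
# All labels are hypothetical placeholders.
conditions = [f"cond{i}" for i in range(1, 6)]
time_steps = ["t1", "t2"]
populations = ["pop1", "pop2"]

# Every combination of condition, time step and population is one group.
groups = list(product(conditions, time_steps, populations))
# Every unordered pair of groups is one possible comparison.
pairs = list(combinations(groups, 2))

print(len(groups))  # 20
print(len(pairs))   # 190
```

Comparing each pair one at a time in a serial tool adds up quickly at this scale.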
Analysis
ei often refers to previously-generated graphs from a powerpoint presentation on his laptop - sometimes to help explain something to me and sometimes as a guide. i think he's trying to recreate the analysis that led to that paper / presentation... so he is more direct and less exploratory than jaime
- refers to a kind of matrix on the laptop
first need to check data quality using express matrix
has about 30 datasets he's comparing
- uploads data to genepattern, launches express matrix
selects a subset of data to compare
significant loading time
- EM brings up a scatterplot matrix with correlation values (a la iPCA)
bad data (?) shows an s-shaped pattern in the scatterplot
- looks for any sign of the s-shaped pattern
- clicks on several individual scatterplots to see them in more detail
shapes don't look bad, so i can progress to the next step
if you do find an s-shape, you need to remove the bad data and normalize again
- refers to his laptop again
there are two time steps in the study; want to see whether these time steps are shared or not
shared means that the same genes behave similarly across the two time points
- looks at scatterplots in multiplot
- determines that genes are not shared between the time conditions
- selects some genes and looks at them in the sidebar table
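The correlation values in EM's scatterplot matrix can be approximated outside GenePattern. Below is a minimal numpy sketch. It assumes a genes × datasets matrix of positive intensities and log2-transforms it before correlating; the notes don't say what EM does internally. The 0.9 threshold and the synthetic data are hypothetical. A high correlation does not rule out an S-shape, because a curved but monotonic relationship can still correlate strongly. That is why the individual scatterplots still need a visual check.

```python
import numpy as np


def correlation_matrix(expr: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlation between datasets.

    expr has one row per gene and one column per dataset, so entry [i, j]
    corresponds to the scatterplot of dataset i against dataset j.
    """
    # rowvar=False tells np.corrcoef to treat columns (datasets) as the variables.
    return np.corrcoef(expr, rowvar=False)


def flag_low_pairs(corr: np.ndarray, threshold: float = 0.9) -> list:
    """Return (i, j, r) for dataset pairs whose correlation is below threshold."""
    n = corr.shape[0]
    return [(i, j, float(corr[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if corr[i, j] < threshold]


rng = np.random.default_rng(0)
# Hypothetical "true" gene levels shared by the well-behaved datasets.
base = rng.lognormal(mean=5, sigma=1.5, size=1000)
# Three noisy replicates of the same signal ...
expr = np.column_stack([base * rng.lognormal(0, 0.1, 1000) for _ in range(4)])
# ... and a fourth dataset that is deliberately unrelated.
expr[:, 3] = rng.lognormal(5, 1.5, 1000)

corr = correlation_matrix(np.log2(expr))
print(np.round(corr, 2))
print(flag_low_pairs(corr))
```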
so that's how analysis with genepattern works, but recently he has been using s+ and r instead
- opens S+, looks at a data table to point out his different conditions
much faster than genepattern
- brings up a command line window
- types in commands a bit tentatively - thinking about it
- generates a filtered dataset
is really pleased about how fast it is to remove some unwanted data
- typing more commands
removing genes where the max value is below 100
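The max-value filter can be sketched in Python. This is not Ei's S+ code, which isn't recorded in these notes. It is a minimal numpy version that assumes rows are genes and columns are conditions, and it uses hypothetical gene names. Following the wording "below 100", a gene whose maximum is exactly 100 is kept.

```python
import numpy as np


def filter_low_max(expr: np.ndarray, genes: list, min_max: float = 100.0):
    """Keep genes whose maximum value across all conditions is at least min_max.

    A gene that never reaches the cutoff in any condition is treated as noise
    and dropped, along with its name.
    """
    keep = expr.max(axis=1) >= min_max
    return expr[keep], [g for g, k in zip(genes, keep) if k]


genes = ["geneA", "geneB", "geneC"]     # hypothetical gene names
expr = np.array([[ 20.0,  50.0,  80.0],  # max 80, below 100 -> dropped
                 [ 90.0, 150.0,  40.0],  # max 150 -> kept
                 [100.0,  10.0,   5.0]]) # max exactly 100 -> kept
filtered, kept = filter_low_max(expr, genes)
print(kept)  # ['geneB', 'geneC']
```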
why? to remove noise or non-reproducible effects (seems like the same thing jaime was concerned with, re. variance)
- refers to laptop again to extract genes from several conditions
- looking at a scatterplot on his laptop
cd3 vs. something??
focus on "active" regions (center top and center bottom)
comparing just one gene with genepattern is easy... but lots of conditions take too much time
(somewhere hadley wickham is feeling really vindicated and doesn't know why)
- looking at some saved S+ code
- copies code into the command window
execution is pretty slow
- generates a smaller table
- checks the code again
? now he has two datasets and doesn't know which is the right one
seems to be looking for the line of code that named the new dataset
- finds the table he's looking for
looking at the number of shared genes between each pair of conditions; given time steps, cell type, etc., which conditions are similar and which have no similarity
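Here is a minimal Python sketch of a shared-gene table. It assumes each condition has already been reduced to a set of selected genes; how Ei selects them (for example from the "active" scatterplot regions) isn't captured in these notes. The condition names and gene sets are hypothetical. The sketch counts the overlap for every pair of conditions.

```python
from itertools import combinations


def shared_gene_counts(selected: dict) -> dict:
    """Count genes shared by every pair of conditions.

    selected maps a condition name to the set of genes picked out for it;
    the selection step itself is assumed to happen upstream.
    """
    return {(a, b): len(selected[a] & selected[b])
            for a, b in combinations(sorted(selected), 2)}


# Hypothetical condition names and gene sets, for illustration only.
selected = {
    "costimA_t1": {"g1", "g2", "g3"},
    "costimA_t2": {"g2", "g3", "g4"},
    "cd3_t1":     {"g5"},
}
for pair, n in shared_gene_counts(selected).items():
    print(pair, n)
```

A pair with a count near zero corresponds to the "no similarity" cases. A large count suggests the two conditions behave alike.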
after this, i want to determine each co-stimulatory molecule's signature
quite enthusiastic about S+ & R - less visual than GP, but much faster
ed: the fact that gp makes all comparisons serial would certainly make his particular questions pretty unbearable to answer! but would a multiview system help? hard to say. there are also too many comparisons for useful multiviewing...