User:Jadrian Miles/Streamline clustering: Difference between revisions

Revision as of 00:10, 24 March 2009

Tubegen generates an easy-to-parse .nocr file specifying points on streamlines.

Pick a good dataset (Diffusion_MRI#Collaboration_Table) -- $G/data/diffusion/brown3t/cohen_hiv_study_registered.2007.02.07/patient120
Run tubegen on it with modified parameters so it doesn't cull anything---this will result in ~100k curves, with an average of ~70 points per curve.
Write a python script to divide the computation of the curve-to-curve distance matrix among many computers.
- Try max and mean minimum point-to-curve distance in overlapping region as inter-curve distance measure.
- The per-curve script should return the assigned matrix line as well as a list of curves sorted by distance and annotated by the distance, for fast clustering.
- After computing the upper half of the matrix, create an ordered list of curve-to-curve distances annotated with the curve pairs. Distributed w:quicksort? [1]
Build up clusters until some termination condition: satisfactory number of non-singleton clusters, satisfactory median size of non-singleton clusters, etc. Or just run until you get one huge cluster, but store the binary cluster tree. It may be really skewed but maybe a tree rebalancing algorithm could help in post-processing.
- Initialization: each curve is a singleton cluster.
- A curve's distance to a cluster is the minimum distance to any curve in that cluster.
- In each iteration, with lowest minimum curve-to-curve distance to its closest cluster.

@@ Line 1: / Line 1: @@
 [[Tubegen]] generates an easy-to-parse <tt>.nocr</tt> file specifying points on streamlines.
-# Pick a good dataset ([[Diffusion_MRI#Collaboration_Table]]).
+# Pick a good dataset ([[Diffusion_MRI#Collaboration_Table]]) -- <tt>$G/data/diffusion/brown3t/cohen_hiv_study_registered.2007.02.07/patient120</tt>
 # Run [[tubegen]] on it with modified parameters so it doesn't cull anything---this will result in ~100k curves, with an average of ~70 points per curve.
 # Write a python script to divide the computation of the curve-to-curve distance matrix among many computers.