User:Jadrian Miles/Streamline clustering

From VrlWiki

Revision as of 00:01, 24 March 2009

Tubegen generates an easy-to-parse .nocr file specifying points on streamlines.

  1. Pick a good dataset (Diffusion_MRI#Collaboration_Table).
  2. Run tubegen on it with parameters modified so that it doesn't cull anything; this will result in ~100k curves, with an average of ~70 points per curve.
  3. Write a Python script to divide the computation of the curve-to-curve distance matrix among many computers.
    • Try the max and the mean of the minimum point-to-curve distances in the overlapping region as inter-curve distance measures.
    • The per-curve script should return its assigned matrix row as well as a list of curves sorted by distance and annotated with that distance, for fast clustering.
    • After computing the upper half of the matrix, create an ordered list of curve-to-curve distances annotated with the curve pairs. Distributed quicksort? [1] (http://www.parallelpython.com/component/option,com_smf/Itemid,29/topic,138.0)
  4. Build up clusters until some termination condition is met: a satisfactory number of non-singleton clusters, a satisfactory median size of non-singleton clusters, etc. Alternatively, just run until you get one huge cluster, but store the binary cluster tree. The tree may be heavily skewed, but a tree-rebalancing algorithm could perhaps help in post-processing.
    • Initialization: each curve is a singleton cluster.
    • A curve's distance to a cluster is the minimum distance to any curve in that cluster.
    • In each iteration, merge the cluster with the lowest minimum curve-to-curve distance to its closest cluster.
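A minimal single-machine sketch of the step 3 work unit follows. It rests on several assumptions. Curves are lists of (x, y, z) tuples with at least two points each, because the .nocr layout isn't described here. The "overlapping region" is interpreted, hypothetically, as the points whose closest point on the other curve is not clamped to either of that curve's ends. The two directions are pooled so the measure is symmetric. All function names are made up for illustration. Each matrix_row call is independent, so those calls are what would be farmed out to separate machines. Here they run in a plain loop, and sorted() stands in for a distributed quicksort.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Curve = Sequence[Point]  # assumption: every curve has >= 2 points


def point_segment(p: Point, a: Point, b: Point) -> Tuple[float, float]:
    """Distance from p to segment ab, and the clamped parameter t in [0, 1]."""
    ab = [b[k] - a[k] for k in range(3)]
    ap = [p[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0.0 else sum(ap[k] * ab[k] for k in range(3)) / denom
    t = max(0.0, min(1.0, t))
    closest = [a[k] + t * ab[k] for k in range(3)]
    return math.dist(p, closest), t


def point_curve(p: Point, curve: Curve) -> Tuple[float, bool]:
    """Minimum distance from p to the polyline, and whether the closest point
    lies away from both ends of the curve (our stand-in for 'overlapping')."""
    best, interior = math.inf, False
    last = len(curve) - 2  # index of the final segment
    for i in range(len(curve) - 1):
        d, t = point_segment(p, curve[i], curve[i + 1])
        if d < best:
            at_end = (i == 0 and t == 0.0) or (i == last and t == 1.0)
            best, interior = d, not at_end
    return best, interior


def overlap_distances(a: Curve, b: Curve) -> List[float]:
    """Minimum point-to-curve distances from a's points to b, keeping only
    the points that fall in the (assumed) overlapping region."""
    return [d for d, inside in (point_curve(p, b) for p in a) if inside]


def curve_distance(a: Curve, b: Curve, reduce: str = "max") -> float:
    """Symmetric inter-curve distance: pool both directions, then take the
    max or the mean; inf if neither curve overlaps the other."""
    pooled = overlap_distances(a, b) + overlap_distances(b, a)
    if not pooled:
        return math.inf
    return max(pooled) if reduce == "max" else sum(pooled) / len(pooled)


def matrix_row(i: int, curves: Sequence[Curve], reduce: str = "max"):
    """One work unit: row i of the upper triangle (columns j > i), plus those
    curves sorted by distance and annotated with it."""
    row: Dict[int, float] = {
        j: curve_distance(curves[i], curves[j], reduce)
        for j in range(i + 1, len(curves))
    }
    ranked = sorted((d, j) for j, d in row.items())
    return row, ranked


def sorted_pairs(rows: Sequence[Dict[int, float]]) -> List[Tuple[float, int, int]]:
    """Ordered (distance, i, j) list over the whole upper triangle."""
    return sorted((d, i, j) for i, row in enumerate(rows) for j, d in row.items())
```

For example, `rows = [matrix_row(i, curves)[0] for i in range(len(curves))]` followed by `sorted_pairs(rows)` produces the ordered, pair-annotated distance list. Non-overlapping pairs get an infinite distance and therefore sort last.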
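Step 4 defines a curve's distance to a cluster as its minimum distance to any curve in that cluster, which is single-linkage clustering. One standard way to run it is sketched below as an assumption, not the planned implementation. It walks an ascending list of (distance, i, j) curve pairs, like the one step 3 produces, and merges through a union-find (Kruskal-style). Each merge records a binary tree node (left, right, distance), so a full run yields the binary cluster tree. The `stop` callback is a hypothetical hook for the termination conditions listed in step 4.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, Union

# A merge tree is either a curve id (leaf) or (left, right, merge distance).
Tree = Union[int, Tuple["Tree", "Tree", float]]


def single_linkage(
    n: int,
    sorted_pairs: Sequence[Tuple[float, int, int]],
    stop: Optional[Callable[[List[List[int]]], bool]] = None,
) -> Tuple[List[List[int]], List[Tree]]:
    """Agglomerative single-linkage clustering of curves 0..n-1.

    sorted_pairs holds (distance, i, j) in ascending order. Walking it and
    merging whenever i and j sit in different clusters always joins the two
    clusters whose closest curves are nearest (Kruskal-style). `stop` is a
    hypothetical termination hook, called after every merge with the current
    clusters; returning True ends clustering early.
    """
    parent = list(range(n))  # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    members: Dict[int, List[int]] = {i: [i] for i in range(n)}  # root -> curves
    trees: Dict[int, Tree] = {i: i for i in range(n)}  # root -> merge tree

    for d, i, j in sorted_pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
        trees[ri] = (trees[ri], trees.pop(rj), d)
        if stop is not None and stop(list(members.values())):
            break

    roots = sorted(members)
    return [sorted(members[r]) for r in roots], [trees[r] for r in roots]
```

As an illustration, `stop=lambda cs: sum(len(c) > 1 for c in cs) >= 50` would halt once there are 50 non-singleton clusters; the threshold is arbitrary. Leaving `stop` unset with a complete pair list runs down to one cluster and returns its full merge tree.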