User:Jadrian Miles/Thesis proposal feedback
Feedback on my October 2010 thesis proposal included /dhl notes taken during the talk, a /jfh email sent after the talk, /dhl meeting notes taken after Spike's email, and conversations with other committee and faculty members that were not recorded. Feedback on the style and organization of my talk was synthesized into the HOWTO page Give a talk. Feedback on the content of my ideas and their individual presentation is listed on this page. The committee has asked that I formally respond to these comments.
Hello committee-
My written responses to your questions and comments regarding my thesis proposal are below. For the next few months I will be working on a simplified toy problem, as suggested by the entire committee and by Peter especially. Work on the toy problem will proceed as follows:
1. Design the macrostructure model: a data structure that describes volumes in space by their bounding surfaces and interior orientation information sufficient to propagate a curve from any given point inside the volume (a concrete sketch of one possible form follows this list).
2. Create a macrostructure model instance that serves as a computational phantom and therefore the ground truth for later work.
3. Generate synthetic images from the phantom, and tractography curves from the synthetic images.
4. Design the curve-clustering algorithm to generate an initial macrostructure model instance from the synthetic tractography curves. Evaluate this instance relative to the phantom.
5. Design the image-based macrostructure adjustment algorithm to refine the initial instance. Evaluate the solution instance relative to the phantom.
The evaluations in steps 4 and 5 will quantify the robustness of the technique over varying noise levels and resolution in the synthetic images from step 3.
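To make step 1 above concrete, the following is a minimal Python sketch of one form such a data structure might take. Every name and representational choice here (triangulated bounding surfaces, a regularly sampled interior orientation grid, nearest-neighbor curve propagation) is an illustrative assumption rather than a commitment of the proposal:

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Bundle:
    """One volume (fiber bundle) in the macrostructure model."""
    surface_vertices: np.ndarray    # (V, 3) triangulated bounding-surface vertices
    surface_triangles: np.ndarray   # (T, 3) integer vertex indices per triangle
    grid_origin: np.ndarray         # (3,) world position of the interior grid corner
    grid_spacing: float             # isotropic spacing of the interior grid
    orientations: np.ndarray        # (Nx, Ny, Nz, 3) unit orientation per grid point

    def propagate(self, point, step=0.5, max_steps=1000):
        """Trace a curve from `point` by repeatedly stepping along the
        nearest-neighbor interior orientation, stopping when the sampled
        grid (a stand-in here for the bounding surface) is left."""
        curve = [np.asarray(point, dtype=float)]
        p = curve[0].copy()
        for _ in range(max_steps):
            idx = np.round((p - self.grid_origin) / self.grid_spacing).astype(int)
            if np.any(idx < 0) or np.any(idx >= self.orientations.shape[:3]):
                break
            p = p + step * self.orientations[tuple(idx)]
            curve.append(p.copy())
        return np.array(curve)

@dataclass
class MacrostructureModel:
    """A collection of bundles whose bounding surfaces tile the white matter."""
    bundles: List[Bundle] = field(default_factory=list)
```

Under this sketch, the phantom in step 2 would be a single hand-constructed MacrostructureModel instance, from which the synthetic images and tractography curves of step 3 are generated.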
Since work on the toy problem is not yet complete, the written responses below are of two kinds: some reflect changes to the thesis proposal document that I have already made, while others anticipate the results of the work on the toy problem. In this second case, the responses are somewhat longer as I have attempted to sketch some ideas that will be tested by working with the toy problem.
- Spike pointed out two issues with the thesis statement. The term "unambiguous" may be meaningless in this context, and the statement that the solution to this system could not be computed using other, single-scale models may be unprovable.
- The thesis statement has been amended as follows: "Combining large- and small-scale structural properties into a single model of the brain white matter admits of a model instance that describes the brain more accurately and in more detail than any current technique."
- Peter expressed a desire for more direct comparison to a greater variety of related work, in particular global macrostructure models such as Gibbs tracking and spin-glass models. Direct comparison to related work of other sorts, including signal regularization schemes and microstructure models, is also needed.
- The various stages of my solution to the “toy problem” will be evaluated against related, established techniques. In particular, tractography models derived from spin-glass and Gibbs-tracking algorithms will be included in the evaluation for Chapter 2. In cases where quantitative comparison between my work and related work is not possible (for example, because a tractography model and a “shrink-wrap” model describe macrostructure geometry in different ways), the specific differences between approaches will be described in prose in each chapter. The related work mentioned in sections 1.4 and 1.5 will be revisited in more detail and more concretely in later chapters, pending work on the toy problem.
- Several commenters felt that the distinction between the geometry-based curve-clustering process and the image-based bundle-adjustment process was insufficiently clear.
- The overview at the end of section 1.1 has been edited to clearly specify the model and/or algorithm explained in each chapter, along with the input and output of each algorithm. This format for describing each step will also be incorporated into future edits of the later chapters. Please refer to this section if necessary to clarify the responses to later comments.
- [dhl: capture talk nature of the problem]
- Spike and others were concerned that the "black-box math" used to describe the curve-clustering algorithm's cluster configuration energy function is insufficiently specific. What is the principled reason for the algorithm to choose a middle ground between 300,000 singleton clusters and one whole-brain cluster?
- The energy function for curve clustering will be made explicit in section 2.2.3, pending work on the toy problem. This function will express a tradeoff between, on one side, the bounding-surface curvature and the total number of clusters, and, on the other, the curve reconstruction error and the normalized cross-sectional area of each cluster (a schematic form is sketched below). In the case of 300,000 singleton clusters, the bounding-surface curvature and the number of clusters are both large, while the curve reconstruction error and the normalized cross-sectional area are at their minimum. In the case of one whole-brain cluster, the surface curvature and number of clusters are very low, yet the curve reconstruction error and normalized cross-section are very high. The white matter of the brain has a regular structure at a scale larger than individual tractography curves, but that structure varies and is discontinuous over the volume of the whole brain; this indicates that a tradeoff point exists between the two extreme representations, relative to a properly designed configuration energy function.
- dhl suggests Ch. 15 of Numerical Recipes
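To make the preceding response more concrete: one schematic form the configuration energy might take is a weighted sum over the clustering, sketched below. The weights and the exact per-cluster terms are placeholders to be determined during work on the toy problem and fixed in section 2.2.3.

```latex
E(\mathcal{C}) = \alpha \sum_{c \in \mathcal{C}} \kappa(c)
               + \beta \, |\mathcal{C}|
               + \gamma \sum_{c \in \mathcal{C}} \epsilon(c)
               + \delta \sum_{c \in \mathcal{C}} a(c)
```

Here κ(c) is the bounding-surface curvature of cluster c, |C| is the number of clusters, ε(c) is the reconstruction error of the curves assigned to c, and a(c) is the normalized cross-sectional area of c. In this form the 300,000-singleton extreme drives the first two terms up while minimizing the last two, the whole-brain cluster does the reverse, and the minimizer should lie between them.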
- Eugene asked for clarification of the order in which curve clusters are selected for candidate merges, whether the overall clustering algorithm is greedy, and the consequences of different ordering choices and clustering algorithms.
- The development of this algorithm is part of the prospective work on the toy problem, and will be described in section 2.2.4. Simple heuristics, computable in linear time per merge step, should be sufficient to reduce the set of candidate cluster pairs from the full n(n-1)/2 to a set that is still O(n^2) in the worst case but has a much smaller constant factor. One such heuristic is the centroid distance between existing clusters, which may be computed in linear time per merge. Candidate selection may be made even faster by applying a constant-time heuristic for choosing one cluster of the pair; for example, picking a random cluster or the cluster with minimum individual configuration energy. I believe that simulated annealing will produce better clusterings than a strictly greedy algorithm. Ideally, the clustering energy function would choose a good clustering even under an adversarial ordering of merge candidates, and this behavior will be tested. Different orderings and algorithms will be investigated informally and discussed briefly in section 2.2.4.
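As an illustration of the candidate-selection heuristic and annealed acceptance described above, here is a hedged Python sketch. The `centroid` attribute, the `merge_energy_delta` callable, and the neighbor count `k` are hypothetical placeholders, not the proposal's actual interfaces:

```python
import random
import numpy as np

def candidate_pairs(clusters, k=10):
    """Keep, for each cluster, only its k nearest neighbors by centroid
    distance; the result has at most k*n pairs rather than n*(n-1)/2."""
    centroids = np.array([c.centroid for c in clusters])
    pairs = set()
    for i in range(len(clusters)):
        d = np.linalg.norm(centroids - centroids[i], axis=1)
        d[i] = np.inf
        for j in np.argsort(d)[:k]:
            pairs.add((min(i, int(j)), max(i, int(j))))
    return list(pairs)

def annealed_merge_step(clusters, merge_energy_delta, temperature, k=10):
    """One simulated-annealing step: propose a random candidate pair and
    accept it if merging lowers the configuration energy, or with
    probability exp(-delta/temperature) otherwise."""
    i, j = random.choice(candidate_pairs(clusters, k))
    delta = merge_energy_delta(clusters[i], clusters[j])
    if delta < 0 or random.random() < np.exp(-delta / temperature):
        return (i, j)   # accepted; the caller performs the merge
    return None         # proposal rejected
```

A strictly greedy variant would instead take the candidate pair with the lowest energy change at every step; comparing these orderings is part of the investigation planned for section 2.2.4.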
- David and Spike both felt that the anatomical assumptions and prior knowledge that are applied to various steps in the macrostructure-fitting process required stronger justifications from the biological literature.
- This is an important criticism with which I agree fully. My experience with the biological literature is limited, and I have been working with David and our clinical collaborators to find concrete explanations for ideas that I understand only informally. The final form of my dissertation will address this issue directly through references to established knowledge of the biology of the brain.
- Chad and others were concerned that the nature of the optimization algorithm for image-based bundle refinement is not specified. How does it relate to established techniques? Ben asked how image differences would be translated into a space of candidate bundle refinements.
- The development of this algorithm is part of the prospective work on the toy problem, and will be described in section 4.2.4. For a given model instance, each image voxel may be associated with a small set of model parameters that might affect it; for example, the local shape parameters of each bundle that lies within a small radius of the voxel, as well as discrete operations on those bundles such as splitting. This mapping is a rough rasterization of the macrostructure model and should be easy to maintain throughout the optimization. The parameters that must be examined in order to correct an image difference in a given voxel are defined by this mapping. Chains of interdependencies between spatially distant parameters will exist in the mapping, and preventing these chains from causing an explosion of computational complexity will be a key challenge in developing this algorithm. Given a sufficiently sparse set of image differences, however, I believe that mostly independent regions of differences may be identified and treated separately by local search over the neighborhood of adjustments of the relevant parameters.
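The mapping described above might be built roughly as follows. This is a sketch only, assuming bundles expose their bounding-surface vertices in world coordinates; the `surface_vertices` name and the fixed influence radius are illustrative assumptions:

```python
from collections import defaultdict
import numpy as np

def rasterize_influence(bundles, image_shape, voxel_size, radius=2):
    """Map each voxel index (i, j, k) to the set of bundle indices whose
    bounding-surface vertices fall within `radius` voxels of it.  An image
    difference at a voxel then identifies, via this map, the small set of
    parameters worth examining to correct it."""
    influence = defaultdict(set)
    r = int(np.ceil(radius))
    offsets = [(di, dj, dk) for di in range(-r, r + 1)
                            for dj in range(-r, r + 1)
                            for dk in range(-r, r + 1)]
    for b_idx, bundle in enumerate(bundles):
        for v in bundle.surface_vertices:                 # world coordinates
            center = np.round(np.asarray(v) / voxel_size).astype(int)
            for off in offsets:
                ijk = center + off
                if np.all(ijk >= 0) and np.all(ijk < image_shape):
                    influence[tuple(int(x) for x in ijk)].add(b_idx)
    return influence
```

Regions of differing voxels whose influence sets share no bundles could then be treated independently, as suggested above.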
- The entire committee asked how the algorithm for image-based bundle refinement would avoid overfitting.
- The objective function for this algorithm must express a tradeoff between model complexity and image differences. Bounding surface curvature and the total number of bundles, as in the curve-clustering optimization, measure the model complexity. In addition, the noise characteristics of diffusion MRI are well understood and may be incorporated into this function in the form of a chi-squared goodness-of-fit statistic, so that overfitting may be directly measured and selected against.
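Schematically, and as a sketch only (the weights and the exact complexity terms remain to be chosen), the objective might take the form

```latex
F(M) = \chi^2\bigl(I_{\mathrm{obs}},\, I(M)\bigr)
     + \lambda_1 \sum_{b \in M} \kappa(b)
     + \lambda_2 \, |M|
```

where I(M) is the image predicted from model instance M, the chi-squared term is normalized by the known per-voxel noise variance of the diffusion MRI acquisition, κ(b) is the bounding-surface curvature of bundle b, and |M| is the number of bundles. Because the expected value of a chi-squared statistic is roughly its number of degrees of freedom, a fit whose chi-squared term falls well below that expectation would signal overfitting directly.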
- Peter asked how the model will accommodate white matter fascicles that terminate outside of the brain.
- The image volume outside the brain is represented in the macrostructure model by a fictitious tissue type for which the configuration and image-difference energy are defined to be zero. The geometry of this volume is not adjusted by the image-based adjustment algorithm. The purpose of this additional element in the model is to define feasible white matter bundles as those that terminate either at grey matter or outside of the brain.
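To illustrate where this fictitious tissue type enters the model, here is a minimal sketch; the names TissueType and region_energy, and the two energy callables, are hypothetical placeholders rather than the proposal's actual interfaces:

```python
from enum import Enum, auto

class TissueType(Enum):
    WHITE_MATTER_BUNDLE = auto()
    GREY_MATTER = auto()
    EXTERIOR = auto()          # fictitious tissue outside the brain

def region_energy(region, config_energy, image_difference_energy):
    """Both energy terms are defined to be zero for the exterior region, so a
    bundle terminating there incurs no penalty, and the image-based algorithm
    never has a reason to adjust the exterior's geometry."""
    if region.tissue is TissueType.EXTERIOR:
        return 0.0
    return config_energy(region) + image_difference_energy(region)
```

In this form a bundle terminating outside the brain is exactly as feasible as one terminating at grey matter, which is the intended definition of feasibility above.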
- Spike asked how the modeling system's macrostructure reconstruction will be validated, whether it would be stable across subjects and acquisitions, and whether it would correspond reasonably (perhaps in a subset relationship) with anatomists' conception of the macrostructure.
- The toy problem will provide ground truth for validation and a means for evaluating the stability of the technique. I do not believe that a formal representation of anatomists’ conception of the macrostructure is appropriate to include in the fitting process, as one of the motivations of this work is the great empirically observed variation in brain macrostructure between subjects. In the absence of such a representation, the only way to guarantee a subset relationship is by performing no curve clustering at all, but this is also not a desirable outcome, as it sacrifices all the beneficial computational and conceptual properties of a higher-level macrostructure representation. A subset relationship is, nevertheless, a desirable property of the final output, and it may be possible to tune the optimization steps’ energy functions (by allowing for more model complexity and especially a greater number of bundles) to make this more likely.
- Spike asked whether the fitting process would be idempotent, up to variation due to noise. If not, would chained applications of the process converge quickly? If not, why not? If so, what is the nature of the fixed-point solution?
- This question will be investigated with the toy model and fitting process. Idempotence is a desirable property but a fixed-point solution to the system may not exist at all, due to nondeterminism in the fitting process. It may be reasonable to define the algorithms at first nondeterministically, demonstrate that they generate good solutions, and then refine them to achieve idempotence or quick convergence in chained applications while preserving the quality of the solutions.
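One straightforward way to test this with the toy model is to apply the fitting process repeatedly to its own output and track the distance between successive model instances. The sketch below assumes hypothetical `fit` and `model_distance` functions; neither is part of the proposal as written:

```python
def convergence_trace(fit, model_distance, images, initial_model, n_iters=10):
    """Apply the fitting process repeatedly, seeded each time from its own
    previous output, and record the distance between successive instances."""
    trace = []
    current = initial_model
    for _ in range(n_iters):
        refined = fit(images, seed=current)      # one full fitting pass
        trace.append(model_distance(current, refined))
        current = refined
    return trace
```

A trace that drops to (near) zero after the first application would indicate idempotence up to noise; a steadily shrinking trace would indicate quick convergence of chained applications.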
- David mentioned that the stated choice of a Rician distribution for axon diameters seems inappropriate.
- Three properties of empirical axon diameter distributions inform the choice of a model distribution: 1) axon diameters less than or equal to zero are impossible; 2) there is no a priori maximum axon diameter, though observed distributions decrease smoothly with increasing diameter; 3) observed distributions decrease smoothly as the diameter approaches zero. These observations rule out the majority of common probability distributions: those with support over the entire real line, those with nonzero density at zero diameter, and those with bounded support for which extreme parameter values produce a very large or infinite second derivative (that is, a probability “spike”) at the upper boundary of their support. The common distributions with few parameters that remain include the Rician distribution and the Lévy distribution. The latter, though possessing the appealing property of stability, has a heavy tail that does not correspond to empirical observations of axon diameters. The Rician distribution, on the other hand, approaches a normal distribution as the ratio of its center parameter to its width parameter grows large, an appealing property that also corresponds to observations. Therefore the Rician distribution appears to be a good choice to model axon diameters.
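For reference, the Rician density over diameters x ≥ 0 is

```latex
f(x \mid \nu, \sigma) = \frac{x}{\sigma^2}
    \exp\!\left(-\frac{x^2 + \nu^2}{2\sigma^2}\right)
    I_0\!\left(\frac{x \nu}{\sigma^2}\right), \qquad x \ge 0
```

where I_0 is the modified Bessel function of the first kind of order zero. The density is zero at x = 0, its support is unbounded above, and it approaches a normal density as the ratio ν/σ grows, matching the three observed properties listed above.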
- Ben and others expressed concern about the feasibility of the proposed research schedule.
- Provided that the work with the toy problem is successful, evaluation with real data may be de-emphasized in the earlier chapters, with thorough evaluation of the entire model-fitting system on real data deferred until Chapter 7. Chapters 5 and 6, which describe the application of microstructure fitting priors in the bounded regions described by the macrostructure model, may be eliminated as distinct research contributions and attempted publications. Instead, the core idea of using smoothness priors to give a unique solution to an otherwise underconstrained microstructure model may be introduced only in Chapter 7, or it may be abandoned altogether. I believe that a dissertation consisting only of Chapters 1–4 would represent a significant research contribution to my field.