User:Jadrian Miles/Thesis manifesto: probabilistic worldview: Difference between revisions

From VrlWiki
Jadrian Miles (talk | contribs)
Microstructure Fitting: error in my reasoning; new derivation

Revision as of 16:57, 6 March 2011

My dissertation involves solving three problems:

  1. Automatically clustering tractography curves together so that the resulting clusters are neither too small (low reconstruction error, high model complexity) nor too big (high reconstruction error, low model complexity)
  2. Automatically adjusting macrostructure elements to match input DWIs so that the elements' surfaces are neither too bumpy (low reconstruction error, high model complexity) nor too smooth (high reconstruction error, low model complexity)
  3. Automatically adjusting microstructure properties within a given region in space to match input DWIs so that the spatial frequency of the microstructure parameters is neither too high (low reconstruction error, high model complexity) nor too low (high reconstruction error, low model complexity)

In each case, our goal is to balance the tradeoff between reconstruction error and model complexity. In order to do so in a principled fashion, we must define, for each case, what is meant by reconstruction error and model complexity. What probabilistic tools are available for us to go about this?

<math>\chi^2</math> Fitting

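One apparent option is <math>\chi^2</math> fitting. Given:

  • <math>N</math> observations <math>y_i</math> at independent variables <math>x_i</math>
  • The knowledge that the measurement error on each observation is Gaussian with standard deviation <math>\sigma_i</math>
  • A model instance with <math>M</math> parameters <math>a_j</math>, which gives a reconstructed "observation" for a given independent variable as <math>y(x_i|a_0 \dots a_{M-1})</math>

we define the <math>\chi^2</math> statistic as follows:

<math>\chi^2 = \sum_{i=0}^{N-1} \left( \frac{y_i - y(x_i|a_0 \dots a_{M-1})}{\sigma_i} \right)^2</math>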

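This is essentially a normalized error score, and in itself tells us nothing about goodness of fit. The interesting thing that we can do here, though, is compute the probability of seeing an error as bad as or worse than the observed error score under a different set of observations drawn from the same underlying distribution, given the degrees of freedom <math>\nu \equiv N - M</math>. This probability is called <math>Q</math>, and is defined as:

<math>Q = 1 - F(\chi^2,\nu) = \int_{\chi^2}^{\infty} f(\psi,\nu)\,d\psi</math>

where <math>F(\psi,\nu)</math> is the CDF of the <math>\chi^2</math> distribution with <math>\nu</math> degrees of freedom and <math>f(\psi,\nu)</math> is the corresponding probability density. Here's how to interpret <math>Q</math>: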

  • Really low (<math>Q < 0.001</math>) means that you can't do a much worse job describing the data than you did with your model instance. This means either that your fit is bad (<math>\chi^2</math> is really high) or that your model has so many free parameters (and thus <math>\nu</math> is so small) that even a low <math>\chi^2</math> isn't convincing.
  • Really high (<math>Q \approx 1</math>) means that you can't do much better, even given the number of degrees of freedom. This is a suspicious result and may indicate data-fudging or just overestimation of the <math>\sigma_i</math>s, so it shouldn't come up in an automated system.

This means that maximizing Q over all values of M (from very simple to very complex models) should find the "happy medium".
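
As a concrete illustration, here is a minimal Python sketch of the two quantities defined above, treating <math>Q</math> as the upper-tail probability of the <math>\chi^2</math> distribution with <math>\nu</math> degrees of freedom. The function names `chi_squared` and `chi2_q` are made up for this sketch, and it assumes <math>\nu</math> is a positive integer so that standard closed-form sums can be used instead of a general incomplete-gamma routine.

```python
import math

def chi_squared(observed, reconstructed, sigmas):
    """Sum of squared residuals, each normalized by its sigma_i."""
    return sum(((y_i - y) / s) ** 2
               for y_i, y, s in zip(observed, reconstructed, sigmas))

def chi2_q(chi2, nu):
    """Upper-tail probability Q for chi2 with nu degrees of freedom.

    Assumes nu is a positive integer (true for nu = N - M).
    """
    x = 0.5 * chi2
    if nu % 2 == 0:
        # Even nu: Q = exp(-x) * sum_{k=0}^{nu/2 - 1} x^k / k!
        term = math.exp(-x)
        total = term
        for k in range(1, nu // 2):
            term *= x / k
            total += term
        return total
    # Odd nu: Q = erfc(sqrt(x))
    #           + exp(-x) * sum_{k=1}^{(nu-1)/2} x^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(x))
    term = math.exp(-x) * math.sqrt(x) / math.gamma(1.5)
    for k in range(1, (nu - 1) // 2 + 1):
        total += term
        term *= x / (k + 0.5)
    return total

# Toy usage: N = 5 observations, a model with M = 2 parameters, so nu = 3.
observed = [1.1, 1.9, 3.2, 3.9, 5.1]
reconstructed = [1.0, 2.0, 3.0, 4.0, 5.0]
sigmas = [0.1] * 5
q = chi2_q(chi_squared(observed, reconstructed, sigmas), nu=5 - 2)
```

For non-integer or very large <math>\nu</math>, a library routine for the regularized incomplete gamma function would be the safer choice.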

Let's consider the meaning of all these variables for each of the problems defined above:

Curve Clustering

  • <math>N</math> is the number of vertices in all the curves in the tractography set. Each <math>x_i</math> therefore represents the selection of a single vertex (by two parameters: the index of the curve in the curve set, and the index or arc-length distance of the desired vertex along this curve). <math>y_i</math> isn't particularly well-defined in this case (see below), but amounts to the description of the curve near <math>x_i</math>: the vertex's position and the angle between the consecutive segments, maybe.
  • <math>M</math> is the total number of curves that form the skeletons of the shrink-wrap polyhedra for the model, plus the total number of bundles/polyhedra. Note that even in the most "complicated" model, <math>M</math> is smaller than <math>N</math> by a factor of half the average number of <math>x_i</math>s per curve. <math>y(x_i|a_0 \dots a_{M-1})</math> is the description of the reconstructed curve specified by <math>x_i</math> at the specified position along it, given the association of curves with bundles specified by the model parameter values <math>a_0 \dots a_{M-1}</math>.
  • <math>\sigma_i</math> is defined according to the definition of the difference <math>(y_i - y(x_i|a_0 \dots a_{M-1}))</math> that we choose (see below).

Let's assume that for comparing the reconstruction (<math>y(x_i|a_0 \dots a_{M-1})</math>, which we will just call <math>y</math>) with the observation (<math>y_i</math>), we want to incorporate both a position and angle reconstruction error. Then we must define some sort of scalar pseudo-difference function

<math>f(y,y_i) = \alpha \times d_p(y,y_i) + (1-\alpha) \times d_\theta(y,y_i)</math>

where <math>d_p</math> is the difference in positions (Euclidean distance) between the vertices, and <math>d_\theta</math> is some difference between angles, perhaps one minus the dot product. <math>\alpha</math> may either be hand-tuned or solved for in the course of the optimization; in this latter case, <math>M</math> must be increased by one.
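
One possible reading of this pseudo-difference is sketched below in Python. It assumes, purely for illustration, that each curve description is a pair of a 3D position and a unit direction vector, and that <math>d_\theta</math> is one minus the dot product of the two directions. The name `pseudo_difference` and the tuple layout are hypothetical, not part of the thesis design.

```python
import math

def pseudo_difference(recon, observed, alpha):
    """alpha * d_p + (1 - alpha) * d_theta for two curve descriptions.

    Each description is (position, unit_direction), both 3-tuples.
    This layout is an assumption made for the sketch.
    """
    (p, t), (p_i, t_i) = recon, observed
    d_p = math.dist(p, p_i)                                # Euclidean distance
    d_theta = 1.0 - sum(a * b for a, b in zip(t, t_i))     # 1 - dot product
    return alpha * d_p + (1.0 - alpha) * d_theta

# Same direction, positions 5 units apart: only the position term contributes.
same_dir = pseudo_difference(((0, 0, 0), (1, 0, 0)),
                             ((3, 4, 0), (1, 0, 0)), alpha=0.5)

# Same position, orthogonal directions: only the angle term contributes.
orthogonal = pseudo_difference(((0, 0, 0), (1, 0, 0)),
                               ((0, 0, 0), (0, 1, 0)), alpha=0.25)
```

Note that the two terms have different units (length versus a dimensionless angle score), so the value of <math>\alpha</math> also implicitly sets their relative scale.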

The <math>\sigma_i</math>s must be determined experimentally (with synthetic phantoms, for example). It remains for these experiments to demonstrate that the errors are normally distributed. If not, some wacky transform might be necessary to shoehorn it into the <math>\chi^2</math> regime, or Monte-Carlo simulation of the error distribution might allow for a different but equivalent approach (see §15.6 of NR).

Bundle Adjustment

Microstructure Fitting

  • <math>N</math> is the number of voxels in the relevant region multiplied by the number of observations (diffusion weightings) in each. Each individual configuration (voxel and diffusion weighting) is an <math>x_i</math>. The observed intensity for each <math>x_i</math> is <math>y_i</math>.
  • <math>M</math> is the number of control parameters in the space-filling microstructure model. Assuming that the microstructure instance for each voxel is determined by some sort of spline with sparse control points, <math>M</math> is 3 parameters for position plus the number of parameters actually controlling the microstructure model, multiplied by the number of control points. <math>y(x_i|a_0 \dots a_{M-1})</math> is the reconstructed signal intensity in the voxel and diffusion weighting specified by <math>x_i</math>, given the model parameter values <math>a_0 \dots a_{M-1}</math>.
  • <math>\sigma_i</math> is known for all <math>i</math> by estimation of the Rician noise parameters from the DWIs. But...

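Unfortunately, we know that the noise on the <math>y_i</math>s is not Gaussian but rather Rician. Therefore <math>(y_i - y(x_i|a))</math> is not normally distributed, and so the <math>\chi^2</math> statistic is invalid in the form above. We can, however, create a "fake" observation <math>\tilde{y}_i</math> from <math>y_i</math> for which <math>(\tilde{y}_i - y(x_i|a))</math> is normally distributed. We do this by solving for the <math>\tilde{y}_i</math> that satisfies the following equality:

<math>\int_{0}^{y_i} P_\mathrm{Rice}(y'_i|y)\,dy'_i = \int_{-\infty}^{\tilde{y}_i} P_\mathrm{Gauss}(\tilde{y}'_i|y)\,d\tilde{y}'_i</math>

The two sides of the equality are values of the cumulative distribution functions of, respectively, a Rician random variable and a Gaussian random variable:

<math>1 - Q_1\left(\frac{y}{\sigma_i},\frac{y_i}{\sigma_i}\right) = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{\tilde{y}_i - y}{\sqrt{2\sigma_i^2}}\right)\right)</math>

where <math>Q_1</math> is the first-order Marcum Q-function, and <math>\mathrm{erf}</math> is the Gauss error function. Solving, we find:

<math>\tilde{y}_i = y + \sqrt{2\sigma_i^2}\;\mathrm{erf}^{-1}\left(1 - 2\,Q_1\left(\frac{y}{\sigma_i},\frac{y_i}{\sigma_i}\right)\right)</math>

This function re-maps a Rician random variable <math>y_i</math> into a Gaussian random variable <math>\tilde{y}_i</math> with the same <math>\sigma_i</math> and central value <math>y</math>, and so we may use this <math>\tilde{y}_i</math> in place of <math>y_i</math> in the summation for the <math>\chi^2</math> statistic.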
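
The re-mapping can be sketched numerically as follows. This is a rough Python illustration, not a tuned implementation: the names `rice_cdf` and `gaussianize` are invented here, the Rician CDF (that is, one minus the Marcum Q-function term) is evaluated with the standard Poisson-mixture series for a noncentral <math>\chi^2</math> variable with two degrees of freedom, and the inverse error function is replaced by the equivalent inverse normal CDF from the standard library. It assumes a moderate signal-to-noise ratio (<math>y/\sigma_i</math> below roughly 35) so that the leading exponential does not underflow.

```python
import math
from statistics import NormalDist

def rice_cdf(y_i, y, sigma):
    """P(Rician <= y_i) for underlying signal y and noise level sigma.

    Uses the Poisson-weighted series for a noncentral chi-square variable
    with 2 degrees of freedom; assumes y/sigma is not extremely large.
    """
    if y_i <= 0.0:
        return 0.0
    lam = 0.5 * (y / sigma) ** 2       # Poisson mean of the mixture
    x = 0.5 * (y_i / sigma) ** 2
    terms = int(lam + 10.0 * math.sqrt(lam) + 30)
    weight = math.exp(-lam)            # Poisson weight for k = 0
    tail = math.exp(-x)                # x^k * exp(-x) / k! for k = 0
    erlang = 1.0 - tail                # Erlang CDF with shape k + 1
    total = weight * erlang
    for k in range(1, terms):
        weight *= lam / k
        tail *= x / k
        erlang = max(erlang - tail, 0.0)
        total += weight * erlang
    return total

def gaussianize(y_i, y, sigma):
    """Map a Rician observation y_i to the Gaussian value with equal CDF."""
    p = rice_cdf(y_i, y, sigma)
    p = min(max(p, 1e-15), 1.0 - 1e-15)    # keep inv_cdf in its open domain
    # Equivalent to y + sqrt(2 sigma^2) * erfinv(2p - 1).
    return NormalDist(mu=y, sigma=sigma).inv_cdf(p)

# At high SNR the Rician mean sits slightly above y, so an observation
# equal to y maps to a value slightly below y.
remapped = gaussianize(10.0, 10.0, 1.0)
```

Matching CDF values this way keeps the mapping monotonic in <math>y_i</math>, so the ordering of observations is preserved.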

The inner loop of our algorithm is given a model with <math>M</math> degrees of freedom, and fits the <math>a_j</math>s to maximize <math>Q</math> (or equivalently, minimize <math>\chi^2</math>) for that model (see §15.5 of NR). The outer loop is a search over all models (some of which may have the same <math>M</math>) to find a global maximum in <math>Q</math>. As mentioned above, we should never get an unrealistically high <math>Q</math> value, as that's really only achievable by either fudging the data or overestimating the <math>\sigma_i</math>s.
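
To make the two-loop structure concrete, here is a toy Python sketch. It does not use the microstructure model: as a stand-in, the model family is a piecewise-constant signal with <math>M</math> equal-width segments, for which the <math>\chi^2</math>-minimizing fit is just the inverse-variance weighted mean in each segment. All function names are hypothetical, there is only one candidate model per <math>M</math>, and the data are synthetic (a noisy step with known <math>\sigma_i = 1</math>).

```python
import math
import random

def chi2_q(chi2, nu):
    """Upper-tail chi-square probability for a positive integer nu."""
    x = 0.5 * chi2
    if nu % 2 == 0:
        term = math.exp(-x)
        total = term
        for k in range(1, nu // 2):
            term *= x / k
            total += term
        return total
    total = math.erfc(math.sqrt(x))
    term = math.exp(-x) * math.sqrt(x) / math.gamma(1.5)
    for k in range(1, (nu - 1) // 2 + 1):
        total += term
        term *= x / (k + 0.5)
    return total

def fit_piecewise_constant(ys, sigmas, m):
    """Inner loop: chi^2-minimizing levels for m equal-width segments."""
    n = len(ys)
    num, den = [0.0] * m, [0.0] * m
    for i, (y, s) in enumerate(zip(ys, sigmas)):
        seg = i * m // n                 # segment index of sample i
        num[seg] += y / s ** 2
        den[seg] += 1.0 / s ** 2
    levels = [a / b for a, b in zip(num, den)]
    chi2 = sum(((y - levels[i * m // n]) / s) ** 2
               for i, (y, s) in enumerate(zip(ys, sigmas)))
    return levels, chi2

def select_model(ys, sigmas, candidate_ms):
    """Outer loop: score each candidate M by Q and keep the maximum."""
    n = len(ys)
    scores = {}
    for m in candidate_ms:
        _, chi2 = fit_piecewise_constant(ys, sigmas, m)
        scores[m] = chi2_q(chi2, n - m)
    return max(scores, key=scores.get), scores

# Synthetic data: a step from 0 to 10 with unit Gaussian noise.
rng = random.Random(0)
ys = [(0.0 if i < 20 else 10.0) + rng.gauss(0.0, 1.0) for i in range(40)]
best_m, scores = select_model(ys, [1.0] * 40, range(1, 9))
```

In this toy, a single segment cannot represent the step, so its <math>Q</math> is essentially zero, while models with many segments lower <math>\chi^2</math> but also lower <math>\nu</math>.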