User:Jadrian Miles/Thesis manifesto: probabilistic worldview

My dissertation involves solving three problems:

  1. Automatically clustering tractography curves together so that the resulting clusters are neither too small (low reconstruction error, high model complexity) nor too big (high reconstruction error, low model complexity)
  2. Automatically adjusting macrostructure elements to match input DWIs so that the elements' surfaces are neither too bumpy (low reconstruction error, high model complexity) nor too smooth (high reconstruction error, low model complexity)
  3. Automatically adjusting microstructure properties within a given region in space to match input DWIs so that the spatial frequency of the microstructure parameters is neither too high (low reconstruction error, high model complexity) nor too low (high reconstruction error, low model complexity)

In each case, our goal is to balance the tradeoff between reconstruction error and model complexity. In order to do so in a principled fashion, we must define, for each case, what is meant by reconstruction error and model complexity.

<math>\chi^2</math> Fitting

One apparent option is <math>\chi^2</math> fitting. Given:

  • <math>N</math> observations <math>y_i</math> at independent variables <math>x_i</math>
  • The knowledge that the measurement error on each observation is Gaussian with standard deviation <math>\sigma_i</math>
  • A model instance with <math>M</math> parameters <math>a_j</math>, which gives a reconstructed "observation" for a given independent variable as <math>y(x_i|a_0 \dots a_{M-1})</math>

we define the <math>\chi^2</math> statistic as follows:

<math>\chi^2 = \sum_{i=0}^{N-1}\left(\frac{y_i - y(x_i|a_0 \dots a_{M-1})}{\sigma_i}\right)^2</math>

This is essentially a normalized error score. The interesting thing that we can do here, though, is compute the probability of seeing an error as bad or worse than the observed error score under a different set of observations drawn from the same underlying distribution, given the degrees of freedom <math>\nu \equiv N-M</math>. This probability is called <math>Q</math>, and is defined as:

<math>Q = 1 - \int_0^{\chi^2} F(\psi,\nu)\,\mathrm{d}\psi</math>

where <math>F(\psi,\nu)</math> is the CDF of the <math>\chi^2</math> distribution. Here's how to interpret <math>Q</math>:

  • Really low (<math>Q < 0.001</math>) means that you can't do a much worse job describing the data than you did with your model instance. This means either that your fit is bad (<math>\chi^2</math> is really high) or that you have so many degrees of freedom that even a low <math>\chi^2</math> isn't convincing.
  • Really high (<math>Q \approx 1</math>) means that you can't do much better, even given the number of degrees of freedom. This is a suspicious result and may indicate data-fudging, so it shouldn't come up in an automated system.

This means that maximizing <math>Q</math> over all values of <math>M</math> (from very simple to very complex models) should find the "happy medium".
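
To make the scoring concrete, here is a minimal Python sketch of computing <math>\chi^2</math> and <math>Q</math> for one fitted model instance. It assumes NumPy and SciPy are available; the function names are illustrative only, and chi2.sf is SciPy's upper-tail (survival) function for the <math>\chi^2</math> distribution, i.e. the probability of an error at least as bad as the one observed.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import chi2

def chi_squared(y_obs, y_model, sigma):
    # Normalized error score: sum_i ((y_i - y(x_i|a)) / sigma_i)^2
    return np.sum(((y_obs - y_model) / sigma) ** 2)

def goodness_q(chisq, n_obs, n_params):
    # Probability of seeing an error as bad or worse under a fresh set of
    # observations, with nu = N - M degrees of freedom.  chi2.sf is the
    # survival function (upper tail) of the chi-squared distribution.
    nu = n_obs - n_params
    return chi2.sf(chisq, nu)
</syntaxhighlight>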

Let's consider the meaning of all these variables for each of the problems defined above:

Curve Clustering

Bundle Adjustment

Microstructure Fitting

  • <math>N</math> is the number of voxels in the relevant region multiplied by the number of observations (diffusion weightings) in each. Each individual configuration (voxel and diffusion weighting) is an <math>x_i</math>. The observed intensity for each <math>x_i</math> is <math>y_i</math>.
  • <math>M</math> is the number of control parameters in the space-filling microstructure model. Assuming that the microstructure instance for each voxel is determined by some sort of spline with sparse control points, <math>M</math> is 3 parameters for position plus the number of parameters actually controlling the microstructure model, multiplied by the number of control points. <math>y(x_i|a_0 \dots a_{M-1})</math> is the reconstructed signal intensity in the voxel and diffusion weighting specified by <math>x_i</math>, given the model parameter values <math>a_0 \dots a_{M-1}</math>.
  • <math>\sigma_i</math> is known for all <math>i</math> by estimation of the Rician noise parameters from the DWIs. But...

Unfortunately, we know that the noise on the <math>y_i</math>s is not Gaussian but rather Rician. Therefore <math>(y_i-y(x_i|a))</math> is not normally distributed, and so the <math>\chi^2</math> statistic is invalid in the form above. We can, however, create a "fake" observation <math>\tilde{y}_i</math> from <math>y_i</math> for which <math>(\tilde{y}_i-y(x_i|a))</math> is normally distributed.

Let us consider a single observation in isolation. The observed signal magnitude is <math>y_i</math> and the reconstructed value from the model is <math>y</math>. Assuming <math>y</math> is a correct reconstruction, the marginal probability of observing <math>y_i</math> is:

<math>\mathbb{P}(y_i|y)\,\mathrm{d}y_i = \frac{1}{2\pi\sigma_i^2}\,\mathrm{exp}\!\left(-(y^2+y_i^2)/2\sigma_i^2\right) \int_{-\pi}^{\pi}\mathrm{exp}\!\left((yy_i/\sigma_i^2)\,\mathrm{cos}\,\theta\right)\,\mathrm{d}\theta\,\mathrm{d}y_i</math>

(Clearly the same probability results if we instead reverse the roles of <math>y</math> and <math>y_i</math>.)
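
For reference, the angular integral above equals <math>2\pi I_0(m)</math>, where <math>I_0</math> is the modified Bessel function of the first kind and <math>m = yy_i/\sigma_i^2</math>. Here is a minimal Python sketch of evaluating the density as written (the function name is illustrative; SciPy's exponentially scaled i0e avoids overflow for large <math>m</math>):

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import i0e

def rician_marginal(y_i, y, sigma):
    # Evaluates the expression above.  The angular integral is 2*pi*I0(m) with
    # m = y*y_i/sigma^2; since i0e(m) = exp(-m)*I0(m), folding exp(m) into the
    # Gaussian factor leaves exp(m - (y^2 + y_i^2)/(2*sigma^2)) = exp(-(y - y_i)^2/(2*sigma^2)).
    m = y * y_i / sigma**2
    return (1.0 / (2.0 * np.pi * sigma**2)) * 2.0 * np.pi * i0e(m) \
           * np.exp(-(y - y_i)**2 / (2.0 * sigma**2))
</syntaxhighlight>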

We wish to define a normally-distributed "fake observation" <math>\tilde{y}_i</math> with the same probability:

<math>\mathbb{P}(\tilde{y}_i|y)\,\mathrm{d}\tilde{y}_i = \frac{1}{\sqrt{2\pi\sigma_i^2}}\,\mathrm{exp}\!\left(-(\tilde{y}_i-y)^2/2\sigma_i^2\right)\mathrm{d}\tilde{y}_i = \mathbb{P}(y_i|y)\,\mathrm{d}y_i</math>

Setting <math>\mathrm{d}\tilde{y}_i = \mathrm{d}y_i</math>, taking the logarithm of both sides, and multiplying through by <math>-2\sigma_i^2</math>, we arrive at:

<math>(\tilde{y}_i-y)^2 = y^2 + y_i^2 - 2\sigma_i^2 \left(\mathrm{log}\!\int_{-\pi}^{\pi}\mathrm{exp}\!\left((yy_i/\sigma_i^2)\,\mathrm{cos}\,\theta\right)\,\mathrm{d}\theta - \frac{1}{2}\mathrm{log}(2\pi\sigma_i^2)\right)</math>

This is just the numerator of the <math>i</math>-th term in the summation for the <math>\chi^2</math> statistic comparing the model reconstruction <math>y</math> to the (fake) observation <math>\tilde{y}_i</math>. The value of the integral depends on only one variable, <math>m \equiv yy_i/\sigma_i^2</math>, so it may be tabulated in a lookup table for fast computation.
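
Because that integral is <math>2\pi I_0(m)</math> and depends only on <math>m</math>, a one-dimensional table suffices. A rough Python sketch of the lookup-table idea follows; the grid range and resolution are placeholder assumptions to be set from the actual data:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import i0e

# Tabulate the logarithm of the angular integral, log(2*pi*I0(m)), on a grid of m.
# Using i0e keeps the table stable for large m: log(I0(m)) = m + log(i0e(m)).
m_grid = np.linspace(0.0, 1.0e4, 200001)   # placeholder range and resolution
log_integral_table = np.log(2.0 * np.pi) + m_grid + np.log(i0e(m_grid))

def fake_residual_sq(y_i, y, sigma):
    # Squared residual (y~_i - y)^2 from the equation above, with the integral's
    # logarithm looked up by linear interpolation instead of recomputed each time.
    m = y * y_i / sigma**2
    log_integral = np.interp(m, m_grid, log_integral_table)
    return y**2 + y_i**2 - 2.0 * sigma**2 * (log_integral - 0.5 * np.log(2.0 * np.pi * sigma**2))
</syntaxhighlight>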

The inner loop of our algorithm is given a model with <math>M</math> degrees of freedom, and fits the <math>a_j</math>s to maximize <math>Q</math> (or equivalently, minimize <math>\chi^2</math>) for that model. The outer loop is a search over all models (some of which may have the same <math>M</math>) to find a global maximum in <math>Q</math>. As mentioned above, we should never get an unrealistically high <math>Q</math> value, as that's really only achievable by either fudging the data or overestimating the <math>\sigma_i</math>s.
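
A skeleton of that two-level search in Python is below. The model interface (n_params, initial_guess, predict) is hypothetical shorthand for however each of the three problems above parameterizes its model, and the optimizer choice is likewise just illustrative:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def fit_model(model, x, y_obs, sigma):
    # Inner loop: for a fixed model, minimize chi^2 over its parameters a_j
    # (equivalently, maximize Q for that model), then score the best fit with Q.
    def chisq(a):
        return np.sum(((y_obs - model.predict(x, a)) / sigma) ** 2)
    result = minimize(chisq, model.initial_guess())
    nu = len(y_obs) - model.n_params
    return result.x, chi2.sf(result.fun, nu)

def select_model(candidate_models, x, y_obs, sigma):
    # Outer loop: search over candidate models (some may share the same M) and
    # keep the one whose best-fit parameters give the largest Q.
    best_model, best_params, best_q = None, None, -np.inf
    for model in candidate_models:
        params, q = fit_model(model, x, y_obs, sigma)
        if q > best_q:
            best_model, best_params, best_q = model, params, q
    return best_model, best_params, best_q
</syntaxhighlight>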