User:Jadrian Miles/Thesis manifesto: probabilistic worldview
My dissertation involves solving three problems:
# Automatically clustering tractography curves together so that the resulting clusters are neither too small (low reconstruction error, high model complexity) nor too big (high reconstruction error, low model complexity)
# Automatically adjusting macrostructure elements to match input DWIs so that the elements' surfaces are neither too bumpy (low reconstruction error, high model complexity) nor too smooth (high reconstruction error, low model complexity)
# Automatically adjusting microstructure properties within a given region in space to match input DWIs so that the spatial frequency of the microstructure parameters is neither too high (low reconstruction error, high model complexity) nor too low (high reconstruction error, low model complexity)


In each case, our goal is to balance the tradeoff between reconstruction error and model complexity.  In order to do so in a principled fashion, we must define, for each case, what is meant by reconstruction error and model complexity.  What probabilistic tools are available for us to go about this?

==<math>\chi^2</math> Fitting==

One apparent option is <math>\chi^2</math> fitting.  Given:
* <math>N</math> observations <math>y_i</math> at independent variables <math>x_i</math>
* The knowledge that the measurement error on each observation is Gaussian with standard deviation <math>\sigma_i</math>
* A model instance with <math>M</math> parameters <math>a_j</math>, which gives a reconstructed "observation" for a given independent variable as <math>y(x_i|a_0 \dots a_{M-1})</math>
we define the <math>\chi^2</math> statistic as follows:

<math>\chi^2 = \sum_{i=0}^{N-1}\left(\frac{y_i - y(x_i|a_0 \dots a_{M-1})}{\sigma_i}\right)^2</math>

This is essentially a normalized error score, and in itself tells us nothing about goodness of fit.  We have some sense that a small <math>\chi^2</math> means a good fit and a big one means a bad fit, but how small and how big?  How good and how bad?  A <math>\chi^2</math> statistic must be interpreted in the context of the relative sizes of <math>N</math> and <math>M</math>.  Obviously, if you allow <math>M=N</math>, then you have as many degrees of freedom in your model as there are observations to test its fit, and so it's trivial to get <math>\chi^2 = 0</math> in this case.
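As a minimal numerical sketch of the statistic itself (the linear model, parameter values, and all data below are hypothetical, chosen only for illustration):

```python
import math

def chi_squared(y_obs, y_model, sigmas):
    """Sum of squared, sigma-normalized residuals (the chi^2 statistic)."""
    return sum(((yi - ym) / si) ** 2
               for yi, ym, si in zip(y_obs, y_model, sigmas))

# Hypothetical example: a linear model y(x|a) = a0 + a1*x with a = (1.0, 2.0)
a0, a1 = 1.0, 2.0
xs = [0.0, 1.0, 2.0, 3.0]
y_obs = [1.1, 2.9, 5.2, 6.8]          # noisy observations
sigmas = [0.1, 0.1, 0.1, 0.1]         # known Gaussian noise level per observation
y_model = [a0 + a1 * x for x in xs]   # reconstructed "observations"
chi2 = chi_squared(y_obs, y_model, sigmas)
```

Here each residual is one or two noise standard deviations, so the statistic comes out near <math>N</math>, which is the unremarkable regime described below.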

Imagine that we knew the true, measurement-error-free value of every observation; let's call that <math>\bar{y}_i</math>.  Imagine also that our reckoning of the measurement error distributions (that is, that each one is Gaussian with mean <math>\bar{y}_i</math> and standard deviation <math>\sigma_i</math>) is exactly correct.  Then each noisy observation <math>y_i</math> is made from a Gaussian distribution, and we can normalize this observation to a z-score (standard normal variate) by <math>z_i = (y_i - \bar{y}_i)/\sigma_i</math>.  Note, then, that the sum of the squares of all the <math>z_i</math> is a <math>\chi^2</math> statistic.  The sum of the squares of <math>k</math> independent standard normal random variables itself forms a probability distribution, and this distribution has a name: [[w:Chi-square distribution|the <math>\chi^2</math> distribution with <math>k</math> degrees of freedom]].

Now flip this concept around: rather than knowing true, error-free <math>\bar{y}_i</math> values and talking about the probability distribution of the <math>y_i</math> values, what we know is the <math>y_i</math>s and we want to evaluate the probability that our model has estimated the <math>\bar{y}_i</math>s correctly.  The sum of the squares of the normalized differences between <math>y(x_i|a)</math> and <math>y_i</math> should follow the same <math>\chi^2</math> distribution as above, if the model estimated <math>\bar{y}_i</math> correctly.  For every degree of freedom in our model (<math>M</math>), though, we have to eliminate one degree of freedom (that is, consideration of one observation) from the <math>\chi^2</math> distribution, since theoretically we could've devoted that one degree of freedom to exactly reproducing one noisy observation that we made.

What we've arrived at is '''<math>\chi^2</math> model selection'''.  Given a value <math>\chi^2</math> as computed above, the probability <math>Q</math> of seeing an error as bad or worse than the observed <math>\chi^2</math> under a different set of observations drawn from the same underlying distribution, given the degrees of freedom <math>\nu \equiv N - M</math>, is:

<math>Q(\chi^2 | \nu) = \int_{\chi^2}^{\infty}f(\psi | \nu)\,\mathrm{d}\psi = 1 - F(\chi^2 | \nu)</math>

where <math>f(\psi | \nu)</math> is the [[w:Chi-square distribution#Probability density function|PDF of the <math>\chi^2</math> distribution]] and <math>F(\psi | \nu)</math> is of course [[w:Chi-square distribution#Cumulative distribution function|its CDF]].  Here's how to interpret <math>Q</math>:
* Really low (<math>Q < 0.001</math>) means that a repeat of the experiment would almost never produce an error as bad as the one your model instance incurred.  This means either that your fit is bad (<math>\chi^2</math> is really high) or that your model has so many degrees of freedom (and thus <math>\nu</math> is so small) that even a low <math>\chi^2</math> isn't convincing.
* Really high (<math>Q \approx 1</math>) means that it's almost impossible to get a better fit to the observations, even given the number of degrees of freedom.  High values are actually sort of suspicious, and indicate that the <math>\sigma_i</math>s were overestimated, that <math>M</math> was underestimated, or that the data were actually fudged to get a better result.
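Absent a library routine for the incomplete gamma function, <math>Q</math> can be put to numbers by Monte-Carlo simulation of the <math>\chi^2</math> distribution itself, directly from its definition as a sum of squared standard normal variates (a sketch; the draw count and seed are arbitrary choices):

```python
import random

def q_monte_carlo(chi2, nu, n=50_000, seed=1):
    """Estimate Q(chi2 | nu): the probability that a chi^2 variate with
    nu degrees of freedom comes out at least as large as the observed chi2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # One draw from chi^2_nu: sum of nu squared standard normals.
        draw = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        if draw >= chi2:
            hits += 1
    return hits / n

# A chi^2 equal to its expected value nu gives a moderate Q (an
# unremarkable fit); a chi^2 ten times nu gives Q near zero (a bad fit).
q_moderate = q_monte_carlo(10.0, 10)
q_terrible = q_monte_carlo(100.0, 10)
```

The exact value of <math>Q(10\,|\,10)</math> is about 0.44, so the moderate case lands in the comfortable middle ground between the two suspicious extremes above.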

This means that maximizing <math>Q</math> over all values of <math>M</math> (from very simple to very complex models) should find the "happy medium" model that maximizes explanatory power (low <math>\chi^2</math>) while minimizing model complexity (low <math>M</math>).

Let's consider the meaning of all these variables for each of the problems defined above:

===Curve Clustering===
* <math>N</math> is the number of vertices in all the curves in the tractography set.  Each <math>x_i</math> therefore represents the selection of a single vertex (by two parameters: the index of the curve in the curve set, and the index or arc-length distance of the desired vertex along this curve).  <math>y_i</math> isn't particularly well-defined in this case (see below), but amounts to the description of the curve near <math>x_i</math>: the vertex's position and the angle between the consecutive segments, maybe.
* <math>M</math> is the total number of curves that form the skeletons of the shrink-wrap polyhedra for the model, plus the total number of bundles/polyhedra.  Note that even in the most "complicated" model, <math>M</math> is smaller than <math>N</math> by a factor of half the average number of <math>x_i</math>s per curve.  <math>y(x_i|a_0 \dots a_{M-1})</math> is the description of the reconstructed curve specified by <math>x_i</math> at the specified position along it, given the association of curves with bundles specified by the model parameter values <math>a_0 \dots a_{M-1}</math>.
* <math>\sigma_i</math> is defined according to the definition of the difference <math>(y_i - y(x_i|a_0 \dots a_{M-1}))</math> that we choose (see below).

Let's assume that for comparing the reconstruction (<math>y(x_i|a_0 \dots a_{M-1})</math>, which we will just call <math>y</math>) with the observation (<math>y_i</math>), we want to incorporate both a position and angle reconstruction error.  Then we must define some sort of scalar pseudo-difference function

<math>f(y,y_i) = \alpha \times d_p(y,y_i) + (1-\alpha) \times d_{\theta}(y,y_i)\,</math>

where <math>d_p</math> is the difference in positions (Euclidean distance) between the vertices, and <math>d_{\theta}</math> is some difference between angles, perhaps one minus the dot product.  <math>\alpha</math> may either be hand-tuned or solved for in the course of the optimization; in this latter case, <math>M</math> must be increased by one.
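A sketch of this pseudo-difference for 3-D curve samples, each given as a position and a unit tangent direction (the function name, sample representation, and <math>\alpha = 0.5</math> are all hypothetical choices, not settled design decisions):

```python
import math

def pseudo_difference(p1, t1, p2, t2, alpha=0.5):
    """Alpha-weighted blend of positional and angular disagreement between
    two curve samples, each given as (position, unit tangent)."""
    d_pos = math.dist(p1, p2)                   # Euclidean distance d_p
    dot = sum(a * b for a, b in zip(t1, t2))
    d_ang = 1.0 - dot                           # one minus dot product, d_theta
    return alpha * d_pos + (1.0 - alpha) * d_ang

# Identical samples give zero difference; orthogonal tangents at the same
# position contribute only the angular term, weighted by (1 - alpha).
p, t = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
d = pseudo_difference(p, t, p, (0.0, 1.0, 0.0))
```

Note that the two terms carry different physical units (length vs. a dimensionless angle score), which is one more reason <math>\alpha</math> and the <math>\sigma_i</math>s need to be calibrated experimentally rather than guessed.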

The <math>\sigma_i</math>s must be determined experimentally (with synthetic phantoms, for example).  It remains for these experiments to demonstrate that the errors are normally distributed.  If not, some wacky transform might be necessary to shoehorn it into the <math>\chi^2</math> regime, or Monte-Carlo simulation of the error distribution might allow for a different but equivalent approach (see &sect;15.6 of NR).

===Bundle Adjustment===

===Microstructure Fitting===
* <math>N</math> is the number of voxels in the relevant region multiplied by the number of observations (diffusion weightings) in each. Each individual configuration (voxel and diffusion weighting) is an <math>x_i</math>. The observed intensity for each <math>x_i</math> is <math>y_i</math>.
* <math>M</math> is the number of control parameters in the space-filling microstructure model. Assuming that the microstructure instance for each voxel is determined by some sort of spline with sparse control points, <math>M</math> is 3 parameters for position plus the number of parameters actually controlling the microstructure model, multiplied by the number of control points.  <math>y(x_i|a_0 \dots a_{M-1})</math> is the reconstructed signal intensity in the voxel and diffusion weighting specified by <math>x_i</math>, given the model parameter values <math>a_0 \dots a_{M-1}</math>.
* <math>\sigma_i</math> is known for all <math>i</math> by estimation of the Rician noise parameters from the DWIs.  But...

Unfortunately, we know that the noise on the <math>y_i</math>s is not Gaussian but rather Rician.  Therefore <math>(y_i-y(x_i|a))</math> is not normally distributed, and so the <math>\chi^2</math> statistic is invalid in the form above.  We can, however, create a "fake" observation <math>\tilde{y}_i</math> from <math>y_i</math> for which <math>(\tilde{y}_i-y(x_i|a))</math> is normally distributed.  We do this by solving for the <math>\tilde{y}_i</math> that satisfies the following equality:

<math>\int_{-\infty}^{y_i}\mathbb{P}_{\mathrm{rice}}(x|y)\,\mathrm{d}x = \int_{-\infty}^{\tilde{y}_i}\mathbb{P}_{\mathrm{gauss}}(x|y)\,\mathrm{d}x</math>

The two sides of the equality are values of the cumulative distribution functions of, respectively, a Rician random variable and a Gaussian random variable:

<math>1-Q_1\!\left(\frac{y}{\sigma_i},\frac{y_i}{\sigma_i}\right) = \frac{1}{2}\left(1+\mathrm{erf}\!\left(\frac{\tilde{y}_i-y}{\sqrt{2\sigma_i^2}}\right)\right)</math>

where <math>Q_1\,</math> is the first-order Marcum Q-function<ref group="note">References for the Marcum Q-function: [[w:Marcum Q-function|Wikipedia]], [http://mathworld.wolfram.com/MarcumQ-Function.html Mathworld], [http://www.mathworks.com/help/toolbox/signal/marcumq.html Matlab]</ref>, and <math>\mathrm{erf}\,</math> is the Gauss error function.  Solving, we find:

<math>\tilde{y}_i = y + \sqrt{2\sigma_i^2}\;\mathrm{erf}^{-1}\!\left(1-2Q_1\!\left(\frac{y}{\sigma_i},\frac{y_i}{\sigma_i}\right)\right)</math>

This function re-maps a Rician random variable <math>y_i</math> into a Gaussian random variable <math>\tilde{y}_i</math> with the same <math>\sigma_i</math> and central value <math>y</math>, and so we may use this <math>\tilde{y}_i</math> in place of <math>y_i</math> in the summation for the <math>\chi^2</math> statistic.
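The remap can be sketched numerically with nothing but the standard library: the Rician CDF is obtained by Simpson integration of its PDF (with <math>I_0</math> from its power series, adequate for the moderate arguments here), and <math>\mathrm{erf}^{-1}</math> by bisection on <math>\mathrm{erf}</math>.  These stand in for library special functions such as the Marcum Q; the input values are hypothetical:

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0 by power series (fine for moderate x)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x * x / 4.0) / (k * k)
        total += term
    return total

def rice_cdf(x, center, sigma, steps=2000):
    """P(X <= x) for a Rician variable, by Simpson integration of its PDF."""
    def pdf(t):
        if t <= 0.0:
            return 0.0
        return (t / sigma**2) * math.exp(-(t*t + center*center) / (2*sigma**2)) \
               * bessel_i0(t * center / sigma**2)
    h = x / steps
    s = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return s * h / 3.0

def erfinv(p, lo=-10.0, hi=10.0):
    """Inverse error function by bisection on math.erf."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if math.erf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def gaussianize(y_i, y, sigma):
    """Map the Rician observation y_i to the y-tilde with matching CDF value."""
    p = rice_cdf(y_i, y, sigma)
    return y + math.sqrt(2.0) * sigma * erfinv(2.0 * p - 1.0)

# Hypothetical values: reconstruction y = 2.0, observation y_i = 2.5, sigma = 1.0
y_tilde = gaussianize(2.5, 2.0, 1.0)
```

By construction, the Gaussian CDF (center <math>y</math>, standard deviation <math>\sigma_i</math>) evaluated at <math>\tilde{y}_i</math> reproduces the Rician CDF value at <math>y_i</math>, which is exactly the defining equality above.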

The inner loop of our algorithm is given a model with <math>M</math> degrees of freedom, and fits the <math>a_j</math>s to maximize <math>Q</math> (or equivalently, minimize <math>\chi^2</math>) for that model (see &sect;15.5 of NR).  The outer loop is a search over all models (some of which may have the same <math>M</math>) to find a global maximum in <math>Q</math>.  As mentioned above, we should never get an unrealistically high <math>Q</math> value, as that's really only achievable by either fudging the data or overestimating the <math>\sigma_i</math>s.

= Notes =
<references group="note" />

Revision as of 17:46, 20 January 2012
