Statistical Rethinking: Chapter 4 Practice Answers

We use some of the code from the pymc-devs Python/pymc3 port of the Statistical Rethinking code examples.

4E1

Line 1: $y_i \sim \text{Normal}(\mu,\sigma)$

4E2

Two parameters are in the posterior: $\mu$ and $\sigma$.

4E3

We can express our posterior joint distribution over $\mu$ and $\sigma$ as:

$$P(\mu,\sigma|\mathbf{y}) = \frac{\prod_i\text{Normal}(y_i|\mu,\sigma)\text{Normal}(\mu|0,10)\text{Exponential}(\sigma|1)}{\int\int\prod_i\text{Normal}(y_i|\mu,\sigma)\text{Normal}(\mu|0,10)\text{Exponential}(\sigma|1)d\mu d\sigma}$$

4E4

Line 2: $\mu_i = \alpha + \beta x_i$

4E5

Three parameters are in the posterior: $\alpha$, $\beta$, and $\sigma$.

4M1
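4M1 asks for simulated observations from the prior. A minimal prior-predictive simulation for that model ($y \sim \text{Normal}(\mu, \sigma)$ with the priors from 4E3), using plain NumPy — the seed and sample size are my own choices:

```python
import numpy as np

rng = np.random.default_rng(4)

# Priors from the model in 4E3/4M1:
#   mu ~ Normal(0, 10), sigma ~ Exponential(1)
n = 10_000
mu = rng.normal(0, 10, size=n)
sigma = rng.exponential(1, size=n)

# Prior predictive: one simulated observation per (mu, sigma) draw.
y_sim = rng.normal(mu, sigma)

print(y_sim.mean(), y_sim.std())  # roughly 0 and a bit over 10
```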

4M2

The point of my Python/pymc3 solutions is to learn pymc3, so instead of using quadratic approximation, which I'll likely never use in practice, we'll use pymc3's Markov chain Monte Carlo sampling.

4M3

$$y_i \sim \text{Normal}(\mu_i, \sigma)$$
$$\mu_i = \alpha + \beta x_i$$
$$\alpha \sim \text{Normal}(0, 10)$$
$$\beta \sim \text{Uniform}(0, 1)$$
$$\sigma \sim \text{Exponential}(1)$$

4M4

The average age of a growing student is probably somewhere around 12 years old (you can be a student from age 5 onwards). Average heights online are around 150 cm for people of this age. We give the prior a large standard deviation of 20 cm, so that, given sufficient evidence, the mean can end up at any realistic height.

$$h_i \sim \text{Normal}(\mu_i, \sigma)$$
$$\mu_i = \alpha + \beta (t_i - \bar{t})$$
$$\alpha \sim \text{Normal}(150, 20)$$
$$\beta \sim \text{Uniform}(0, 10)$$
$$\sigma \sim \text{Exponential}(1)$$

4M5

In one sense I've already accounted for this by setting my mean for $\alpha$ to the height of a child, rather than that of an adult.

However, we want the slopes to be positive. Let's check our prior predictive distribution to be sure.

Yes, all those slopes look non-negative.
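A quick way to run that prior-predictive check numerically, using the priors from the 4M4 model (the grid of centered years here is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Priors from the 4M4 model.
n_lines = 1000
alpha = rng.normal(150, 20, size=n_lines)
beta = rng.uniform(0, 10, size=n_lines)

# Each (alpha, beta) pair defines a line over the centered year t - t_bar.
t = np.linspace(-1, 1, 5)  # e.g. three years of measurements, centered
mu = alpha[:, None] + beta[:, None] * t[None, :]

# Uniform(0, 10) guarantees non-negative slopes by construction.
print((beta >= 0).all())  # True
```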

4M6

In this case we'd want to bound $\sigma$ at no more than 8 (since a variance of 64 means a standard deviation of 8). The question is what the best way to bound this variable is.

Honestly, the probability mass is so low at $\sigma = 8$ that I wouldn't worry about setting a hard boundary.
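To back that up: under the $\text{Exponential}(1)$ prior, the mass above $\sigma = 8$ is just the survival function $e^{-8}$:

```python
import math

# P(sigma > 8) under Exponential(rate=1): survival function exp(-rate * x).
p_tail = math.exp(-8)
print(f"{p_tail:.6f}")  # about 0.000335
```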

4M7

The posterior predictive distributions are the same, but the parameterization is different.

The $\alpha$ parameter is no longer the mean height, since $\beta$ is no longer multiplied by zero when the input weight is at its mean. Instead, $\alpha$ is lower: it now represents the extrapolated height at weight zero.

Removing the centering also seems to induce some negative covariance between $\alpha$ and $\beta$.

4M8

Having more knots gives the spline more room to try and fit the data more closely and possibly overfit. Making the prior more narrow counteracts this.

Hard

4H1

We need to predict the expected heights given some weights and provide an 89% interval. Let's start with expected heights. For this we'll just take the mean of the posterior values of alpha and beta and transform the weight.

As for the intervals: what does the 89% interval mean? It means taking the range of heights that contains the middle 89% of the probability mass and reporting its bounds.

One way of doing this would be to take pairs of alpha and beta samples and compute height from the weight. Then we will have a distribution over heights and we can calculate the interval.

What I'm not sure about here is that this is probably the 89% interval for the parameters of the line, not for the actual heights; the variation in the heights themselves could be far larger.

That's right: above we're not factoring in the height variance, only the variation in the parameter estimates. So we want to sample from the posterior while holding our weight input fixed. We can do this by taking samples from the posterior, computing samples of $\mu$ at the weights we're interested in, and then using those in turn to sample heights.
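A sketch of that procedure in NumPy. The posterior samples here are mocked stand-ins for a real trace, and `w_bar` is an assumed centering value; the weights are the ones asked about in the question:

```python
import numpy as np

rng = np.random.default_rng(1)

# Mock posterior samples standing in for the real trace (illustration only).
n = 4000
alpha_s = rng.normal(154.6, 0.3, size=n)
beta_s = rng.normal(0.9, 0.04, size=n)
sigma_s = rng.normal(5.1, 0.2, size=n)

w_bar = 45.0  # mean weight used to center the predictor (assumed)
weights = np.array([46.95, 43.72, 64.78, 32.59, 54.63])

for w in weights:
    mu_s = alpha_s + beta_s * (w - w_bar)     # samples of mu at this weight
    h_s = rng.normal(mu_s, sigma_s)           # sampled heights: adds sigma variation
    lo, hi = np.percentile(h_s, [5.5, 94.5])  # middle 89% percentile interval
    print(f"weight {w:5.2f}: E[h] = {mu_s.mean():6.2f}, "
          f"89% PI = ({lo:6.2f}, {hi:6.2f})")
```

Sampling `h_s` from `Normal(mu_s, sigma_s)`, rather than summarizing `mu_s` alone, is what folds the height variance back into the interval.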

4H2

Going to do this with MCMC instead of quadratic approximation.

(a) For every 10-unit increase in weight, this is the predicted change in height:
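Given posterior samples of the slope (mocked here; the real values come from the fitted trace), the answer is just $10\beta$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Mock posterior slope samples standing in for the real trace (illustration only).
beta_s = rng.normal(2.7, 0.07, size=4000)

# A 10-unit increase in weight changes the predicted height by 10 * beta.
change = 10 * beta_s
print(f"{change.mean():.1f} cm per 10 units of weight")  # around 27 cm
```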

It's a surprisingly manual process to change some of the input variables. It would be nice to have a model where (1) the model definition wasn't coupled with observed data and (2) the input variables could easily be changed.

(c) The model fit is concerning because there isn't a linear relationship between weight and height. Since this non-linearity only popped up once we filtered for people who are under 18, it would make sense that it has to do with growth. Younger children are lighter, and an increase in weight has a stronger positive association with increases in height. But older people who weigh more have larger variability in weight without a commensurate change in height.

One assumption we could change is to stop assuming a linear relationship between weight and height. A logarithmic relationship may fit better. Implicitly the model assumes that the same slope parameter is useful for all ages. Stratifying by age buckets could be another approach, but then you induce sparsity of data points.
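As a quick sanity check of the logarithmic idea, here is an ordinary least-squares fit of $h = \alpha + \beta \log w$ on synthetic data — a cheap stand-in for refitting the Bayesian model with a log-transformed predictor (the data-generating values are invented for illustration, not from Howell1):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic weight/height data with a logarithmic relationship
# (illustration only; the real exercise uses the Howell1 dataset).
w = rng.uniform(5, 60, size=200)
h = 60 + 25 * np.log(w) + rng.normal(0, 3, size=200)

# Ordinary least squares for h = alpha + beta * log(w).
X = np.column_stack([np.ones_like(w), np.log(w)])
coef, *_ = np.linalg.lstsq(X, h, rcond=None)
alpha_hat, beta_hat = coef
print(f"alpha ~ {alpha_hat:.1f}, beta ~ {beta_hat:.1f}")
```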

4H3

Good colleague! Looks pretty solid. The above also covers (b), although the question technically asked for the 97% percentile interval, not the 97% HDI.

4H4

It's not easy to come up with a prior that fits reasonable ranges very well. I guess the best strategy is to just have vague priors so you don't accidentally force a bad prior down the model's throat.

I'm not so interested in splines, so I'll depart from the chapter's exercises here.