This working paper shows an R implementation of Bayes’s formula as used in the blog post What’s the True Effect Size? It Depends What You Think to estimate true effect size in two group difference-of-mean studies. The main point is that the implementation is really, really simple. With all the fuss and bother about Bayesian methods, I imagined it would be incredibly hard to code. Happily not.

Bayes’s Formula

The line of math below is Bayes’s formula for the example at hand. An important nuance is that the “probabilities” are probability densities - think R’s dnorm.

$P(d_{pop} \mid d_{obs}) = \frac {P(d_{pop}) \times P(d_{obs} \mid d_{pop}) } {P(d_{obs})}$

1. The term on the left hand side is what we’re trying to compute, namely, the posterior probability distribution of $$d_{pop}$$ for a given value of $$d_{obs}$$ ($$0.5$$ in the blog post’s running example).

2. The first term on the right hand side, $$P(d_{pop})$$, is the prior. For the normal prior in Figures 1 and 2 of the post, this is R’s dnorm with $$mean=0.3$$ and $$sd=0.1$$.

3. The next term, $$P(d_{obs} \mid d_{pop})$$, is the probability distribution of $$d_{obs}$$ for a given $$d_{pop}$$. For the two group difference-of-mean studies in the post, this is a noncentral t-distribution centered on $$d_{obs}$$.

4. The denominator, $$P(d_{obs})$$, is the probability of a given value of $$d_{obs}$$ across all values of $$d_{pop}$$. This is the integral from $$-\infty$$ to $$\infty$$ of the numerator in Bayes’s formula.

R Implementation

Below is R code to compute the posterior for the examples in this post. The code works: just copy-and-paste into an R session and it’ll plot two graphs similar to Figures 1 and 2 in the blog post. The code is also available in the file R/baysx.R in my GitHub repo.

## Return function for posterior probability of d.pop given d.obs for two group
## difference-of-mean studies with equal sample size and unit standard deviation
##   n is sample size per group
##   d.obs is standardized observed effect size
##   prior is function of d.pop, eg, function(d.pop) dnorm(d.pop,mean=0.3,sd=0.1)
##
posterior=function(n,d.obs,prior) {
## probability of d.obs given d.pop for examples at hand. dd2t defined below
P_obsGIVENpop=function(d.pop) dd2t(n,d.pop,d.obs);
## numerator in Bayes formula
numerator=function(d.pop) prior(d.pop)*P_obsGIVENpop(d.pop);
## denominator in Bayes formula
P_obs=integrate(function(d.pop) numerator(d.pop),-Inf,Inf)\$value;
P_popGIVENobs=function(d.pop) numerator(d.pop)/P_obs;
}

## probability density of noncentral t in terms of n, d.pop, d.obs
dd2t=function(n,d.pop,d.obs) {
df=2*(n-1);
t=d.pop*sqrt(n/2);
sqrt(n/2)*suppressWarnings(dt(t,df=df,ncp=sqrt(n/2)*d.obs));
}

## examples with normal prior as in Figures 1 and 2 of the post
##   each graph plots the posterior for the given value of n,
##   as well as the prior (dnorm(d.pop,mean=0.3,sd=0.1)) for reference
## n=10
dev.new();
dpost=posterior(n=10,d.obs=0.5,prior=function(d.pop) dnorm(d.pop,mean=0.3,sd=0.1));
curve(dpost,from=-0.2,to=0.8,col='red',lwd=2,xlab='d',ylab='probability density');
grid();
## n=200;
dev.new();
dpost=posterior(n=200,d.obs=0.5,prior=function(d.pop) dnorm(d.pop,mean=0.3,sd=0.1));
curve(dpost,from=-0.2,to=0.8,col='red',lwd=2,xlab='d',ylab='probability density');
grid();

The resulting graphs are

The code exploits an advanced R feature that may be unfamiliar to some readers, namely, the ability to treat functions like other kinds of data. You can store a function in a variable, pass a function as an argument to another function, create a new function on-the-fly and store it in a variable, call the newly created function via the variable, and return a function as the result of a function.

I use this feature repeatedly.

1. I pass the prior probability function (dnorm(d.pop,mean=0.3,sd=0.1)) to posterior as the argument prior.

2. Each line of posterior creates a function: e.g., the line P_obsGIVENpop=function(d.pop) dd2t(n,d.pop,d.obs); creates a function of d.pop that calls the function dd2t on variables n, d.pop, and d.obs. A further detail is that variables n and d.obs are bound to the values passed into posterior as arguments, while d.pop is free and gets its value later when the new function P_obsGIVENpop is called.

3. The final result is likewise a function of d.pop: function(d.pop) numerator(d.pop)/P_obs;

4. The examples store the function in the variable dpost, then use curve (a base R function) to call dpost and plot the result.