Mixed models: model syntax

Keywords: power analysis, mixed models, multiple clusters, participants by stimuli

0.8.2

Here we discuss the rules for defining a mixed model for power analysis in PAMLj.

Model Syntax

Power analysis for a mixed model requires specifying the model together with all its expected coefficients. PAMLj uses a custom syntax based on the standard formulas of the R package lme4 (Bates et al. 2015), modified so that coefficients can be passed easily.

Model Terms

First, the model is defined as it would be in the R package lme4 (Bates et al. 2015). For instance, a simple model with only a random intercept may look like this:

y~1+x+(1|clustervar)

Recall that 1 indicates the intercept: this model has a fixed intercept, a fixed effect of x, and a random intercept across a variable named clustervar. Terms are defined largely as in R syntax, with a few restrictions:

Interactions are always defined with the colon :, never with the star *.
Intercepts should always be defined with 1 or 0. For fixed effects, the intercept may be omitted, but a warning is issued; it is better to declare it explicitly.
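
Users coming from R often write interactions as x*z. Since the star is not accepted for interactions here, such a term has to be spelled out as main effects plus colon terms. The sketch below is a hypothetical helper (expand_star is not part of PAMLj) that shows which terms an R-style star stands for, so that each of them can then be written explicitly and given its own coefficient.

```python
from itertools import combinations

def expand_star(term):
    """Expand an R-style 'a*b*c' term into main effects and ':' interactions."""
    factors = [f.strip() for f in term.split("*")]
    expanded = []
    # Orders 1..k: main effects first, then two-way, three-way, ... interactions
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            expanded.append(":".join(combo))
    return expanded

print(expand_star("x*z"))    # ['x', 'z', 'x:z']
print(expand_star("x*z*w"))  # ['x', 'z', 'w', 'x:z', 'x:w', 'z:w', 'x:z:w']
```

For example, the R-style term x*z would be entered in the model as x, z and x:z.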

Model Coefficients

Second, one needs to input the coefficient of each term in the model using the syntax value*x. For instance, the syntax

y~1*1+.5*x+(1*1|clustervar)

means that we expect the fixed intercept to be \(1\), the fixed coefficient associated with \(x\) to be \(.5\), and the variance of the random intercept to be \(1\). There are a few rules to follow:

Each term should have a coefficient.
Coefficients for categorical variables are passed with the syntax [1,2,3]*x. Here x has 4 levels, so 3 contrast variables, and thus 3 coefficients, are required to estimate its effect.

For instance, if x is categorical with 4 levels, the syntax would be:

y~1*1+[.5,1,1.5]*x+(1*1|clustervar)
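
To make the value*term convention concrete, here is a minimal sketch that reads the fixed part of such a formula into a dictionary of coefficients. It is not PAMLj's own parser: the function name parse_terms and the regular expression are illustrative assumptions, and the sketch only handles a flat fixed part (the random blocks in parentheses are left out).

```python
import re

# One signed "coefficient*term" chunk; the coefficient is a number or a [..] list.
CHUNK = re.compile(r"([+-]?)\s*(\[[^\]]*\]|[0-9]*\.?[0-9]+)\s*\*\s*([A-Za-z0-9_.:]+)")

def parse_terms(expr):
    """Return {term: [coefficients]} for a flat 'coef*term+coef*term' expression."""
    terms = {}
    for sign, coef, name in CHUNK.findall(expr):
        if coef.startswith("["):
            # Bracket syntax: one value per contrast variable
            values = [float(v) for v in coef[1:-1].split(",")]
        else:
            values = [float(coef)]
        if sign == "-":
            values = [-v for v in values]
        terms[name] = values
    return terms

print(parse_terms("1*1+[.5,1,1.5]*x"))  # {'1': [1.0], 'x': [0.5, 1.0, 1.5]}
print(parse_terms("1*1-.5*x"))          # {'1': [1.0], 'x': [-0.5]}
```

A bracketed list should contain one value per contrast variable, that is, one fewer than the number of levels of the categorical variable; the length of the parsed list can be checked against that number.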

If the model also has a random coefficient for the independent variable (a random slope), it should be added to the model with its expected coefficient, which is a variance. For instance:

y~1*1+.5*x+(1*1+1.5*x|clustervar)

indicates that x has a random slope whose variance across levels of clustervar is \(1.5\). If the random slope concerns a categorical variable, the variances corresponding to the contrast variables that represent it should be passed with the bracket syntax:

y~1*1+[.5,1,1.5]*x+(1*1+[1.1,2,3.1]*x|clustervar)

indicating a random coefficient for each of the three contrast variables representing the effect of x, with variances \(1.1\), \(2\), and \(3.1\) respectively.
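
To see what the numbers in the formula stand for, the continuous random-slope model y~1*1+.5*x+(1*1+1.5*x|clustervar) can be written out as an equation. This is a sketch under assumptions that this page does not state: random effects and residuals are taken to be normally distributed, the residual variance is left as an unspecified \(\sigma^2\), and nothing is assumed about the covariance between random intercept and random slope. Here \(i\) indexes observations and \(j\) indexes levels of clustervar.

```latex
\begin{aligned}
y_{ij} &= (1 + u_{0j}) + (0.5 + u_{1j})\,x_{ij} + e_{ij},\\
u_{0j} &\sim N(0,\ 1), \qquad u_{1j} \sim N(0,\ 1.5), \qquad e_{ij} \sim N(0,\ \sigma^2)
\end{aligned}
```

The values outside the parentheses in the formula are fixed coefficients, while the values inside the parentheses are the variances of \(u_{0j}\) and \(u_{1j}\).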

Model Coefficients sign

Coefficients can be positive or negative. For continuous independent variables (terms), simply use the negative sign in the formula, as in:

y~1*1-.5*x+(1*1|clustervar)

or, with a random slope:

y~1*1-.5*x+(1*1+1.5*x|clustervar)

For categorical variables, indicate the sign of each coefficient within the brackets:

y~1*1+[.5,-1,-1.5]*x+(1*1|clustervar)

Multiple clustering variables

Multiple clustering variables can be specified, for instance:

y~1*1+.5*x+(1*1+1.5*x|class)+(1*1|school)
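
The structure of a formula with several clustering variables can be made explicit by splitting it into its response, its fixed part, and one random block per clustering variable. The sketch below does this with regular expressions; split_formula is a hypothetical helper written for illustration and does not reflect how PAMLj reads the formula internally.

```python
import re

def split_formula(formula):
    """Split 'y~fixed+(random|cluster)+...' into response, fixed part, random blocks."""
    response, rhs = formula.split("~", 1)
    # Each random block has the shape '(terms|cluster)'
    random_blocks = {cluster: terms
                     for terms, cluster in re.findall(r"\(([^|()]+)\|([^()]+)\)", rhs)}
    # Whatever remains after removing the blocks is the fixed part
    fixed = re.sub(r"\+?\([^()]*\)", "", rhs)
    return response.strip(), fixed, random_blocks

print(split_formula("y~1*1+.5*x+(1*1+1.5*x|class)+(1*1|school)"))
# ('y', '1*1+.5*x', {'class': '1*1+1.5*x', 'school': '1*1'})
```

In this example, class carries a random intercept and a random slope of x, whereas school carries only a random intercept.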

Symbolic coefficients

PAMLj also allows passing symbolic coefficients in the formula. Symbolic coefficients are labels that can be used to refer to a term/coefficient in additional syntax lines. Here is an example:

y~1*1+a*.5*x+b*2*z+(1*1|school)

In this example, a and b refer to the coefficients of x and z, respectively.

Additional directives

The field Module Structure accepts additional directives on top of the specified model. At the moment, only one directive is recognized.

Testing a specific effect

By default, PAMLj computes the power parameters for the smallest fixed effect (the coefficient with the smallest associated power). To test a specific coefficient, which is not necessarily the smallest, use the following syntax:

test: alabel

This directive implies that the estimated sample size (Required Number of cluster levels or Required N per cluster) is solved for the effect whose label is alabel. Simply attach a symbolic coefficient to the target term and add the test: directive. In practice:

y~1*1+a*.5*x+b*2*z+(1*1|school)
test:b

computes the required sample size for 2*z, irrespective of the fact that .5*x has a smaller power.
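
The link between a label and the test: directive can be illustrated with a short sketch that collects the label*value*term triples from the formula line and then looks up the label named by the directive. The names labelled_terms and target_of are hypothetical and the regular expression is an assumption made for this example, not PAMLj's implementation.

```python
import re

# 'label*coef*term': the label is an identifier placed before the coefficient
LABELLED = re.compile(r"([A-Za-z_]\w*)\*([0-9]*\.?[0-9]+)\*([A-Za-z_][\w:.]*)")

def labelled_terms(formula):
    """Map each symbolic label to its (coefficient, term) pair."""
    return {label: (float(coef), term)
            for label, coef, term in LABELLED.findall(formula)}

def target_of(syntax):
    """Return the (coefficient, term) selected by a 'test:' line, or None."""
    lines = [ln.strip() for ln in syntax.strip().splitlines()]
    labels = labelled_terms(lines[0])
    for ln in lines[1:]:
        if ln.startswith("test:"):
            return labels[ln.split(":", 1)[1].strip()]
    return None

syntax = """
y~1*1+a*.5*x+b*2*z+(1*1|school)
test:b
"""
print(target_of(syntax))  # (2.0, 'z')
```

Without a test: line the sketch returns None, which corresponds to the default behaviour described above, where the target effect is chosen automatically.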

Variables levels

Variables can be defined as varying either within or between clusters. By default, all categorical variables are assumed to vary within clusters, or within combinations of cluster levels. Continuous variables are drawn from a normal distribution across the whole sample (observations), so they have variability both within and between clusters.

If one wants to specify that a variable is within cluster, meaning that it varies only within a cluster, one can use the keyword within: var|cluster (can be abbreviated to wit: var|cluster):

y~1*1+.5*x+2*z+(1*1|school)
within: x|school

This indicates that the variable x varies within each school. If x is categorical, all levels of x are repeated within each level of the clustering variable. If x is continuous, x is standardized within each level of the clustering variable.

If one wants to specify that a variable is between clusters, meaning that it varies only across clusters, one can use the keyword between: var|cluster (can be abbreviated to bet: var|cluster):

y~1*1+.5*x+2*z+(1*1|school)
between: x|school

This indicates that the variable x is constant within each school and varies across schools. If x is categorical, each cluster level has only one level of x. If x is continuous, x is constant within each school and is standardized across levels of the clustering variable.
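
The difference between the two layouts for a categorical x can be sketched by building the (cluster, level) rows of a small design. This is only an illustration: the function names are hypothetical, and cycling through the levels is an assumption made here, since this page does not say how PAMLj assigns levels to rows or clusters.

```python
def within_design(levels, n_clusters, n_per_cluster):
    """Every cluster sees all levels of x, cycled to fill n_per_cluster rows."""
    return [(c, levels[i % len(levels)])
            for c in range(n_clusters) for i in range(n_per_cluster)]

def between_design(levels, n_clusters, n_per_cluster):
    """Each cluster is assigned a single level of x, constant across its rows."""
    return [(c, levels[c % len(levels)])
            for c in range(n_clusters) for _ in range(n_per_cluster)]

def levels_per_cluster(rows):
    """Summarise which levels of x occur in each cluster."""
    seen = {}
    for cluster, level in rows:
        seen.setdefault(cluster, set()).add(level)
    return {c: sorted(s) for c, s in seen.items()}

print(levels_per_cluster(within_design(["a", "b"], 2, 4)))   # {0: ['a', 'b'], 1: ['a', 'b']}
print(levels_per_cluster(between_design(["a", "b"], 2, 4)))  # {0: ['a'], 1: ['b']}
```

In the within layout each cluster contains every level of x, while in the between layout each cluster contains exactly one level, which matches the two descriptions above.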



Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.