All of the following models are the same as our main model m3, except for the noted changes to test robustness.

For the three historical populations, we imposed quite stringent exclusion criteria to ensure sufficient data quality for our intended analysis. Here, we relax these criteria. We did not do this for the modern Swedish data, because there were no exclusion criteria to relax.

```
model_filename = make_path("r1_relaxed_exclusion_criteria")
# This model was not fit for every population, so only summarise it if it exists
if (file.exists(model_filename)) {
  cat(summarise_model())
  r1 = model
}
```

Adding covariates increases the complexity of the model and makes it harder to interpret. We chose to adjust for many potential confounders because we are interested in isolating the causal effect of paternal age. Here we show what happens when only birth cohort and average paternal age in the family are adjusted for.

```
model_filename = make_path("r2_few_controls")
summarise_model()
r2 = model
```

We chose to control for birth order (number of older siblings) as a categorical variable, lumping all anchors with five or more older siblings into the category 5+. Because a continuous covariate is also plausible, we tested this alternative model as well.
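A minimal sketch of the two codings, assuming birth order is recorded as the number of older siblings (0 for first-borns). The variable names are hypothetical and do not come from the analysis code.

```r
# Hypothetical numbers of older siblings for six anchors
older_sibs <- c(0, 1, 2, 5, 7, 11)

# Main model: categorical coding, lumping five or more older siblings into "5+"
birth_order_cat <- factor(
  ifelse(older_sibs >= 5, "5+", as.character(older_sibs)),
  levels = c("0", "1", "2", "3", "4", "5+")
)

# r3: the same information entered as a continuous (linear) covariate
birth_order_cont <- older_sibs

table(birth_order_cat)
```

The categorical coding estimates five contrasts against the reference level and allows any shape of effect; the continuous coding estimates a single slope but assumes the effect is linear.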

```
model_filename = make_path("r3_birth_order_continuous")
summarise_model()
r3 = model
```

Birth order is usually used as a proxy for parental investment, the assumption being that older siblings require parental attention. However, there are reasons to doubt this, as fully grown siblings probably do not compete for the same resources. To obtain a clearer proxy for competing siblings, we computed and adjusted for the number of siblings who were alive and younger than five at the time of the anchor child's birth.
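A minimal sketch of this count for a single family, using birth and death years only (a real implementation would presumably work with exact dates). The data frame, its columns and the function name `count_dependent_sibs()` are hypothetical.

```r
# Hypothetical family: one row per child, with birth and death years
# (NA = still alive at the end of follow-up)
family <- data.frame(
  id    = 1:5,
  byear = c(1800, 1802, 1803, 1804, 1810),
  dyear = c(NA, 1802.5, NA, NA, NA)
)

# For each child, count the siblings who were born earlier, were younger than
# five, and were still alive when that child was born
count_dependent_sibs <- function(fam) {
  sapply(seq_len(nrow(fam)), function(i) {
    b <- fam$byear[i]
    others <- fam[-i, ]
    alive <- is.na(others$dyear) | others$dyear > b
    under_five <- others$byear < b & (b - others$byear) < 5
    sum(alive & under_five)
  })
}

family$dependent_sibs <- count_dependent_sibs(family)
family$dependent_sibs  # 0 1 1 2 0
```

The second child who died in infancy no longer counts as a competitor for the later births, which is exactly what distinguishes this measure from plain birth order.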

```
model_filename = make_path("r4_control_dependent_sibs")
summarise_model()
r4 = model
```

Plausibly, being first-born has a different effect when one is an only child than when one has two siblings, and so on. Here, we allow for such an interaction effect.

```
model_filename = make_path("r5_birth_order_interact_siblings")
summarise_model()
r5 = model
```

Paternal age and birth order are highly collinear with each other and with maternal age. Including birth order as a predictor therefore widens the standard errors of each of these predictors, and this choice may be disputed. Here we show what happens when we simply omit the birth order control.
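To illustrate why collinearity widens the uncertainty, here is a minimal sketch computing the variance inflation factor (VIF) of paternal age with and without a collinear birth order variable. The data are simulated and the strength of the associations is an arbitrary assumption; in a linear model, a coefficient's standard error grows with the square root of its VIF.

```r
set.seed(7)
n <- 1000
# Simulated, deliberately collinear predictors (not the study data)
paternalage <- rnorm(n, mean = 35, sd = 6)
maternalage <- paternalage - 3 + rnorm(n, sd = 3)
birth_order <- 0.3 * (paternalage - 35) + rnorm(n, sd = 1)

# Variance inflation factor of a predictor: 1 / (1 - R^2), where R^2 comes from
# regressing that predictor on all other predictors
vif <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vif_without_bo <- vif(paternalage, data.frame(maternalage))
vif_with_bo    <- vif(paternalage, data.frame(maternalage, birth_order))
# Standard errors scale with sqrt(VIF)
sqrt(c(without_birth_order = vif_without_bo, with_birth_order = vif_with_bo))
```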

```
model_filename = make_path("r6_no_birth_order_control")
summarise_model()
r6 = model
```

We adjusted for parental loss very stringently, including covariates for parental loss up to age 45. Here we show what happens when we control for parental loss only in the first year and in the first five years of life.

```
model_filename = make_path("r7_less_parental_loss_control")
summarise_model()
r7 = model
```

Inheritance was linked to birth order and sex in several of the historical populations. Here, we adjust for the anchor being the first- or last-born adult son in a family. This partly adjusts for our outcome, as “adult sons” cannot have died before adulthood, but a paternal age effect on mortality could still be detected for siblings other than the first- and last-born adult sons.

```
model_filename = make_path("r8_adjust_for_first_born_adult")
summarise_model()
r8 = model
```

In our main model, we control for birth cohort in 5-year bins (lumping small bins). We chose to do so because nonlinear and even sharply spiking effects of birth cohort (e.g. due to epidemics) are plausible. This decision may be disputed, as it coarsens birth year into 5-year bins. Here, we instead use a thin-plate spline on the continuous birth year variable, which allows for smooth nonlinear (but not spiking) birth cohort effects.
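A minimal sketch of the binned coding. The rule for lumping small bins is not specified in the text, so the function `lump_small_bins()` below uses a hypothetical rule: a bin with fewer than `min_n` anchors is merged into the preceding bin. In brms/mgcv formula syntax, the r9 alternative would instead be a smooth term such as `s(byear)`, whose default basis in mgcv is a thin-plate regression spline.

```r
# Hypothetical birth years, floored to 5-year bins (1700-1704 -> 1700, 1705-1709 -> 1705, ...)
byear <- c(1701, 1703, 1708, 1712, 1713, 1714, 1719, 1721)
cohort_bin <- 5 * floor(byear / 5)

# Hypothetical lumping rule: a bin with fewer than min_n anchors is merged into
# the preceding (earlier) bin; the first bin is never merged in this sketch
lump_small_bins <- function(bins, min_n) {
  tab <- table(bins)
  lvls <- as.numeric(names(tab))  # table() sorts numeric values increasingly
  n_per_bin <- as.vector(tab)
  target <- lvls
  for (i in seq_along(lvls)[-1]) {
    if (n_per_bin[i] < min_n) target[i] <- target[i - 1]
  }
  target[match(bins, lvls)]
}

cohort <- factor(lump_small_bins(cohort_bin, min_n = 2))
levels(cohort)
```

A factor like `cohort` can absorb an abrupt jump between neighbouring bins, whereas a spline on `byear` is forced to change smoothly.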

```
model_filename = make_path("r9_continuous_byear_adjustment")
summarise_model()
r9 = model
```

Paternal age effects may vary between families. Although we did not explore between-family moderators of paternal age effects in our study, we tested whether modelling an additional group-level slope for within-family paternal age differences would change the results (by allowing for shrinkage), and we used this model to examine how much between-family variation in the effect remains to be explained by potential future moderator analyses.

```
model_filename = make_path("r10_add_random_slope")
summarise_model()
r10 = model
```

Most anchors in our sample are full biological siblings, and divorce and remarriage were rare, especially in the historical populations. Therefore, we chose to include only one group-level effect for the parent couple (i.e. one group-level effect per father-mother dyad). Including one intercept per parent is potentially a better way to adjust for genetic propensities inherited from either parent, and it allows estimating these propensities from half-siblings as well, whereas half-sibling relationships were ignored in our main models. This comes at the cost of greater model complexity.

```
model_filename = make_path("r11_separate_random_effects_for_parents")
summarise_model()
r11 = model
```

It need not be the case that paternal age has the same effect on male and female children. For example, male children inherit only the small Y chromosome from the father, but female children inherit the larger X chromosome, so that paternal age predicts X-chromosomal de novo mutations in females but not in males (Francioli et al., 2016). At the same time, the autism literature suggests that males are less robust to heritable and de novo autism risk variants and that these effects are not simply due to having only one X chromosome (Werling & Geschwind, 2015). Here we let a dummy variable for being male moderate the paternal age effect.
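A minimal sketch of how such a moderation is typically coded, using base R's `model.matrix()` on hypothetical data: the product term lets the paternal age slope differ by sex. The coefficient values below are made up purely for illustration.

```r
# Hypothetical anchors: paternal age (in years) and a dummy for being male
d <- data.frame(paternalage = c(25, 35, 30, 40), male = c(0, 0, 1, 1))

# paternalage * male expands to both main effects plus their product
X <- model.matrix(~ paternalage * male, data = d)
colnames(X)

# With coefficients b, the paternal age slope is b["paternalage"] for females and
# b["paternalage"] + b["paternalage:male"] for males (hypothetical values)
b <- c("(Intercept)" = 1.2, paternalage = -0.010, male = 0.05, "paternalage:male" = -0.004)
slope_female <- b[["paternalage"]]
slope_male   <- b[["paternalage"]] + b[["paternalage:male"]]
```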

```
model_filename = make_path("r12_sex_moderation")
summarise_model()
r12 = model
```

We already control for the average paternal age at which the children in a family were born. The mean is a more complete summary of the reproductive timing of the father than the age at first birth. However, far more literature has examined age at first birth and it has the advantage of never being censored (although we of course try to rule out censoring by choosing appropriate subsets). Therefore, we added age at first birth as a covariate in this model.
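A minimal sketch of the two family-level summaries, computed with base R's `ave()` from a hypothetical one-row-per-child table. The column `paternalage.mean` mirrors the covariate name used in the fixed-effects comparison; `paternal_afb` and `paternalage.diff` are hypothetical names, and taking the minimum paternal age as age at first birth assumes the father's first child is in the data.

```r
# Hypothetical one-row-per-child table with the father's age at each birth
kids <- data.frame(
  family      = c(1, 1, 1, 2, 2),
  paternalage = c(24, 27, 33, 30, 31)
)

# Family-level summaries of the father's reproductive timing
kids$paternalage.mean <- ave(kids$paternalage, kids$family)             # mean age at the births
kids$paternal_afb     <- ave(kids$paternalage, kids$family, FUN = min)  # age at first (observed) birth

# Within-family deviation from the family mean, i.e. the variation in paternal
# age that remains once the family mean is adjusted for
kids$paternalage.diff <- kids$paternalage - kids$paternalage.mean
kids
```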

```
model_filename = make_path("r13_control_paternal_afb")
summarise_model()
r13 = model
```

Most of the previous literature has not used multilevel modelling, but linear group fixed effects (essentially dummy variables on the many thousands of families in the model). We believe our multilevel modelling approach has the advantage of allowing us to examine the effect of including predictors at the level of the family in the same model.

This allows us to

a) appropriately model a zero-inflated outcome such as number of children including those who died young (we’re not aware of a linear group fixed effect approach that handles hurdle or zero-inflated models)

b) examine group-level slopes for paternal age and potentially to examine moderators at the level of the family (though we did not do this)

c) explicitly model confounders at the level of the family (e.g. number of siblings).

Nevertheless, the prevalence of this approach in the literature means that we should show how our results compare. We fit this model using the R package “lfe” and the function felm. All covariates that cannot be estimated in principle in such a model, because they are constant within families, were removed (i.e. number of siblings, paternalage.mean).
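A minimal sketch, on simulated data, of why family-constant covariates drop out of a linear group fixed-effects model: with one dummy per family, a covariate such as the number of siblings is a linear combination of the dummies and cannot be estimated. The sketch also shows that the dummy-variable slope for paternal age equals the slope from a within-family (demeaned) regression. It uses base R `lm()` only; the variable names are hypothetical, and `lfe::felm(children ~ paternalage | family, data = d)` would be the corresponding call in the package mentioned above.

```r
set.seed(42)
n_fam <- 50
kids_per_fam <- 4
# Simulated data (not the study data): one row per child, grouped into families
d <- data.frame(family = factor(rep(seq_len(n_fam), each = kids_per_fam)))
fam_idx <- as.integer(d$family)
d$paternalage <- 25 + runif(n_fam, 0, 10)[fam_idx] +
  rep(c(0, 2.5, 5, 7.5), n_fam) + rnorm(nrow(d))
d$nr_siblings <- sample(3:8, n_fam, replace = TRUE)[fam_idx]  # constant within family
d$children <- 3 + rnorm(n_fam)[fam_idx] - 0.05 * d$paternalage + rnorm(nrow(d))

# Family dummies entered first: nr_siblings is then redundant and reported as NA
fit_dummy <- lm(children ~ family + paternalage + nr_siblings, data = d)
coef(fit_dummy)["nr_siblings"]

# Equivalent within-family regression on deviations from the family means
d$pa_within <- d$paternalage - ave(d$paternalage, d$family)
d$ch_within <- d$children - ave(d$children, d$family)
fit_within <- lm(ch_within ~ 0 + pa_within, data = d)
c(dummy = unname(coef(fit_dummy)["paternalage"]), within = unname(coef(fit_within)["pa_within"]))
```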

```
library(lfe)
r14 = readRDS(make_path("r14_compare_lfe"))
summary(r14)
```

In this model, we attempted to allow for regional variation in paternal age effects and to better account for residual variation. Our approach was two-fold: we moderated paternal age by region and added a group-level effect for the church parish in which the individual was born. However, for the modern Swedish data, we had no geographic or regional information, so this model was not fit.

```
model_filename = make_path("r15_region_moderator_parish_ranef")
# Not fit for the modern Swedish data, so only summarise the model if it exists
if (file.exists(model_filename)) {
  cat(summarise_model())
  r15 = model
}
```

Only in the DDB (historical Swedish data) were parishes in some of the regions still unlinked. This means that an individual could appear in more than one parish without the records being linked. However, the region of Skellefteå was fully linked. Here, we test what happens when we restrict our dataset to Skellefteå.

```
model_filename = make_path("r16_restrict_to_skelleftea")
# Only fit for historical Sweden, so only summarise the model if it exists
if (file.exists(model_filename)) {
  cat(summarise_model())
  r16 = model
}
```

In this model, we simulated Down syndrome cases as follows:

- We assume that 4 in 1000 births are children with Down syndrome (four times the actual rate).
- We randomly excluded 33% of all children who had a mother older than 40 and had no children (many times the actual rate at that age).
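A minimal sketch of the exclusion step in the second point, on simulated data. How the simulated Down syndrome cases from the first point enter the data is not specified here, so the sketch covers only the random exclusion; the column names `maternal_age` and `children` are hypothetical.

```r
set.seed(2017)
n <- 10000
# Simulated anchors (not the study data)
anchors <- data.frame(
  maternal_age = sample(16:48, n, replace = TRUE),
  children     = rpois(n, lambda = 3)
)

# Randomly exclude 33% of the anchors whose mother was older than 40 and who
# had no children themselves; a logical index avoids the empty-negative-index pitfall
eligible <- anchors$maternal_age > 40 & anchors$children == 0
drop <- eligible & runif(n) < 0.33
simulated <- anchors[!drop, ]

c(eligible = sum(eligible), dropped = sum(drop), remaining = nrow(simulated))
```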

```
model_filename = make_path("r17_simulate_downs")
summarise_model()
r17 = model
```

To keep the models computationally feasible, and because early mortality was negligible, we fit the very large modern Swedish dataset with a `poisson()` family distribution. All historical datasets had high early mortality, so we considered a `hurdle_poisson()` family more appropriate. Here, we show what happens when we reverse this. The `hurdle_poisson()` model can be fit to the modern Swedish data here because we only use a subset.
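To show why the choice matters when early mortality is high, here is a minimal sketch of the standard hurdle Poisson probability mass function (brms calls the zero probability `hu`; here it is `p_zero`). A plain Poisson ties the share of zeros to the mean, whereas the hurdle separates "zero versus positive" from the distribution of positive counts. The parameter values are illustrative only.

```r
# Hurdle Poisson probability mass function: with probability p_zero the count is 0;
# otherwise it follows a Poisson(lambda) distribution truncated to exclude zero
dhurdle_pois <- function(k, lambda, p_zero) {
  ifelse(k == 0,
         p_zero,
         (1 - p_zero) * dpois(k, lambda) / (1 - dpois(0, lambda)))
}

lambda <- 4
# Share of zeros a plain Poisson with lambda = 4 allows: exp(-4), about 1.8%
p0_poisson <- dpois(0, lambda)

# A hurdle Poisson can put far more mass at zero, e.g. 40% (say, from early deaths)
probs <- dhurdle_pois(0:100, lambda, p_zero = 0.4)
sum(probs)  # 1, up to truncation at k = 100
```

Note that `lambda` is the parameter of the untruncated Poisson, so the mean of the positive counts is `lambda / (1 - exp(-lambda))`, slightly above `lambda`.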

```
model_filename = make_path("r18_hurdle_poisson")
summarise_model()
r18 = model
```

Previous analyses have sometimes used the normal distribution to model (potentially zero-inflated) count data. Here, we refit our models using a normal distribution for the outcome. With this choice, the paternal age effect can be estimated to have a substantially different magnitude, but its direction did not change.
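Comparing a normal model with a log-link model also requires putting both on a common scale. A minimal sketch with hypothetical coefficients: a log-link coefficient translates into a percentage change as (exp(b) - 1) * 100, whereas an additive (identity-link) coefficient has to be divided by some reference level, here assumed to be the mean outcome. This is an illustration only, not necessarily how `paternal_age_10y_effect()` performs the conversion.

```r
# Hypothetical estimates, for illustration only
b_log <- -0.05        # Poisson (log-link) coefficient per 10 years of paternal age
b_linear <- -0.2      # normal-model coefficient: change in number of children per 10 years
mean_children <- 4    # hypothetical mean outcome used to rescale the additive estimate

# Log link: a multiplicative effect, expressed as a percentage change
pct_poisson <- (exp(b_log) - 1) * 100
# Identity link: an additive effect; a percentage requires a reference level
pct_normal <- b_linear / mean_children * 100

c(poisson = pct_poisson, normal = pct_normal)
```

Because the additive effect is constant in absolute terms, its percentage equivalent depends on the chosen reference level, which is one way the two model families can yield different magnitudes.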

```
model_filename = make_path("r19_normal_distribution")
summarise_model()
r19 = model
```

Because maternal age is highly collinear with paternal age, we test in this model what happens when we do not adjust for maternal age.

```
model_filename = make_path("r20_no_maternalage_control")
summarise_model()
r20 = model
```

In this model, we adjust for maternal age using a continuous variable instead of three bins. This does not allow for nonlinear effects, but it also does not coarsen the predictor. We cannot simultaneously compare full siblings, estimate the effects of both maternal and paternal age, and adjust for average maternal and paternal age in the family, because these predictors are redundant: among full siblings, a child's maternal age deviates from the family's average maternal age by exactly as much as its paternal age deviates from the average paternal age. Hence, it is not possible to perfectly disentangle the contributions of maternal and paternal age while comparing full siblings.
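A minimal sketch of this redundancy with three hypothetical families of full siblings: because the parental age gap is fixed within a family, the within-family deviations of maternal and paternal age coincide, and a design matrix containing both family means and both deviations loses one rank. Variable names are hypothetical.

```r
# Three hypothetical families of full siblings; the father is older than the
# mother by a gap that is fixed within each family
family      <- factor(rep(1:3, each = 3))
paternalage <- c(25, 28, 31, 30, 32, 35, 40, 42, 45)
age_gap     <- rep(c(2, 5, 8), each = 3)
maternalage <- paternalage - age_gap

# Family means and within-family deviations for both parents
pa_mean <- ave(paternalage, family)
ma_mean <- ave(maternalage, family)
pa_diff <- paternalage - pa_mean
ma_diff <- maternalage - ma_mean

# Among full siblings, the two deviations are identical ...
all.equal(pa_diff, ma_diff)
# ... so a design with both family means and both deviations is rank deficient
X <- cbind(intercept = 1, pa_mean, ma_mean, pa_diff, ma_diff)
qr(X)$rank  # 4, one less than ncol(X)
```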

```
model_filename = make_path("r21_continuous_maternalage")
summarise_model()
r21 = model
```

Like *r1*, but we additionally move the cutoff year for our birth cohorts 30 years later, relaxing our censoring requirements.

```
model_filename = make_path("r22_relaxed_exclusion_censoring")
# This model was not fit for every population, so only summarise it if it exists
if (file.exists(model_filename)) {
  cat(summarise_model())
  r22 = model
}
```

To demonstrate the robustness of our prior choice, we use Student’s t priors (which have fatter tails than normal priors) for our population-level effects and a half-Cauchy prior for the standard deviation of the group-level effect for the family.
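"Fatter tails" can be made concrete by comparing how much prior mass lies far from zero. A minimal sketch with base R distribution functions, assuming priors centred on zero with scale 1 and 3 degrees of freedom for the Student's t; these settings are illustrative and not necessarily the ones used in r23.

```r
# Probability that a draw lies more than 3 scale units from zero, for priors
# centred on zero with scale 1 (illustrative settings)
p_normal    <- 2 * pnorm(-3)        # normal prior, two-sided (equals the half-normal's upper tail)
p_student_t <- 2 * pt(-3, df = 3)   # Student's t prior with 3 degrees of freedom
p_half_cau  <- 2 * pcauchy(-3)      # half-Cauchy prior on a standard deviation: P(sd > 3)

round(c(normal = p_normal, student_t = p_student_t, half_cauchy = p_half_cau), 3)

# In brms, such priors could be specified (not run here) as
#   set_prior("student_t(3, 0, 1)", class = "b")
#   set_prior("cauchy(0, 1)", class = "sd")
# where brms restricts sd parameters to be non-negative, making the Cauchy half-Cauchy.
```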

```
model_filename = make_path("r23_student_cauchy_priors")
summarise_model()
r23 = model
```

To further test the robustness of our prior choice, we use improper flat priors. These priors should make the model’s results comparable to those of a frequentist maximum likelihood approach.

```
model_filename = make_path("r24_uniform_priors")
summarise_model()
r24 = model
```

In the three historical populations, records were kept by parish. Although records were linked between parishes in all populations except in three of the four provinces of historical Sweden, migration might sometimes lead to censoring of records. Adjusting for migration may, however, constitute a partial adjustment for the outcome, as individuals with lower fitness might be more likely to migrate. Hence, we show the results of doing so only as a robustness analysis, in which we adjusted for a “migrated” dummy variable in each historical population.

Migration was defined differently depending on the population. In Québec, we had flags denoting immigrants and emigrants. Few immigrants were included in our analyses anyway, as we needed parental information. Emigrants were people who left Québec. In historical Sweden, migration was recorded as moving away from the parish of birth. In the Krummhörn, we set migrated to true when the parish of death/burial differed from the parish of birth/baptism.

No migration information was available in 20th-century Sweden, but records there weren’t kept in parishes, so this should not pose a problem.
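A minimal sketch of the Krummhörn definition on hypothetical person records. Treating an unknown parish of death/burial as "not migrated" is an assumption made for this illustration, not a rule stated in the text.

```r
# Hypothetical person records with parish of birth/baptism and of death/burial
persons <- data.frame(
  birth_parish = c("A", "A", "B", "C"),
  death_parish = c("A", "B", NA, "C")
)

# Migrated if the parish of death/burial is known and differs from the parish of
# birth/baptism; a missing death parish is treated as not migrated (assumption)
persons$migrated <- !is.na(persons$death_parish) &
  persons$death_parish != persons$birth_parish
persons
```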

```
model_filename = make_path("r25_migration_status")
# Not fit for 20th-century Sweden, so only summarise the model if it exists
if (file.exists(model_filename)) {
  cat(summarise_model())
  r25 = model
}
```

Here we show the estimated effect of paternal age in each robustness analysis.

In reference to *m3*, the main reported model, the robustness models were implemented as follows: *r1* relaxed exclusion criteria (not in 20th-century Sweden), *r2* adjusted only for birth cohort and average paternal age in the family, *r3* adjusted for birth order as a continuous variable, *r4* adjusted for the number of dependent siblings instead of birth order, *r5* interacted birth order with number of siblings, *r6* did not adjust for birth order, *r7* adjusted only for parental loss in the first year and the first 5 years of life, *r8* adjusted for being the first- or last-born adult son, *r9* adjusted for birth year with a continuous nonlinear thin-plate spline instead of 5-year bins, *r10* added a group-level slope for paternal age, *r11* included separate group-level effects for each parent instead of one per marriage, *r12* added a moderation by anchor sex, *r13* adjusted for paternal age at first birth, *r14* compared a model with linear group fixed effects, *r15* added a moderation of paternal age by region and group-level effects by church parish (not in 20th-century Sweden), *r16* was restricted to Skellefteå (only in historical Sweden), *r17* simulated Down syndrome cases, *r18* reversed hurdle Poisson and Poisson distributions for the respective populations, *r19* used a normal distribution, *r20* did not adjust for maternal age, *r21* adjusted for maternal age as a continuous variable, *r22* relaxed exclusion criteria and included 30 more years of birth cohorts, allowing for more potential censoring, *r23* used Student’s t distributions for population-level priors and half-Cauchy priors for the family variance component, *r24* used improper flat (noninformative) priors, which should lead to results comparable with maximum likelihood, and *r25* controlled for migration status (not in 20th-century Sweden).

```
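# Packages for data wrangling and plotting; make_path(), lstype() and
# paternal_age_10y_effect() are helper functions assumed to be defined earlier
library(dplyr)
library(stringr)
library(ggplot2)

max_r = 25
rm(model)
# Load the main model so that it is shown alongside the robustness models
m3 = readRDS(make_path("m3_children_linear"))
# Names of all brmsfit objects in the workspace (m3 and the fitted robustness models)
rob_checks = lstype("brmsfit")
robustness = data.frame()
for (i in seq_along(rob_checks)) {
  # Keep the third row of the paternal age effect summary for this model
  chk = paternal_age_10y_effect(get(rob_checks[i]))[3,]
  chk$model = rob_checks[i]
  # Model label, e.g. "r12" or "m3"
  chk$robustness_analysis = str_match(rob_checks[i], "\\b([rm]\\d+)")[,2]
  robustness = bind_rows(robustness, chk)
}
# Convert the median and the bounds of the formatted "[lower;upper]" 95% interval to numbers
robustness = robustness %>% mutate(
  median_estimate = as.numeric(median_estimate),
  lower95 = as.numeric(str_match(ci_95, "\\[(-?[0-9.]+);")[,2]),
  upper95 = as.numeric(str_match(ci_95, ";(-?[0-9.]+)]")[,2])
)
# Order the levels r25 ... r1, m3 so that m3 appears at the top after coord_flip()
robustness_plot_data = robustness %>%
  mutate(robustness_analysis = factor(robustness_analysis, levels = c(paste0("r", max_r:1), "m3")))
ggplot(robustness_plot_data, aes(x = robustness_analysis, y = median_estimate, ymin = lower95, ymax = upper95)) +
  geom_hline(yintercept = 0, linetype = 'dashed') +
  geom_pointrange() +
  geom_text(aes(label = robustness_analysis, group = effect), vjust = -0.8) +
  xlab("Robustness analysis") +
  ylab("Percentage change in outcome by paternal age") +
  theme(axis.ticks.y = element_blank(), axis.text.y = element_blank()) +
  coord_flip()
saveRDS(robustness, file = make_path("robustness"))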
```