Looking at the RPP to bring data to a discussion

After his talk at the Center for Adaptive Rationality, Stephan Lewandowsky and I had a small discussion about whether scientists can actually pick “winners”. It stemmed from a larger debate about whether we generate more research waste if we replicate first and then publish, or publish first and then replicate those studies that are found interesting.

If I recall correctly, we didn’t really disagree that scientists *can* tell if things are off about a study, but we did disagree on whether *citation* indexes such quality assessments, and whether it is a useful way to find out which studies are worthy of more attention.

So, I ran the numbers for one of the few cases where we can find out: the Reproducibility Project: Psychology. I tweeted about it back then, but felt like making the graphs nicer and playing with `radix` on a train ride.

We found 167 DOIs, so we had DOIs for all our 167 studies^{1}.

So, do studies that replicated get cited more? No, not for the citation count recorded in the RPP.

```
Call:
glm(formula = citations_2015 ~ replicated_p_lt_05, family = quasipoisson(), 
    data = .)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-11.434   -6.244   -3.233    3.998   20.520  

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)            4.47092    0.10351  43.194   <2e-16 ***
replicated_p_lt_05yes -0.08533    0.17903  -0.477    0.635    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 59.95478)

    Null deviance: 5247.2  on 98  degrees of freedom
Residual deviance: 5233.5  on 97  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
```
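The coefficients above are on the log scale. As a quick sanity check (a Python sketch of the arithmetic, since the post’s analyses are in R), exponentiating gives the expected citation count for non-replicated studies and the rate ratio for replicated ones:

```python
import math

# Coefficients copied from the quasi-Poisson model output above (log scale)
intercept = 4.47092        # log of expected 2015 citations, non-replicated studies
b_replicated = -0.08533    # log rate ratio: replicated vs. not replicated

baseline = math.exp(intercept)       # expected citations for a non-replicated study
rate_ratio = math.exp(b_replicated)  # multiplicative effect of replicating

# A rate ratio near 1 (here ~0.92) means essentially no citation difference.
print(round(baseline, 1), round(rate_ratio, 3))
```

So non-replicated studies sit at around 87 citations on average, and replicating multiplies that by roughly 0.92, a difference well within the noise given the standard error.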

I used the Crossref API to get DOIs and current citation counts for the papers contained in the RPP. Again, there was no association with replication status.
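I used `rcrossref` for this; the same information is available directly from the public Crossref REST API, whose `works` records carry an `is-referenced-by-count` field. A minimal Python sketch (the helper names are mine, not from the original script):

```python
import json
import urllib.request

def citation_count(message: dict) -> int:
    """Extract the current citation count from a Crossref 'works' record."""
    return int(message.get("is-referenced-by-count", 0))

def fetch_citation_count(doi: str) -> int:
    """Look up one DOI via the public Crossref REST API (requires network)."""
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url) as resp:
        return citation_count(json.load(resp)["message"])
```

Note that Crossref only counts citations from Crossref-indexed references, so its counts are typically lower than those from Web of Science or Scopus.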

```
Call:
glm(formula = citations_2018 ~ replicated_p_lt_05, family = quasipoisson(), 
    data = .)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-10.645   -6.532   -3.799    3.645   18.270  

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)            4.23501    0.11474  36.911   <2e-16 ***
replicated_p_lt_05yes -0.08961    0.19873  -0.451    0.653    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 58.18606)

    Null deviance: 5029.8  on 98  degrees of freedom
Residual deviance: 5017.8  on 97  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
```

This is pretty dirty work, because I’m subtracting citation counts from one source from those from another, so most papers appear to have been cited less in 2018 than in 2015. But I haven’t found a quick way to get citation counts as of 2015 from `rcrossref`. I’ve requested the necessary access to Scopus, where I could check, but Elsevier is being annoying.
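The subtraction just described can be sketched like this (DOIs and counts are made up for illustration):

```python
# Made-up numbers: RPP-recorded counts (2015) vs. Crossref counts (2018)
rpp_2015 = {"10.1000/a": 120, "10.1000/b": 40, "10.1000/c": 15}
crossref_2018 = {"10.1000/a": 150, "10.1000/b": 35, "10.1000/c": 30}

# Citations accrued between the two snapshots, per DOI
citations_after = {doi: crossref_2018[doi] - n for doi, n in rpp_2015.items()}

# Negative differences flag the source mismatch: true citation counts can
# only grow, so these papers only *appear* to have lost citations.
suspicious = sorted(doi for doi, d in citations_after.items() if d < 0)

print(citations_after)
print(suspicious)  # ['10.1000/b']
```

The negative differences are exactly the dirtiness: they say nothing about the papers and everything about the disagreement between the two citation databases.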

Again, no association. So, assuming the dirtiness of the analysis doesn’t matter, the literature hasn’t reacted at all to the presumably important bit of information that a study doesn’t replicate.

```
Call:
glm(formula = citations_after_2018 ~ replicated_p_lt_05, family = quasipoisson(), 
    data = .)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-9.7574  -1.1986   0.3785   1.0692   6.5136  

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           3.862898   0.041576  92.912   <2e-16 ***
replicated_p_lt_05yes 0.007475   0.069755   0.107    0.915    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 5.266201)

    Null deviance: 595.03  on 98  degrees of freedom
Residual deviance: 594.97  on 97  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4
```

A slightly different way of looking at it, splitting the studies by journal, does not yield different conclusions for me. It’s hard to tell with this little data!

```
Call:
glm(formula = citations_2015 ~ Journal * replicated_p_lt_05, 
    family = quasipoisson(), data = .)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-11.928   -5.955   -1.643    3.016   19.285  

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)                        3.80221    0.27601  13.775  < 2e-16 ***
JournalJPSP                        0.84525    0.30954   2.731  0.00756 ** 
JournalPS                          0.76733    0.31351   2.448  0.01626 *  
replicated_p_lt_05yes             -0.03744    0.40920  -0.091  0.92730    
JournalJPSP:replicated_p_lt_05yes -0.05791    0.52635  -0.110  0.91264    
JournalPS:replicated_p_lt_05yes    0.12185    0.46906   0.260  0.79561    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 51.19593)

    Null deviance: 5247.2  on 98  degrees of freedom
Residual deviance: 4359.9  on 93  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
```

So, are citation counts a poor indicator of quality? The most common reaction I received to these results was that the 7 years from the publication of the studies to 2015 are probably not enough for citation counts to become more signal than noise, or at least that the 3 years from the publication of the RPP results to 2018 are not. These reactions mostly came from people who did not put much stock in citations-as-merit before anyway.

To me, if citations 10 years after publication cannot be used to distinguish between studies that replicated and those that didn’t, they’re probably not a useful measure of thoroughness for assessment, hiring, and so on. They may be a useful measure of other skills that matter for a scientist, such as communicating their work, and they may measure qualities we don’t want in scientists, but it seems they cannot be used to select people whose work will replicate. I think that is something we should want to do.

In addition, the literature does not react quickly to the fact that studies do not replicate. Given that people also keep citing retracted studies (albeit with a sharp drop), this does not surprise me. It will be interesting to revisit the data in a few years’ time and see whether researchers have picked up on replication status by then.

These were all studies from reputable journals, so we might have some range restriction here. On the other hand, plenty of these studies don’t replicate, and citation counts go from 0 to >300.

Hover your mouse over the dots to see the study titles.

These analyses are based on Chris J. Hartgerink’s script. The data and his script can be found on the OSF. Did I get the right DOIs? There are probably still some mismatches: titles are not exactly equal for 84 studies, but on manual inspection this is only because Crossref separates out the subtitle, and 150 of 167 titles start exactly the same.
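That kind of prefix check can be sketched as follows (`titles_match` and the example titles are hypothetical, not taken from the actual matching script):

```python
def titles_match(rpp_title: str, crossref_title: str) -> bool:
    """Loose title comparison: exact match, or one title is a prefix of the
    other (Crossref often splits off the subtitle, so its main title is a
    prefix of the full title recorded in the RPP)."""
    a = rpp_title.casefold().strip()
    b = crossref_title.casefold().strip()
    return a.startswith(b) or b.startswith(a)

# Hypothetical example of the subtitle issue described above:
print(titles_match(
    "A hypothetical effect: Evidence from three studies",
    "A hypothetical effect",
))  # matches despite the missing subtitle
```

A check like this catches the subtitle cases but would still need manual inspection for genuinely different titles that happen to share a prefix.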

^{1} Were they all correct? See Appendix.↩

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rubenarslan/rubenarslan.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Arslan (2018, Sept. 23). Ruben Arslan: Are studies that replicate cited more?. Retrieved from https://rubenarslan.github.io/posts/2018-09-23-are-studies-that-replicate-cited-more/

BibTeX citation

```
@misc{arslan2018are,
  author = {Arslan, Ruben C.},
  title = {Ruben Arslan: Are studies that replicate cited more?},
  url = {https://rubenarslan.github.io/posts/2018-09-23-are-studies-that-replicate-cited-more/},
  year = {2018}
}
```