Popularity of arXiv.org within Computer Science

It may seem surprising that, of all scientists, computer scientists have been slow to post electronic versions of their papers on sites like arXiv.org, but that has been the case. We've tended to place our papers on our home pages, but this loses the benefits of aggregation, namely notification and browsing. There's a reason people use blogs and social media rather than individual web hosting providers.

But this is changing. More and more computer scientists are now using the arXiv. At the same time, there is ongoing discussion and controversy about how prepublication affects peer review, especially for double-blind conferences. This discussion is often carried out with precious little evidence of how popular prepublication is.

We measured what percentage of computer science papers are placed on the arXiv by cross-referencing published papers in DBLP with e-prints on arXiv. We found:

Usage of arXiv.org has risen dramatically among papers at the most selective conferences in computer science. In 2017, fully 23% of papers had e-prints on arXiv, compared to only 1% ten years earlier.
Areas of computer science vary widely in e-print prevalence. In theoretical computer science and machine learning, over 60% of published papers are on arXiv, while in other areas the share is essentially zero. In most areas, arXiv usage is rising.
Many researchers use arXiv for posting preprints. Of the 2017 published papers with arXiv e-prints, 56% were preprints that were posted before or during peer review.

Our paper describes these results as well as policy implications for researchers and practitioners.
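
The cross-referencing idea can be illustrated with a minimal sketch. This is not the matching procedure used in the paper: it assumes that published papers and e-prints can be paired by normalized title alone, and that each published paper carries a hypothetical "notification" date marking the end of peer review, so that an e-print posted on or before that date counts as a preprint. The function names, record fields, and toy records below are all made up for illustration.

```python
import re
from datetime import date


def normalize_title(title):
    """Lowercase a title and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def arxiv_stats(published, eprints):
    """Return (share of published papers with an e-print,
    share of those matched papers whose e-print is a preprint).

    Assumption: a title match after normalization means "same paper".
    """
    # Keep the earliest posting date seen for each normalized title.
    earliest = {}
    for eprint in eprints:
        key = normalize_title(eprint["title"])
        if key not in earliest or eprint["posted"] < earliest[key]:
            earliest[key] = eprint["posted"]

    matched = 0
    preprints = 0
    for paper in published:
        posted = earliest.get(normalize_title(paper["title"]))
        if posted is None:
            continue  # no e-print found for this paper
        matched += 1
        # Assumption: "notification" is the date peer review ended.
        if posted <= paper["notification"]:
            preprints += 1

    share_on_arxiv = matched / len(published) if published else 0.0
    share_preprint = preprints / matched if matched else 0.0
    return share_on_arxiv, share_preprint


# Toy records, invented for illustration only.
published = [
    {"title": "A Toy Paper on Graphs", "notification": date(2017, 3, 1)},
    {"title": "Another Toy Paper", "notification": date(2017, 3, 1)},
    {"title": "Toy Paper Without an E-print", "notification": date(2017, 3, 1)},
    {"title": "Yet Another Toy Paper", "notification": date(2017, 3, 1)},
]
eprints = [
    {"title": "A toy paper on graphs.", "posted": date(2017, 1, 15)},
    {"title": "Another  Toy Paper", "posted": date(2017, 6, 1)},
]

on_arxiv, preprint = arxiv_stats(published, eprints)
print(f"{on_arxiv:.0%} on arXiv, {preprint:.0%} of those are preprints")
```

Exact title matching is the simplest choice and will miss papers whose titles changed between the e-print and the published version; a real study would likely need fuzzier matching and author checks.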