I don't think per capita is a good metric either. A better ratio would divide by number of researchers, college students, number of published papers, etc.
They're all good metrics, depending on what you're looking for.
I'm not sure what calculating number of papers downloaded vs. quantity of institutional artifacts of education in a particular country would get you, other than a guess that it's inversely related to the ease of access to papers by a country's institutions, i.e. an inverse of subscription numbers.
Per capita is a bit messy, because it's tied up with internet access and wealth/educational inequality, but it also might be useful for that.
Eh, I think it's good enough. People who aren't researchers or college students don't even have the option of a tedious authentication system with a library.
I think it's more informative than total volume of articles, but going further than that is the kind of overkill where you need twenty different numbers to capture all the slightly different ways you can count the number of potential users. Should you do it versus number of universities? Number of students? Number of faculty? Number of graduates in the general population (who can't get papers otherwise but are qualified to read them)?
Per capita has criticisms about different structural features of countries, but it's pretty hard to improve on without being super careful about how you interpret things. Same with per capita, of course, but at least people kind of understand it. I am not convinced that people really do; few people seem to even know what a median is or why it can be more useful than a mean.
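To make the median vs. mean point concrete, here's a toy sketch. The per-user download counts are made up purely for illustration (they are not real Sci-Hub data); the idea is just that one heavy user drags the mean far away from what a typical user does, while the median barely notices.

```python
from statistics import mean, median

# Hypothetical per-user download counts, invented for illustration only.
# Seven light users and one very heavy user.
downloads_per_user = [1, 1, 2, 2, 3, 3, 4, 500]

# The mean is pulled up by the single heavy user.
print("mean:  ", mean(downloads_per_user))

# The median reflects the typical user in the middle of the distribution.
print("median:", median(downloads_per_user))
```

With skewed data like this, a per-capita figure (which is a mean) says little about what a typical person in that country actually does.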
Plus China probably has Chinese mirrors behind its firewall that are not being counted. From what I understand, many people in China work as translators whose main job is to translate scientific papers into Chinese.
Why normalize by population and not scholarly output? I suggest number of published academic articles, and even that isn't a proper metric, but it does paint a very different picture than yours.
> Why normalize by population and not scholarly output?
I did it out of personal bias: I use scihub for hobby and work but my scholarly output so far has been 0.
How you choose to normalize the table is probably a good litmus test for how you view science: if you think the goal of science is to generate insight you normalize by population, if you think it's to communicate in the scientific community you normalize by academic activity, if you think it's to advance society you normalize by GDP (or HDI or GPI).
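A rough sketch of how much the choice of denominator matters. Everything here is hypothetical: the countries "A", "B", "C", all the figures, and the `rank_by` helper are invented for illustration and are not taken from the Nature table or any real dataset.

```python
# Made-up figures in arbitrary units, for illustration only.
countries = {
    "A": {"downloads": 1000, "population": 100, "papers": 50, "gdp": 20},
    "B": {"downloads": 400, "population": 10, "papers": 10, "gdp": 40},
    "C": {"downloads": 600, "population": 50, "papers": 100, "gdp": 10},
}


def rank_by(denominator=None):
    """Rank countries by downloads, optionally divided by `denominator`.

    With no denominator, ranks by absolute download count.
    Highest value comes first.
    """
    def score(name):
        row = countries[name]
        if denominator is None:
            return row["downloads"]
        return row["downloads"] / row[denominator]

    return sorted(countries, key=score, reverse=True)


print("absolute:      ", rank_by())
print("per capita:    ", rank_by("population"))
print("per paper:     ", rank_by("papers"))
print("per unit GDP:  ", rank_by("gdp"))
```

In this toy data each normalization produces a different ordering, which is the whole disagreement in this thread: none of the rankings is wrong, they just answer different questions.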
I don't think the methodology is necessarily colored by my view of science. My assumption is that most scientific paper downloads are done by people who are employed to do it. Granted, it's a flawed assumption and there are no metrics on "how many papers are downloaded by which demographics", so we don't really have enough information to make a strong conclusion, but the right answer is probably somewhere in between.
Even then, there can be quite a lot of bureaucracy. At the beginning of my Master's degree (Brazil) I had automatic and frictionless access to papers through the University network, then they started requiring access through some weird government website. I just went to Sci-Hub, much easier.
For what it's worth, if you know the authors' names, it's easy to find their email addresses online. Most authors would gladly send you a copy of their paper for free.
> the right answer is probably somewhere in between.
Almost certainly. All the views I listed are valid in their own right, and their proportion probably shifts depending on discipline. Anecdotally, papers around health, well-being or fitness seem to reach an audience of people looking for self-improvement, while machine learning papers seem to be read a lot by people on or for their job. Some other disciplines don't seem to reach a notable audience apart from themselves. Which is fine in its own right.
I would actually love to see statistics on this, but I don't think anyone is in a position to provide them.
How would you even interpret the data if it was normalized by scholarly output or papers published?
I’m pretty sure the largest downloaders are people from industry rather than academia. I might open 50 papers in one day just to look at the graphs and then read just one. I’d guess the second largest are high school students. They might not read papers very often, but there are a whole lot more of them than academics.
Why would Nature give this stat in absolute numbers in the first place? Isn't it clear to writers of a scientific article that this way of reporting is uninteresting because it's then partially a stat of population size?