Sci-Hub downloads show countries where pirate paper site is most used (nature.com)
113 points by Brajeshwar on Feb 27, 2022 | 42 comments


PhD student here. Aside from the difficulty of accessing articles for those who don't have institutional access, it's significantly easier to obtain an article from sci-hub. Enter title or DOI - done. To get the same article using uni logins you need to search via a database, e.g. Medline, then click on the source button. This takes you to different resources you can download from. Once you click through, you can then download the PDF, unless they've decided ePub is the default option, in which case you have to change this. Rarely, I've had to sign up to attempt to download an article, only to find once I've done this that my institution doesn't actually have access. The steps are reduced if you're on campus, but it's not much better to be honest. Absolute mess.


In articles downloaded per 1,000 citizens (roughly):

    China:       17
    USA:         27
    France:      84
    Brazil:      13
    India:        1.5
    Indonesia:    5
    Germany:     17
    Singapore:  170
    Mexico:       8
    Iran:        12
    
That paints a very different picture from the absolute numbers.
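
For anyone who wants to redo the table with a different denominator, here is a minimal sketch of the arithmetic. The function name `per_thousand` and all figures in `example` are placeholders I made up for illustration; they are not taken from the Nature article.

```python
def per_thousand(downloads: float, denominator: float) -> float:
    """Return downloads per 1,000 units of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 1000 * downloads / denominator


# Hypothetical figures for an imaginary country, NOT real data.
example = {
    "downloads": 2_000_000,
    "population": 50_000_000,
    "researchers": 100_000,
}

# Same download count, two different denominators.
per_citizen = per_thousand(example["downloads"], example["population"])
per_researcher = per_thousand(example["downloads"], example["researchers"])

print(f"per 1,000 citizens:    {per_citizen}")
print(f"per 1,000 researchers: {per_researcher}")
```

Swapping the denominator (population, researchers, published papers) is the only thing that changes between the metrics people might prefer.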


I don't think per capita is a good metric either. A better ratio would divide by number of researchers, college students, number of published papers, etc


They're all good metrics, depending on what you're looking for.

I'm not sure what calculating number of papers downloaded vs. quantity of institutional artifacts of education in a particular country would get you, other than a guess that it's inversely related to the ease of access to papers by a country's institutions i.e. an inverse to subscription numbers.

Per capita is a bit messy, because it's tied up with internet access and wealth/educational inequality, but it also might be useful for that.


Eh, I think it's good enough. People who aren't researchers or college students don't even have the option of a tedious authentication system with a library.

I think it's more informative than total volume of articles, but that going further than that is the kind of overkill where you need twenty different numbers to capture all the different slightly different ways you can count the number of potential users. Should you do it versus number of universities? Number of students? Number of faculty? Number of graduates in the general population (who can't get papers otherwise but are qualified to read them)?

Per capita has criticisms about different structural features of countries, but it's pretty hard to improve on without being super careful about how you interpret things. Same with per capita, of course, but at least people kind of understand it. Then again, I am not convinced that people really do; few people seem to even know what a median is or why it can be more useful than a mean.


I'm not any kind of researcher or academic, but still like being able to read papers.


Plus, China probably has Chinese mirrors behind its firewall that are not being counted. From what I understand, many people in China work as translators whose main job is to translate scientific papers into Chinese.


Regular user in China here. My experience is that libgen itself is not firewalled from China, only the IPFS mirrors.


Why normalize by population and not scholarly output? I suggest number of published academic articles, and even that isn't a proper metric, but it does paint a very different picture than yours.

https://www.worldatlas.com/articles/20-countries-publishing-...


> Why normalize by population and not scholarly output?

I did it out of personal bias: I use scihub for hobby and work but my scholarly output so far has been 0.

How you choose to normalize the table is probably a good litmus test for how you view science: if you think the goal of science is to generate insight, you normalize by population; if you think it's to communicate in the scientific community, you normalize by academic activity; if you think it's to advance society, you normalize by GDP (or HDI or GPI).


I don't think the methodology is necessarily colored by my view of science. My assumption is that most scientific paper downloads are done by people who are employed to do it. Granted, it's a flawed assumption and there are no metrics on "how many papers are downloaded by which demographics", so we don't really have enough information to draw a strong conclusion, but the right answer is probably somewhere in between.


People I know who work in universities have access to lots of papers for free.

For us mere mortals who need to survive (which these papers definitely help), we need to use sci-hub, as the per article fees are huge.


Even then, there can be quite a lot of bureaucracy. At the beginning of my Master's degree (Brazil) I had automatic and frictionless access to papers through the University network, then they started requiring access through some weird government website. I just went to Sci-Hub, much easier.


For what it's worth, if you know the authors' names, it's easy to find their email addresses online. Most authors would gladly send you a copy of their paper for free.


On sci-hub you can skim 20 papers in an hour to find the 3 that are relevant. That's weeks of work of emailing for preprints.


> the right answer is probably somewhere in between.

Almost certainly. All the views I listed are valid in their own right, and their proportion probably shifts depending on discipline. Anecdotally, papers around health, well-being or fitness seem to reach an audience of people looking for self-improvement, while machine learning papers seem to be read a lot by people on or for their job. Some other disciplines don't seem to reach a notable audience apart from themselves. Which is fine in its own right.

I would actually love to see statistics on this, but I don't think anyone is in a position to provide them.


How would you even interpret the data if it was normalized by scholarly output or papers published?

I’m pretty sure the largest downloaders are people from industry rather than academia. I might open 50 papers in one day just to look at the graphs and then read just one. I’d guess the second largest are high school students. They might not read papers very often, but there are a whole lot more of them than academics.


Thank you for this, this is a much more informative metric.

To the people saying that there is a better metric: you are more than welcome to calculate it and post it here.


Why would Nature give this stat in absolute numbers in the first place? Isn't it clear to writers of a scientific article that this way of reporting is uninteresting because it's then partially a stat of population size?


> Singapore: 170

That seems suspiciously high compared to the other countries. Maybe it's all the VPN users from China?


Also, we need to keep in mind that normal (legal) traffic for typical universities is in the millions of downloads per year.


My opinion is solely based on my personal experience, so take it with a grain of salt.

I was enrolled in a PhD program (engineering) around the time Sci-Hub started. We were lucky enough to have access to most papers we needed thanks to the university's agreements with Elsevier, IEEE, etc. I did not hear about Sci-Hub until a bit later, when I needed access to some academic papers but remote access to my university network was very cumbersome. I ended up downloading my OWN papers from Sci-Hub out of pure convenience. I have to say I always had mixed feelings (if not just hostility) towards the academic publishing industry, so I was actually happy my papers were available there.

During the pandemic I decided to enrol in an MBA in the field of Humanities (because why not). My experience is that Humanities folks are not very aware of Sci-Hub, probably because they are not so tech-savvy in general, but those who discover it are more than happy to have this extra resource.

That said, my personal impression is that places like my country are not higher in the chart simply because Sci-Hub is not well known there yet.


> I have to say I always had mixed feelings (if not just hostility) towards the academic publishing industry, so I was actually happy my papers were available there.

You can pretty much always make your own papers available. Some journals are OK with you hosting copies on your personal website, and none of them can legally prevent you from distributing the accepted manuscript, whether on your website, on things like ResearchGate, or in repositories like arXiv.


Now it is called "pirate" to access research, which was financed by the people?

I don't think we should attribute such action with any negatively connotated words. It is not like they are stealing anything. They are just accessing what should be accessible to anyone anyway.


There is a lot wrong with today's scientific publishing industry. But this does not mean that their service should be free.

Access to research is one thing, organising the review process, print layout, marketing another. Most researchers I know share their preprints for free. But the final peer reviewed & laid out product has added value.

The problem is the inflated price that researchers have allowed to be charged for their own product. Partly because many care more about their publication list than about anything else.

Rather than downloading illicit copies out of convenience, one should download preprints, engage in their discussion, seek out open peer review, share personal copies of peer reviewed papers, and contact other researchers for their papers.

We need to build a community of researchers if we want to take the greedy publishers out of the equation.


> Now it is called "pirate" to access research, which was financed by the people?

Yes, it is. The fact that taxpayer money was involved doesn't mean any contract with the private sector must be free. It is up to the people who fund the research to require that it be published by other means. Grant money comes with many conditions, and that is usually not one of them. And when it is, it is respected, in my experience.


It would be nice if Nature noted that there should not even be need for Sci-Hub as scientific literature should be open for everyone.


It would have been a good opportunity to highlight how you can publish articles in Nature as Open Access under a Creative-Commons license so people don't need to pirate them (as long as you have €9500 left over from your grant to cover the fee).


Actually, our university has dedicated funds to cover open access, so you do not necessarily need this in your primary grant. However, it is first come, first served, and limited. What worries me is that there are so many predatory open access journals out there, and the open access advocates haven't understood the need to limit open access funds to journals that are actually read. They should also make sure that a relevant percentage of the money is actually spent on reviewing and/or editorial work. The money for an open access Nature paper is probably well spent in the end. Otherwise, publishing preprints is probably the better alternative.


How about not feeding predatory publishers money?

From my understanding:

- The actual research and writing work and most of the formatting is done by the researchers.

- The reviewing is done by other researchers donating their time, or rather, using work time paid with public money to work for free for these commercial entities.

- Most of the distribution is digital.

What actual work worth 9500 EUR (plus whatever they get from subscriptions, because until everyone goes open access there will still be subscriptions) do those publishers do nowadays?


> What actual work worth 9500 EUR (plus whatever they get from subscriptions, because until everyone goes open access there will still be subscriptions) do those publishers do nowadays?

Gatekeeping, degrading the quality of research and keeping the reproducibility crisis going aren't cheap, you know.


Does anyone know the infrastructure behind Sci-Hub? I'm asking because I feel like they're vulnerable to being taken down in the future. Maybe IPFS or similar are not the best options, but some sort of hardened network could benefit such a project.


Arrgh, matie! that Sci-Hub clipper, she be no buccaneer vessel. That ship be loaded from bilge to mast with fathoms and fathoms of blimin' books and papers!

Take it from this old salt: Them landlubber publishers be tryin' to hornswaggle ya.


Isn't this not accurate at all due to mass VPN usage?


It is worth noting that some “nations with fewer scientific resources” provide their students and researchers broad access to scientific papers through their public universities and research institutions. That part of the article seems to be affirming the author’s prejudice without a shred of confirmation.


> And virtual private networks (VPNs), which are often used to circumvent bans in countries such as the United Kingdom, can skew the results by making it appear that users are in a different country.

How does this caveat not make the analysis useless, unless known VPN exits are filtered? There was no indication that they were.


Non-researcher here, but does this imply that China is doing significantly more research than the other countries mentioned? I'd imagine it's either that, or the 'legal' article access systems aren't as accessible?


Based on the stat posted above (downloads per 1000 citizens), Singapore leads, which is interesting. Obviously their number of citizens isn't as high as most countries but that's still such a strong number, maybe there's some students who are randomly downloading all papers they're interested in?


It’s a tiny country with a very good university. I am not surprised at all.

> maybe there's some students who are randomly downloading all papers they're interested in?

You pretty much have to operate that way anyway. There is just too much stuff to do the old-fashioned “read one paper, get its references, and repeat”. It is much better to get many articles, and then sort them by degree of interest using things like abstracts, main figures, conclusions, etc, before reading them fully. It does increase the number of downloads significantly.


Or VPNs.


China is the source of a ton of research in the two fields I've worked in (materials science & machine learning)


Doesn’t really imply much without additional info, could just be due to a larger population.



