Free online index of scientific knowledge


More than 107 million scientific articles have just been cataloged for public use thanks to a new project called The General Index.

Typically, academic studies sit behind a paywall, which keeps potentially important information not only from the public but, perhaps more importantly, from other scientists.

The General Index wants to free this information. The index acts almost like a Google search for scientific articles, but with a twist: only extracts from the articles are provided, so it is up to users to mine the data and make sense of it all.

“Science is a language we all need to speak if we are to improve our world.”

Carl Malamud

The 38-terabyte database contains a collection of short text snippets, each one to five words long, drawn from 107 million journal articles.

Why this is important: While the general public does not often read academic articles, scientists do. They want to read the work of other researchers because building on previous research can help them in their own work.

“There is no way for me, or anyone else, to analyze or experimentally measure the chemical footprint of every plant species on Earth,” Gitanjali Yadav, a biologist at the University of Cambridge who did not work on The General Index, told Nature. “Most of the information we are looking for already exists, in the published literature.”

Carl Malamud, archivist and creator of The General Index, says that is what he hopes this project will be used for.

“It is a research tool, a dictionary of knowledge, a map of knowledge, a tool that we believe to be a central tool for the practice of science in our modern time,” Malamud said in a video statement about the index. “We see this as a public service. We do not assert any ownership over The General Index. It is dedicated to the public domain. A series of uncluttered facts that you can do whatever you want with. There are no rights reserved.”


What it is: We already have Google Scholar, a search engine that scours the scientific literature to find the most relevant match to a search term. But The General Index is not a search engine. Rather, it is a carefully cataloged and organized collection of extracts from the scientific literature.

In total, more than 355 billion phrases and words, listed next to their corresponding articles, appear in the index, reports Nature.
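To make the idea of phrases "listed next to their corresponding articles" concrete, here is a minimal Python sketch of such a lookup table on a toy scale. It is an illustration only, not the index's actual format: the function names, the article IDs, and the sample text are all made up, and the real database's layout is not described in this article.

```python
from collections import defaultdict

def ngrams(text, max_n=5):
    """Yield every run of 1 to max_n consecutive words in text."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def build_index(articles):
    """Map each n-gram to the set of article IDs that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for gram in ngrams(text):
            index[gram].add(article_id)
    return index

# Hypothetical toy corpus; the IDs and text are invented for illustration.
articles = {
    "paper-1": "plants emit volatile compounds",
    "paper-2": "volatile compounds attract insects",
}
index = build_index(articles)
print(sorted(index["volatile compounds"]))  # ['paper-1', 'paper-2']
```

Looking up a phrase returns the articles it appears in, without the table ever storing a full article.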

How it works: The purpose of the index is to help with text mining: discovering new information by scanning a ton of text and looking for patterns and trends. Humans have become quite good at analyzing headlines, tweets, and short pieces of text. But no human could scan millions of articles, jot down the critical information, cross-reference it, and connect the dots.

The database can help us do the job.

And, because the database only uses brief extracts from each scientific article and not the article itself, it is free to use and download with no copyright restrictions.

Extracting useful information from tiny snippets, or n-grams (short sequences of words from each paper), is difficult for the average human. With only snippets to go on, we can’t read an article in its entirety and understand each sentence in the context of the whole. So, to make sense of it all, scientists will have to use software, and perhaps write their own code, to extract data, recognize patterns, and apply statistics or machine learning to glean useful information.
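As a rough idea of what that code might look like, the Python sketch below counts which terms show up in the same articles as a term of interest, one of the simplest patterns a researcher could look for. The rows, the article IDs, and the function name are hypothetical and chosen for illustration; real work on the index would involve far more data and more careful statistics.

```python
from collections import Counter, defaultdict

# Hypothetical (article_id, n-gram) rows; invented for illustration.
rows = [
    ("paper-1", "limonene"), ("paper-1", "citrus"),
    ("paper-2", "limonene"), ("paper-2", "citrus"), ("paper-2", "pine"),
    ("paper-3", "pine"), ("paper-3", "resin"),
]

def cooccurring_terms(rows, target):
    """Count how many articles mention each other term alongside target."""
    terms_by_article = defaultdict(set)
    for article_id, gram in rows:
        terms_by_article[article_id].add(gram)
    counts = Counter()
    for terms in terms_by_article.values():
        if target in terms:
            counts.update(terms - {target})
    return counts

print(cooccurring_terms(rows, "limonene").most_common())
# [('citrus', 2), ('pine', 1)]
```

Here "citrus" appears alongside "limonene" in two articles and "pine" in one, a hint at a connection that no single snippet reveals on its own.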

The data can be downloaded directly, but that is a time-consuming method. The folks on the r/DataHoarder subreddit are uploading it to a remote server and distributing it via BitTorrent, reports Vice.

“Science is a language we all need to speak if we are to improve our world,” said Malamud.

It’s cool, but is it legal? The General Index arrives amid ongoing legal battles between publishers and Sci-Hub. The controversial portal is a pirate website that provides free access to millions of copyrighted scientific research articles. Several publishers have sued Sci-Hub, alleging copyright infringement.

Without endorsing Sci-Hub, many scientists argue that publicly funded research should be freely available to the public. This would allow scientific knowledge to continue to advance. Alexandra Elbakyan, the founder of Sci-Hub, was even named by Nature as one of the ten people who matter most in science.

But Malamud, the creator of The General Index, told Nature that his database is different. He says it is 100% legal because he doesn’t publish full documents, only snippets.

“I am very confident that what I am doing is legal. We are not doing this to provoke a lawsuit, we are doing it to advance science,” he said.

We would love to hear from you! If you have a comment on this article or have a tip for a future Freethink story, please email us at [email protected].
