Carl Malamud is on a crusade to liberate information locked up behind paywalls, and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is setting his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.

Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.

No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.

This is a wonderful idea, so-called "legalities" be damned. Leave it to the lawyers to come in and mess up a good thing. Monetizing information and data gained from basic scientific research is wrong, IMO. I understand the copyright argument, but some bright kid somewhere out there with just an internet connection and a device could make the next big breakthrough simply because they had free access to important scientific information and the mind to deal with it.

BTW, back in the early '90s, I met Carl Malamud at the Caffe Trieste. He was a regular back then. We had a number of discussions about computers, databases, public access to knowledge, and where this explosive new technology was going. He lived a couple of blocks up from me on Telegraph Hill and we had some conversations at his pad. It's very gratifying to see where he is and what he's doing now. Well done, man.