When astronomer Kai Polsterer’s laptop was stolen, the thieves made off with more than hardware. The laptop contained Polsterer’s only copy of a data set of thousands of stars and galaxies, a sample that a computer algorithm had randomly selected from a catalog of millions of celestial objects. Because Polsterer could not re-create what the algorithm had done, he could not exactly reproduce the sample for a work-in-progress journal article. And without that data set, nobody could exactly reproduce his results.
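The reproducibility problem here is concrete: a random subsample can only be re-created if the selection itself is pinned down, for example by fixing the random seed or archiving the chosen indices alongside the paper. The following is a minimal sketch of that idea in Python; the catalog size, sample size, and output file name are illustrative assumptions, not details from the article or from Polsterer’s work.

    import numpy as np

    # Hypothetical illustration: subsample a catalog of millions of objects
    # down to a few thousand. Fixing the seed makes the draw repeatable;
    # archiving the selected indices makes it reproducible even if the seed
    # or the library version is later lost.
    rng = np.random.default_rng(seed=42)   # assumed seed, chosen for the example

    n_catalog = 5_000_000    # assumed size of the full catalog
    n_sample = 10_000        # assumed size of the published subsample

    selected = rng.choice(n_catalog, size=n_sample, replace=False)
    np.savetxt("selected_indices.txt", selected, fmt="%d")  # archive the selection itself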

Irreproducibility and the black-box nature of machine learning plague many fields of science, from Earth observation to drug discovery. But astronomy represents a notable case study because the quantity of data is growing at an unprecedented rate. The installation of new data-churning telescopes, combined with marked improvements in pattern-finding algorithms, has led astronomers to turn to sophisticated software for data-crunching they can’t do manually. And with more powerful analyses comes less transparency into how they were performed.
