Thursday, December 26th, 2024

Sandia National Laboratories launches desktop data mining effort to aid COVID-19 research

Sandia National Laboratories is turning to machine learning to assist COVID-19 researchers in combing through existing coronavirus studies and data.

Through a mix of algorithms, analytics, and data mining, Sandia wants to make it easy for scientists to find the data they need from their office computers. Dubbing the approach rapid-response data science, the labs have already run a seven-day trial to demonstrate its use. The trial aimed to help biosecurity and public health experts isolate key coronavirus documents from the thousands of studies currently available.

In just 10 minutes, the team used language and character similarities to narrow 29,000 published studies down to 87.
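The article does not say which similarity measure Sandia used. As a minimal sketch of the general idea, the Python below ranks documents against a query by cosine similarity over character trigrams and keeps only the top few. The function names, the trigram size, and the `top_k` cutoff are all illustrative assumptions, not details from the Sandia work.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after normalizing case and whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def shortlist(query, documents, top_k=2):
    """Return the top_k (score, document) pairs most similar to the query.

    top_k is a hypothetical cutoff chosen for illustration.
    """
    query_vec = char_ngrams(query)
    scored = [(cosine(query_vec, char_ngrams(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Character n-grams tolerate spelling variants and word endings ("transmit", "transmission") without any language-specific preprocessing, which is one plausible reason to pair character-level with language-level similarity.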

“Medical and epidemiological experts can have near-immediate access to existing pertinent research without being data scientists,” Travis Bauer, Sandia computer scientist, said. “With some refinement, this new process can clarify questions our public health experts need answered to fast-track COVID-19 research, particularly as new studies quickly emerge.”

The investigation was a response to a federal call for a machine-readable COVID-19 dataset, given the wealth of data available. During the investigation, Bauer and his team ran two studies, one with two experts and the other with three. In an initial stage, they indexed the existing research, graphed it, then placed it into a searchable natural-language matrix, which they also indexed to improve searchability and relevance. They then ran the documents through three visualization algorithms.

Of these, two proved potentially useful, given additional tuning. Using an analysis algorithm, the team then scored all COVID-19 documents, color-coded them by score, and plotted them on a two-dimensional graph. This allowed experts to explore the differently colored clusters as batches of COVID-19-related documents. The process proved largely accurate in terms of relevance, though Bauer added that tuning the algorithms will improve performance.
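The scoring and plotting algorithms are not named in the article. The sketch below only illustrates the final step described above: mapping each document's relevance score to a color and grouping documents into color batches an expert could browse. The `PlottedDoc` type, the score range of 0 to 1, the color names, and the thresholds are all hypothetical, and the 2D coordinates are assumed to come from some earlier projection step.

```python
from collections import defaultdict
from typing import NamedTuple


class PlottedDoc(NamedTuple):
    doc_id: str
    x: float      # 2D position, assumed to come from a prior projection step
    y: float
    score: float  # relevance score, assumed to lie in [0, 1]


# Hypothetical bands: (minimum score, color), checked from highest to lowest.
COLOR_BANDS = [(0.66, "red"), (0.33, "orange"), (0.0, "gray")]


def color_for(score):
    """Return the color of the first band whose threshold the score meets."""
    for threshold, color in COLOR_BANDS:
        if score >= threshold:
            return color
    return COLOR_BANDS[-1][1]  # fall back to the lowest band


def batches_by_color(docs):
    """Group document IDs by assigned color, preserving input order."""
    batches = defaultdict(list)
    for doc in docs:
        batches[color_for(doc.score)].append(doc.doc_id)
    return dict(batches)
```

Each batch can then be drawn as one colored series on a scatter plot, so that a reviewer starts with the highest-scoring cluster and works outward.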

“Even on my office laptop computer, we can sort millions of documents and make them available to the user,” Bauer said. “Technologically, it’s possible to rapidly research and adapt to experts’ needs as they work through a data set. The agility and speed with which the user interface can be developed with the right team on desktop computer systems can provide an ability to respond to specific queries quickly and adapt with the changing needs of the user.”