Now, a new method of analysis, topological data analysis (TDA), promises to uncover the underlying patterns in everything from climate research data and crystalline materials to digital pictures posted online.
Patterns in data -- big and small -- are hiding in a torrent of bits, bytes and data points, and the need to decipher real signals from false positives and anomalies has never been greater as epic global studies deliver ever bigger data sets. TDA originated in the world of mathematics and is now spearheaded by researchers at the University of Liverpool; the ‘topological’ part of its name refers to its roots in formulae that describe the relationships between shapes and spaces.
“Topological data analysis visualises data in a way understandable for human experts,” says Dr Vitaliy Kurlin from the University of Liverpool. “It’s ideal for difficult cases when it’s hard to find patterns behind the data, and now it’s moving on from pure abstract models to practical uses.”
A 2011 paper published in PNAS showed how TDA was used on genetic data to reveal a previously unknown subset of a type of breast cancer. Astonishingly, these patients exhibited 100% survival and no tumour metastasis. Painful, expensive and invasive treatments could be safely avoided, and resources redirected to those more in need.
Kurlin and colleagues are now applying similar non-linear analysis in areas such as new materials discovery. Computer simulations can identify millions of potential materials defined by different atomic configurations, but which ones are worth synthesising for real, physical tests? TDA can uncover unexpected patterns in data because it works across scales, without needing to know in advance which distance between two points is meaningful. The results can help researchers focus on the candidates most likely to succeed, for example those whose properties indicate stability.
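To give a flavour of what working across scales can mean, here is a minimal sketch of one common TDA idea: watching how clusters of points merge as the distance threshold grows. It is an illustration only, not necessarily the method used by Kurlin's group; the function name `merge_scales` and the toy points are assumptions made up for this example.

```python
from itertools import combinations
from math import dist


def merge_scales(points):
    """Return the distances at which clusters merge as the scale grows.

    Each point starts as its own cluster; two clusters merge once the
    scale reaches the distance between their closest points.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Consider every pair of points, shortest distance first.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    scales = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this pair joins two separate clusters
            parent[ri] = rj
            scales.append(d)
    return scales


# Hypothetical toy data: two tight groups that lie far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
scales = merge_scales(points)
```

In this toy example the first three merges all happen at distance 1, and the final merge only at about 13.5. That long gap is the signal: the two groups stay separate over a wide range of scales, so no single "correct" distance has to be chosen beforehand.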
In climate research, Kurlin has used TDA to analyse ‘atmospheric rivers’ of water vapour. Like a river in the sky, these filaments can stretch from Hawaii to California, and could carry as much water as the Amazon River. But they are hugely complex to detect because they are not continuous -- it’s akin to trying to find formulas to explain the way smoke curls from a cigarette. “Climate scientists cannot detect atmospheric rivers or even agree a common definition because their geometric structures are highly volatile. I think they can be described only in topological terms,” Kurlin explains.
A more everyday use of TDA principles can be found in ‘super pixels’: these are connected regions of many similar pixels that make up an image. An eye, a forehead or a single-colour matte background could be collated into super pixels, so that compression formats such as the familiar JPEGs on our smartphones have less complex data to deal with. This can save computer processing time and power, and assist in facial recognition in crime scene images, or for social media applications.
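The idea of grouping connected, similar pixels can be sketched in a few lines. The example below uses a plain flood fill on a tiny greyscale grid; it is a deliberate simplification rather than a topological method, and real superpixel algorithms such as SLIC are considerably more sophisticated. The function name `label_regions`, the `tolerance` parameter and the toy image are all assumptions made up for illustration.

```python
from collections import deque


def label_regions(image, tolerance=10):
    """Group connected pixels whose brightness is close to a seed pixel.

    `image` is a list of rows of greyscale values. Returns a grid of
    region labels and the number of regions found.
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 means "not yet labelled"
    next_label = 0

    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Start a new region from this unlabelled pixel.
            seed = image[r][c]
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Visit the four direct neighbours (up, down, left, right).
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and labels[ny][nx] == -1
                        and abs(image[ny][nx] - seed) <= tolerance
                    ):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1

    return labels, next_label


# Hypothetical 3x4 greyscale image: a bright area, a dark area, a mid-grey strip.
image = [
    [200, 201, 50, 52],
    [199, 202, 51, 50],
    [200, 200, 120, 121],
]
labels, count = label_regions(image)
```

Here twelve pixels collapse into three regions, which is the sense in which later processing has "less complex data to deal with".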
Momentum in this field has led to the formation of the Centre for Topological Data Analysis (CTDA), which brings together the expertise of a dozen researchers from Liverpool’s Department of Computer Science and Materials Innovation Factory, alongside the University of Oxford’s Mathematical Institute and Statistics Department, and the University of Swansea’s mathematics and physics departments. The Centre answers a £14 million call for ‘New approaches to data science’ issued by UK Research Council EPSRC (now a part of UK Research and Innovation).
“The University of Liverpool is a great environment for cooperation,” says Kurlin. “With the new Materials Innovation Factory built to host chemists and computer scientists and researchers from related disciplines, as well as industry links, it’s important to have people from different disciplines in the same place.”