Contribution to infection control
Big Data against noroviruses
Everyone would like to be spared from noroviruses, which cause gastroenteritis with severe diarrhoea and vomiting. Because infected people are highly contagious, the pathogen spreads particularly quickly in communal facilities such as nursing homes, hospitals, kindergartens and schools. So far, there is neither a vaccine nor a specific medication against noroviruses. This makes it all the more important to treat severe cases better and to prevent outbreaks in the first place.
In the project "Paving the way towards personalised prevention and care of severe Norovirus gastroenteritis", PRESENt for short, scientists from the University of Veterinary Medicine Hannover, the Hannover Medical School, the Helmholtz Centre for Infection Research and the L3S have been conducting clinical, biological and Big Data research since the beginning of 2020. Their aim is to better understand norovirus gastroenteritis and thus contribute to infection control.
To predict severe infections and identify possible intervention points, the L3S team is developing machine learning approaches that learn from biological data. In doing so, the scientists faced a number of challenges: for example, the available data was insufficient for training and was noisy, i.e. inaccurate. However, because the data came from different types of biological information sources, the researchers were able to compensate for gaps in individual data types by combining them effectively. They hypothesised that no single type of data would be able to fully capture the complexity of the disease on its own, and that joint learning from heterogeneous data would therefore compensate for missing or unreliable information in each data type.
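The idea of letting several data types cover for each other's gaps can be illustrated with a very small sketch. It uses simple late fusion (averaging the risk scores of separate per-data-type models) as a stand-in, which is not necessarily the method the L3S team uses. The data type names `clinical`, `proteomic` and `genomic`, the scores and the function `fuse_scores` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fuse_scores(scores_by_type):
    """Average per-patient scores across data types, ignoring missing (NaN) entries."""
    stacked = np.vstack(list(scores_by_type.values()))  # shape: (n_types, n_patients)
    available = ~np.isnan(stacked)
    counts = available.sum(axis=0)
    totals = np.where(available, stacked, 0.0).sum(axis=0)
    # Patients for whom no data type is available stay NaN instead of being guessed.
    fused = np.full(counts.shape, np.nan)
    np.divide(totals, counts, out=fused, where=counts > 0)
    return fused

# Hypothetical severity scores from three per-data-type models;
# NaN marks a data type that is missing for that patient.
scores = {
    "clinical": np.array([0.9, np.nan, 0.2, np.nan]),
    "proteomic": np.array([0.7, 0.4, np.nan, np.nan]),
    "genomic": np.array([np.nan, 0.6, 0.4, np.nan]),
}
fused = fuse_scores(scores)
```

Each patient in this toy example still receives a score as long as at least one data type is present. A real joint learning model would typically learn a shared representation rather than average independent outputs.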
In one case, the scientists are using multiple sources of biological information about human and viral proteins to study and predict how these proteins interact. Understanding these interactions plays a crucial role in revealing the underlying mechanism of viral infection and could help prevent and treat viral diseases. However, prediction is difficult because there is little data on virus-human interactions and most viruses mutate rapidly.
The researchers have already found a solution to the problem of small training data sets: they have developed a multitask transfer learning approach that draws on information from about 24 million protein sequences and on the interaction patterns between human proteins in a cell.
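How such a multitask setup can be structured is sketched below, under loudly stated assumptions. The article does not describe the actual architecture, so every name, layer size and the auxiliary weight here are hypothetical. Random vectors stand in for embeddings that would come from a model pretrained on protein sequences, and the sketch shows only the forward pass and the joint loss, not the training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8   # size of the (hypothetical) pretrained protein embedding
HIDDEN_DIM = 4  # size of the shared protein-pair representation

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(probs, labels):
    eps = 1e-9  # avoids log(0)
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1 - labels) * np.log(1 - probs + eps)))

# Shared layer used by both tasks; this is where knowledge can transfer.
W_shared = rng.normal(scale=0.1, size=(2 * EMBED_DIM, HIDDEN_DIM))
# One small head per task.
w_virus_human = rng.normal(scale=0.1, size=HIDDEN_DIM)
w_human_human = rng.normal(scale=0.1, size=HIDDEN_DIM)

def predict(emb_a, emb_b, head):
    """Interaction probability for each protein pair (one row per pair)."""
    pair = np.concatenate([emb_a, emb_b], axis=1)
    shared = np.tanh(pair @ W_shared)
    return sigmoid(shared @ head)

def joint_loss(vh_batch, hh_batch, aux_weight=0.5):
    """Main task (virus-human) plus weighted auxiliary task (human-human)."""
    (va, vb, vy), (ha, hb, hy) = vh_batch, hh_batch
    main = binary_cross_entropy(predict(va, vb, w_virus_human), vy)
    aux = binary_cross_entropy(predict(ha, hb, w_human_human), hy)
    return main + aux_weight * aux, main, aux

def fake_batch(n):
    """Random stand-ins for pretrained embeddings and interaction labels."""
    return (rng.normal(size=(n, EMBED_DIM)),
            rng.normal(size=(n, EMBED_DIM)),
            rng.integers(0, 2, size=n).astype(float))

# Few virus-human pairs, many human-human pairs, mirroring the data imbalance.
total, main, aux = joint_loss(fake_batch(5), fake_batch(50))
```

Because both tasks pass through `W_shared`, minimising the joint loss would let the plentiful human-human interaction data shape the representation that the scarce virus-human task relies on.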
Thi Ngan Dong
Thi Ngan Dong is a PhD student approaching graduation at the L3S Research Center, funded by the PRESENt project. Her current research focuses on network analysis, feature selection, graph-based representation learning, and joint learning models from heterogeneous information sources.