The SARS-CoV-2 virus has been spreading since the end of 2019, and the complete genome sequence of the virus strain has been available since the end of January 2020. More and more information is being integrated into the public bioinformatics databases hosted by the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI). A research team at the Politecnico di Milano aims to support biologists in interpreting the growing body of information on SARS-CoV-2. As part of the ERC Advanced Grant "Data-Driven Genomic Computing", the researchers of the Database Management Group led by Professor Stefano Ceri have developed user-friendly technology for genome analysis based on freely accessible human sequencing data. One of its components is a database language for genomic data, the GenoMetric Query Language (GMQL), whose goal is to combine data from different experiments. On the basis of GMQL, they developed GenoSurf, a search engine freely accessible on the web. It is intended to enable life scientists with limited computer science skills to query all of the above-mentioned open data according to a variety of criteria. Because it is easy to use, GenoSurf makes genomic data easier to interpret and helps biologists formulate new biological hypotheses.
In the current phase of the epidemic, researchers have already made 3500 complete or nearly complete genome sequences of SARS-CoV-2 publicly available, and this number is increasing daily. The data may contribute to understanding the virus and its spread. To this end, the group is currently expanding GenoSurf to include viral genomes, starting with freely accessible information on SARS-CoV-2; the extension is called ViruSurf. L3S is involved in integrating genome alterations into ViruSurf and contributes learning approaches that allow biological hypotheses to be generated automatically from the data. For example, alterations found in all available SARS-CoV-2 genomes that cause pneumonia can be filtered and then validated in in vivo animal experiments. Prof. Wolfgang Nejdl from L3S will spend four months of his research semester in summer 2020 at the Politecnico di Milano.
Damianos Melidis is a PhD student at L3S and conducts research in the fields of data streams, data mining, machine learning and bioinformatics.