Detection of Important Genome Elements

Help identify mobile genetic elements, for instance to predict the spread of antibiotic resistance.

Contact

Yiqing Wang, University of Kiel
Dustin Hanke, University of Kiel

Description

Transposable elements (TEs) are common genomic sequences that can change their position in the genome by copy-and-paste mechanisms. In the process they often carry essential or beneficial genes, such as antibiotic resistance genes, to another location. Sometimes the characteristic features of TEs are missing or the TEs are fragmented, which makes identification challenging. In this project we aim to use the gene frequency landscape of genomic sequences to detect putative TE regions with methods such as LSTMs or transformer models.
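
To make the idea of a gene frequency landscape more concrete, here is a minimal Python sketch on toy data. It rests on several assumptions that are not taken from the project: that each genome can be written as an ordered list of protein family labels, that a family's frequency is the fraction of genomes containing it, and, purely for illustration, that candidate regions show up as runs of rarely shared families. The function names, the threshold, and the toy genomes are all hypothetical, and the simple thresholding stands in for the LSTM or transformer models the project actually targets.

```python
from collections import Counter
from typing import Dict, List, Tuple


def family_frequencies(genomes: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of genomes in which each protein family occurs at least once."""
    counts: Counter = Counter()
    for families in genomes.values():
        counts.update(set(families))  # count each family once per genome
    total = len(genomes)
    return {family: count / total for family, count in counts.items()}


def frequency_landscape(families: List[str], freq: Dict[str, float]) -> List[float]:
    """Replace each gene's family label by that family's frequency."""
    return [freq.get(family, 0.0) for family in families]


def low_frequency_regions(
    landscape: List[float], threshold: float = 0.5, min_len: int = 2
) -> List[Tuple[int, int]]:
    """Half-open (start, end) index ranges of runs below the threshold.

    Assumption for illustration only: a run of rare families marks a
    candidate region. This is a naive baseline, not the project's model.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(landscape):
        if value < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(landscape) - start >= min_len:
        regions.append((start, len(landscape)))
    return regions


# Hypothetical toy genomes: A, B, C are shared; T1, T2, T3 are rare.
toy_genomes = {
    "g1": ["A", "B", "T1", "T2", "C"],
    "g2": ["A", "B", "C"],
    "g3": ["A", "B", "C", "T3"],
    "g4": ["A", "C", "B"],
}

freq = family_frequencies(toy_genomes)
landscape_g1 = frequency_landscape(toy_genomes["g1"], freq)
regions_g1 = low_frequency_regions(landscape_g1)
print(landscape_g1)
print(regions_g1)
```

In a learned approach, a per-gene landscape like `landscape_g1` would be the kind of sequence input fed to a sequence model, with the model replacing the fixed threshold.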

Dataset

We provide a protein family frequency dataset of 9226 genomic sequences originating from Klebsiella, Escherichia, and Salmonella.

How you can contribute

Anyone interested in understanding the fundamentals of transposable elements and in pushing our approach forward is welcome to join. We would be especially glad to have participants with a strong technical background in applying LSTMs and/or transformer models. Experience in adjusting, improving, and building complex ML models would be beneficial.
