Challenges for the Hack-from-Home edition of the Copenhagen Bioinformatics hackathon were provided and mentored by our partner organisations, including Novo Nordisk, Apollo Ventures, BioLib Technologies, and Copenhagen University. On this page you can find an overview of the challenges.
Scientific fraud and paper mill publications are a growing concern in modern research and have increasingly come to light over the last several years. In this challenge, you will develop a machine learning algorithm that can identify fake academic papers. This challenge is mentored by Lars Juhl Jensen, a professor at Copenhagen University, whose group works within cellular network biology. Lars is also a co-founder of Intomics, a leading Danish bioinformatics company.
The human genome consists of 3.15 billion base pairs. If printed out on paper, it would stand 130 m tall. Lasse Folkersen challenges participants to explore creative ways of visualizing human genomes and to make unique art pieces influenced by the underlying DNA sequence. Teams choosing this challenge are expected to creatively use cutting-edge methods, such as generative adversarial networks (GANs). Success is evaluated by the same metric as modern art: it has to catch the attention of the viewers!
Metabolic disorders are accompanied by changes in tissues, and the study of these can help discover underlying disease mechanisms. The aim of this challenge is to relate a person's fat content in pancreas and liver tissue to age, BMI, gene expression profiles, etc. The data will be derived from the publicly available GTEx database, which contains samples of different human tissues. In addition to expression and genomic data, GTEx also contains histological images of the sampled tissue. From these images we will extract the fat content in the tissue and provide these values to the challenge team for analysis. Given this data set, the team will work with Vanessa Isabell Jurtz and Alexander Junge from Novo Nordisk to find associations and correlations between different variables, look into associated pathways, or create models capable of predicting tissue fat content.
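As a first step towards finding correlations between variables, a rank-based (Spearman) correlation is one reasonable option, since it does not assume a linear relationship. The sketch below is only an illustration in plain Python: the function names, the variable names and the BMI and liver fat values are all made up, and nothing here reflects the actual GTEx-derived data set or the mentors' preferred method.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical example values (NOT real GTEx data).
bmi = [21.0, 24.5, 27.0, 30.2, 33.1, 36.4]
liver_fat = [0.03, 0.05, 0.04, 0.11, 0.15, 0.22]

rho = spearman(bmi, liver_fat)
print(f"Spearman rho between BMI and liver fat: {rho:.3f}")
```

The same function could be applied to any pair of numeric variables, for example fat content against the expression level of a single gene, keeping in mind that testing many genes requires correction for multiple comparisons.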
The goal of this challenge is to identify disease-associated genome variants and separate them from variants considered harmless, or benign. We know from previous work that ~60% of disease-associated variants show loss of protein stability. Analysis of homologous sequences helps identify many of the other detrimental variants. Logistic regression on one dataset shows improved separation of pathogenic and benign variants. In this challenge, you are invited to test whether this applies to other proteins, whether more sophisticated machine learning techniques further improve separation, and whether including additional information about the proteins and variants (e.g. splicing, exposure) boosts performance.
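To make the logistic regression baseline concrete, here is a minimal sketch in plain Python. It assumes two hypothetical features per variant, a predicted loss of stability and a conservation score derived from homologous sequences. The feature choice, the toy values and the hyperparameters are illustrative assumptions only and are not taken from the dataset or the model used by the challenge organisers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression weights with batch gradient descent."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n_samples for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b

def predict_proba(w, b, x):
    """Probability that variant x is pathogenic under the fitted model."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Made-up toy variants: [predicted stability loss, conservation score].
X = [
    [3.0, 0.90], [2.5, 0.80], [0.3, 0.95], [2.8, 0.40],  # labelled pathogenic
    [0.2, 0.10], [0.5, 0.20], [0.1, 0.30], [0.4, 0.15],  # labelled benign
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
print("P(pathogenic) for a destabilising, conserved variant:",
      round(predict_proba(w, b, [3.0, 0.90]), 3))
print("P(pathogenic) for a stable, poorly conserved variant:",
      round(predict_proba(w, b, [0.2, 0.10]), 3))
```

Additional information such as splicing or solvent exposure would enter this sketch as extra feature columns, and a more sophisticated model could be compared against it on held-out proteins.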
In 2020, the whole world is stuck at home and scientists are doing their best to increase our knowledge about COVID-19. One technique, Molecular Dynamics (MD) simulation, allows us to explore dynamic behavior and interactions at the atomistic level when molecules, such as those of a virus, are recognized. Many researchers are now running such simulations and sharing their results with the world by uploading them to the web. This dynamic information could be used to compare different simulations with each other, identify new findings, and rationalize experimental observations. The problem is that nobody can find them, as they are hidden among all the other (useful and useless) information! Can you help find a clever and fast way to identify, collect, assign, and classify all those MD simulations? The challenge is posed by the Lindorff-Larsen Group at Copenhagen University, which works on understanding protein dynamics by combining molecular dynamics simulations with experimental data. The mentor of this challenge is Johanna Tiemann, a postdoc in the groups of Amelie Stein and Kresten Lindorff-Larsen.
Forkhead Box O (FOXO) transcription factors are evolutionarily conserved proteins that regulate multiple biological processes such as development, metabolism, and aging. FOXO's roles in aging modulation were initially discovered in C. elegans and have since been extensively studied and replicated in multiple model organisms. Interestingly, genome-wide association studies (GWAS) revealed a strong link between extreme human longevity and SNP variants of FOXO3 (one of the four human FOXO homologs). Further studies have demonstrated that multiple SNP variants of FOXO3 are associated with healthier phenotypes, for example, reduced risk of cardiovascular disease. This and other evidence highlights the important role FOXO3 plays in aging modulation and disease development. However, the opposite type of evidence is lacking: are there specific SNP variants of FOXO3 that make a person more susceptible to cardiovascular disease? By looking into the UK Biobank, we would like you to help us find an answer to this and other related questions.
With the ongoing coronavirus pandemic, bioinformaticians are going against the tide to solve the mysteries that the virus contains. Coronaviruses are made of proteins, and we are interested in understanding these better to develop treatments and vaccines. Whereas discovering the sequences of the proteins is an easy task, their 3-dimensional structure (the result of protein folding) is more difficult to determine. The experimental procedures to obtain protein structures are costly and time-consuming. Therefore, many bioinformaticians are now working on folding proteins in silico. While many people are using Folding@home to brute-force protein structures, we present a challenge to develop more efficient ways to predict structures: protein folding algorithms based on machine learning. The starting point of this challenge is OpenProtein https://github.com/biolib/openprotein and the dataset to use is https://github.com/aqlaboratory/proteinnet.
Breast cancer (BRCA) is the second (in some cases the first) most common type of cancer worldwide. Multiple classification schemes have been (and continue to be) devised to simplify the work of medical doctors when diagnosing and treating BRCA. A novel classification by Curtis et al. (2012), based on copy number alterations and RNA-Seq expression, divides BRCA into ten subtypes, providing unique associations with survival and possible ways of treatment. This project is dedicated to building a platform-independent machine learning classifier based on gene expression that could simplify diagnosis of BRCA by this classification.