Dr. Daniel Bojar is an Associate Senior Lecturer in Bioinformatics at the Department of Chemistry and Molecular Biology & the Wallenberg Centre for Molecular and Translational Medicine of the University of Gothenburg. He is also a recipient of a Branco Weiss Fellowship – Society in Science and was named to the Forbes 30 Under 30 Europe list.
Wet-lab + computational MSc thesis projects
Machine Learning & Systems Glycobiology
Welcome to the Bojar lab at the Department of Chemistry and Molecular Biology & the Wallenberg Centre for Molecular and Translational Medicine of the University of Gothenburg, Sweden. Our research lives at the intersection of machine learning, glycobiology, and synthetic biology. We previously developed the first applications of deep learning to glycobiology, using a language model to predict functional properties from glycan sequences (Bojar et al., Cell Host Microbe, 2020). This work was covered by Quanta Magazine, and we have since applied this approach to problems such as protein-glycan interactions (Lundstrøm et al., Adv Sci, 2021). Our group develops and applies further machine learning and bioinformatics methods to analyze sequence-to-function relationships in glycans, with the aim of transforming glycobiology into a true systems biology discipline. Additionally, we are interested in the effects of glycans on mammalian signaling pathways, the potential of synthetic biology to alter glycome signatures, and the biomedical applications of glycans, in work funded by the Branco Weiss Fellowship. Our group combines computational and experimental expertise across multiple model organisms. We firmly believe in interdisciplinarity within individual researchers, pure creativity, and a healthy dose of irreverence toward existing dogma.
Luc is a postdoctoral fellow who works as a bioinformatician, developing machine learning methods and applying them across the broad field of glycobiology. In 2020, he defended his PhD thesis at the University of Strasbourg in the team of Dr. Alain Lescure, where he spent most of his research time studying stress response and adaptation in vertebrate species. This work was done in collaboration with Adisseo France, a French company that develops nutritional solutions to improve the resilience of livestock. During his PhD, Luc mainly processed and analyzed RNA-seq data, implemented network biology methods, and performed comparative genomics studies. He expects that the experience he gained applying these bioinformatics methods to diverse biological topics will help us uncover the complexity of glycans and expand our knowledge of glycobiology.
Jon earned his master’s degree in molecular biomedicine from the University of Copenhagen, Denmark, where he investigated immunosuppressive functions of regulatory T cells in multiple sclerosis and neuronal dysfunction in Parkinson’s disease. During his studies, he developed a special interest in high-throughput approaches and the integration of experimental and computational methods to elucidate molecular mechanisms of complex biological systems.
Jon is fascinated by the seemingly endless complexity of glycobiology and excited to help understand the sequence-to-function relationships of glycans as well as we understand those of DNA, RNA, and proteins, one experiment (or line of code) at a time.
When Jon is not working, you’ll find him obsessing over brewing the perfect cup of coffee or running ultra-long distances, preferably in steep & technical terrain.
Emma is currently a research assistant in the group, where she also completed her MSc thesis. In 2020, she completed her Bachelor’s thesis in Biomedicine at Lund University, investigating the activity of enzymes from novel streptococci for potential biotechnological or therapeutic applications. She is interested in learning more about the role of glycans in human health. In her project, Emma investigates how evolutionary changes in components of the human glycome have influenced immune activity and susceptibility to infections.
James completed his master’s degree in molecular bioengineering at Imperial College London, where he spent much of his time tinkering with and learning about how generative models can be applied to biology. His master’s dissertation investigated whether the language of S. cerevisiae DNA could be modelled well enough to computationally generate viable promoters.
During his PhD, James seeks to further understand the biological grammars that various molecules adhere to. He also hopes to make the nature of the relationships between these sequences in the genome, transcriptome, and glycome more apparent.
In his free time, James tries to speak to people in their native languages, goes to the gym, and overuses the random article button on Wikipedia.
Viktoria was a Master’s student in the group from September 2021 until June 2022.
DNA, RNA, and proteins: three types of biological sequences that are intimately familiar to any life science researcher and that make life as we know it possible. Less familiar, though at least as important, are glycans, or complex carbohydrates. These chains of sugars (monosaccharides) either occur on their own, for instance forming the outer capsules and cell walls of bacteria, fungi, and plant cells, or adorn other biomolecules such as proteins, lipids, or RNA. The specific glycan sequence attached to a protein fundamentally alters its properties and capabilities, fine-tuning its stability, structure, and function. This results in a mélange of incredibly complex interactions, as glycans boast an alphabet of hundreds of monosaccharides, compared to a rather paltry 20 amino acids for proteins and four nucleotides for DNA.
Glycans are not only the sole nonlinear biological sequence, forming molecules with multiple branches, but also the only non-templated one: they are built by dozens of specialized enzymes whose activity depends on the current cellular state. This also makes glycans the most dynamic biological sequence, as cells can adjust them on the fly without genetic mutations. Glycans have been implicated in virtually all human diseases, from inflammatory disorders to cancer, which immediately hints at their biomedical potential. Because of this enormous potential, we are committed to advancing glycobiology by any means necessary, from molecular and synthetic biology to computational approaches such as machine learning and bioinformatics. In fact, the very complexity that has so far prevented the comprehensive analysis and use of glycans makes them, among all biological sequences, ideally suited to state-of-the-art machine learning algorithms and their unique ability to extract insight from sequence data. We are working toward democratizing access to these methods via our open-source Python package glycowork (Thomès et al., Glycobiology, 2021), which we have, for instance, used to analyze sequence properties of fucose-containing motifs across taxonomic kingdoms (Thomès and Bojar, Front Mol Biosci, 2021).
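To make "nonlinear" concrete: in the IUPAC-condensed notation commonly used for glycans, residues are written from the non-reducing end toward the reducing end, each followed by its linkage to the next residue, and side branches appear in square brackets. The minimal sketch below is written purely for illustration; it is not glycowork's parser, and the function names `parse_glycan` and `find_branch_points` are hypothetical. It reads such a string from the reducing end outward and turns it into a tree of residues and linkages. It does not validate residue names or linkages and does not handle repeating units or other notation extensions.

```python
import re
from typing import List, Optional, Tuple

# A token is a branch bracket, a parenthesised linkage such as "(b1-4)",
# or a run of other characters (a monosaccharide name such as "GlcNAc").
TOKEN_RE = re.compile(r"\[|\]|\([^)]*\)|[^\[\]()]+")


def parse_glycan(iupac: str) -> Tuple[List[str], List[Tuple[int, int, str]]]:
    """Parse an IUPAC-condensed glycan string into a tree (illustrative only).

    Returns (nodes, edges): nodes[i] is a monosaccharide name, and each edge
    is (child, parent, linkage), pointing toward the reducing end.
    Node 0 is the reducing-end residue, i.e. the rightmost one in the string.
    """
    tokens = TOKEN_RE.findall(iupac)
    nodes: List[str] = []
    edges: List[Tuple[int, int, str]] = []
    attach_to: Optional[int] = None  # residue the next (leftward) residue binds to
    linkage = ""                     # linkage read just before that residue
    branch_stack: List[Optional[int]] = []

    # Walk right to left, i.e. from the reducing end outward.
    for tok in reversed(tokens):
        if tok == "]":
            # A side branch starts here; remember where it attaches.
            branch_stack.append(attach_to)
        elif tok == "[":
            # The side branch is finished; resume from its attachment point.
            attach_to = branch_stack.pop()
        elif tok.startswith("("):
            linkage = tok[1:-1]
        else:
            nodes.append(tok)
            idx = len(nodes) - 1
            if attach_to is not None:
                edges.append((idx, attach_to, linkage))
            attach_to = idx
            linkage = ""
    return nodes, edges


def find_branch_points(nodes: List[str], edges: List[Tuple[int, int, str]]) -> List[int]:
    """Indices of residues carrying more than one child, i.e. branch points."""
    child_counts = [0] * len(nodes)
    for _, parent, _ in edges:
        child_counts[parent] += 1
    return [i for i, n in enumerate(child_counts) if n > 1]


if __name__ == "__main__":
    # A biantennary complex N-glycan, chosen here only as an example input.
    glycan = ("Gal(b1-4)GlcNAc(b1-2)Man(a1-3)"
              "[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]"
              "Man(b1-4)GlcNAc(b1-4)GlcNAc")
    nodes, edges = parse_glycan(glycan)
    for child, parent, link in edges:
        print(f"{nodes[child]}({link}) -> {nodes[parent]} [{child}->{parent}]")
    print("branch points:", [nodes[i] for i in find_branch_points(nodes, edges)])
```

In the example, the core mannose (index 2) carries two antennae and is the only branch point. In a DNA or protein sequence, by contrast, every position has at most one successor, which is why a tree or graph is the more faithful representation for glycans.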
Bringing Light into the Dark Matter of Biology: Sequence-to-Function Analyses for Glycans
As biological sequences, glycans are akin to language, with information encoded in sequence motifs and their order. To take a simple example, the exact sequence of the glycan attached to the antibody IgG modulates its stability, function, and activity. While it is possible to link sequence attributes to the functional properties of glycans, this currently requires painstaking manual work and a considerable investment of time and resources. We have begun to address this bottleneck by developing dedicated deep learning methods that can rapidly analyze tens of thousands of glycans. By treating glycans as a biological language, we trained a machine learning-based language model (Bojar et al., Cell Host Microbe, 2020) that has enabled us to build sequence-to-function models for a plethora of applications. We have already demonstrated the utility of this platform by training classifiers that predict glycan immunogenicity, the pathogenicity of bacterial strains, and the taxonomic origins of glycans purely from glycan sequences. Beyond predicting glycan properties, these models can also be used to identify relevant glycan motifs, suggest modifications for glycoengineering, and provide insight into the biological processes under investigation. We are currently improving these algorithms, developing new tools and platforms for glycan-focused machine learning, and building sequence-to-function models for further applications in glycobiology. One example is our graph neural network SweetNet, which can quantitatively predict virus-glycan binding (important for discovering novel viral receptors) and cluster species according to phenotypic and environmental characteristics (Burkholz et al., Cell Rep, 2021). We have since generalized this work to predicting all kinds of protein-glycan interactions, using both machine learning (Bojar et al., ACS Chem Biol, 2022) and deep learning (Lundstrøm et al., Adv Sci, 2021).
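Treating glycans as a language first requires deciding what their "letters" are. The minimal sketch below assumes that each monosaccharide, each linkage, and each branch bracket in an IUPAC-condensed string counts as one token. This is an illustrative choice, not necessarily the tokenization used in the cited language model, and the function names are hypothetical. It shows how glycans can be turned into the fixed-length integer sequences that sequence models typically consume.

```python
import re
from typing import Dict, List, Optional

# Branch brackets, parenthesised linkages, or monosaccharide names.
TOKEN_RE = re.compile(r"\[|\]|\([^)]*\)|[^\[\]()]+")
PAD, UNK = "<pad>", "<unk>"


def tokenize(glycan: str) -> List[str]:
    """Split an IUPAC-condensed glycan into monosaccharide, linkage and bracket tokens."""
    return [tok.strip("()") for tok in TOKEN_RE.findall(glycan)]


def build_vocab(glycans: List[str]) -> Dict[str, int]:
    """Assign integer IDs in order of first appearance (0 = padding, 1 = unknown)."""
    vocab = {PAD: 0, UNK: 1}
    for glycan in glycans:
        for tok in tokenize(glycan):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(glycans: List[str], vocab: Dict[str, int],
           max_len: Optional[int] = None) -> List[List[int]]:
    """Map glycans to integer sequences, truncated or right-padded to max_len."""
    seqs = [[vocab.get(tok, vocab[UNK]) for tok in tokenize(g)] for g in glycans]
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    return [s[:max_len] + [vocab[PAD]] * (max_len - len(s)) for s in seqs]


if __name__ == "__main__":
    # Lactose, 2'-fucosyllactose and 3'-sialyllactose as toy inputs.
    milk_glycans = ["Gal(b1-4)Glc", "Fuc(a1-2)Gal(b1-4)Glc", "Neu5Ac(a2-3)Gal(b1-4)Glc"]
    vocab = build_vocab(milk_glycans)
    print(tokenize(milk_glycans[1]))
    print(encode(milk_glycans, vocab))
```

Here lactose is shorter than the other two glycans and is right-padded with ID 0, while tokens never seen during vocabulary building map to `<unk>` (ID 1). Keeping brackets as tokens preserves branching information in a flat sequence, although graph-based models such as SweetNet represent branching more directly.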
Our deep learning model LectinOracle is trained on over half a million unique protein-glycan interactions, using information from both protein and glycan sequences, and can generalize to new proteins, new glycans, and new contexts. We have used this method to investigate varied biological contexts such as the microbiome or viral epidemics.
From Genes to Glycans: Predicting Glycan Structures and Repertoires via Machine Learning
The key limitation in glycobiology today is the relative scarcity of glycan sequence data, which in most cases must be acquired by low-throughput mass spectrometry. Lifting this restriction would unleash the full potential of glycans and glycan-focused machine learning: glycans could be integrated into standard systems biology workflows, deepening our understanding of molecular biology and helping to develop the next generation of biomedical therapies. We are pursuing several strategies to make this a reality. These range from machine learning algorithms that increase the resolution and information content of traditional measurement techniques to projects that aim to predict the glycan repertoire of a biological system, uncoupling glycomics from mass spectrometry.
Leveraging Synthetic Glycobiology for Biomedical Applications
Glycans are targeted in several autoimmune diseases. Glycans aid cancer in evading the immune system. Glycans facilitate the cellular entry of most viruses. Modifying these glycans in a targeted manner, with enzymes coupled to glycan-binding lectins, could constitute a new form of biomedical therapy. Yet glycans can currently be targeted only in a more or less trial-and-error fashion, which is why we are working on a platform to alter glycan sequences predictively. Sequence-to-function models allow us to identify relevant glycan motifs that can then be modified with methods derived from synthetic biology. We are interested in pursuing these modifications both to study the impact of glycans on natural biological systems and for therapeutic applications. Further, we envision using glycans as an additional layer of complexity in existing synthetic biology applications.
Selected Publications
06/2022 Qin, R., Mahal, L.K., and Bojar, D. Deep Learning Explains the Biology of Branched Glycans from Single-Cell Sequencing Data, bioRxiv, doi:10.1101/2022.06.27.497708.
02/2022 Lundstrøm, J. and Bojar, D. Structural insights into host–microbe glycointeractions. Curr Opin Struct Biol, doi:10.1016/j.sbi.2022.102337.
01/2022 Bojar, D., Meche, L., Meng, G., Eng, W., Smith, D.F., Cummings, R.D., and Mahal, L.K. A Useful Guide to Lectin Binding: Machine-Learning Directed Annotation of 57 Unique Lectin Specificities. ACS Chem Biol, doi:10.1021/acschembio.1c00689.
12/2021 Lundstrøm, J., Korhonen, E., Lisacek, F., and Bojar, D. LectinOracle – A Generalizable Deep Learning Model for Lectin-Glycan Binding Prediction. Adv Sci, 2103807, doi:10.1002/advs.202103807.
09/2021 Thomès, L. and Bojar, D. The role of fucose-containing glycan motifs across taxonomic kingdoms. Front Mol Biosci, 8:755577, doi:10.3389/fmolb.2021.755577.
06/2021 Thomès, L., Burkholz, R., and Bojar, D. Glycowork: A Python package for glycan data science and machine learning. Glycobiology, cwab067, doi:10.1093/glycob/cwab067.
06/2021 Burkholz, R., Quackenbush, J., and Bojar, D. Using Graph Convolutional Neural Networks to Learn a Representation for Glycans. Cell Rep, 35:109251.
10/2020 Bojar, D., Powers, R.K., Camacho, D.M., and Collins, J.J. Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions. Cell Host Microbe, 29(1):132-144.
04/2020 Bojar, D., Powers, R.K., Camacho, D.M., and Collins, J.J. SweetOrigins: Extracting Evolutionary Information from Glycans. bioRxiv, doi:10.1101/2020.04.08.031948.
01/2020 Bojar, D., Camacho, D.M., and Collins, J.J. Using Natural Language Processing to Learn the Grammar of Glycans. bioRxiv, doi:10.1101/2020.01.10.902114.
04/2019 Kim, H.*, Bojar, D.*, and Fussenegger, M. A CRISPR/Cas9-based central processing unit to program complex logic computation in human cells. Proc Natl Acad Sci USA, 116:7214-7219. *Co-first authorship.
06/2018 Bojar, D., Scheller, L., Charpin-El Hamri, G., Xie, M., and Fussenegger, M. Caffeine-inducible gene switches controlling experimental diabetes. Nat Commun, 9:2318.
04/2018 Kojima, R.*, Bojar, D.*, Rizzi, G., Charpin-El Hamri, G., El Baba, M., Saxena, P., Auslaender, S., Tan, K.R., and Fussenegger, M. Designer exosomes produced by implanted cells intracerebrally deliver therapeutic cargo for Parkinson’s disease treatment. Nat Commun, 9:1305. *Co-first authorship.
We are always looking for highly motivated & open-minded innovators at every career stage! We are interested in both experimentalists and computational specialists (bonus points if you are comfortable with both and/or willing to learn). Just send your CV and a cover letter describing your research interests and skill set to firstname.lastname@example.org. We look forward to your application!