Dr. Daniel Bojar is the Associate Senior Lecturer for Bioinformatics at the Department of Chemistry and Molecular Biology & the Wallenberg Centre for Molecular and Translational Medicine of the University of Gothenburg. He is also the recipient of a Branco Weiss Fellowship – Society in Science and of a Foresight Fellowship.
Machine Learning & Systems Glycobiology
Welcome to the Bojar lab at the Department of Chemistry and Molecular Biology & the Wallenberg Centre for Molecular and Translational Medicine of the University of Gothenburg, Sweden. Our research lives at the intersection of machine learning, glycobiology, and synthetic biology. Previously, we developed the first applications of deep learning to glycobiology by using a language model to predict functional properties from glycan sequences (Bojar et al., Cell Host Microbe 2020). Now, a main research focus of our group is developing and applying further machine learning and bioinformatics methods to analyze sequence-to-function relationships in glycans and to transform glycobiology into a true systems biology discipline. Additionally, we are interested in the effects of glycans on mammalian signaling pathways, the potential of synthetic biology to alter glycome signatures, and the biomedical applications of glycans, in work funded by the Branco Weiss Fellowship. Our group combines computational and experimental expertise in multiple model organisms. We value and firmly believe in interdisciplinarity within people, pure creativity, and a healthy dose of irreverence for existing dogma.
DNA, RNA, and proteins: three types of biological sequences that are intimately familiar to any life science researcher and that make life, as we know it, possible. Less familiar (though at least as important) are glycans, or complex carbohydrates. These chains of various sugars, or monosaccharides, can either occur by themselves, for instance to constitute the capsules of bacteria, fungi, and plant cells, or adorn all kinds of other biomolecules such as proteins, lipids, or RNA. The specific glycan sequence that is physically attached to a protein fundamentally alters its properties and capabilities, fine-tuning stability, structure, and function. This results in a mélange of incredibly complex interactions, in turn producing the exceedingly complex phenomenon we know as life. And incredibly complex it is indeed, as glycans boast an alphabet of hundreds of monosaccharides, compared to the rather paltry 20 amino acids for proteins and four nucleotides for DNA.
Additionally, glycans are not only the sole nonlinear biological sequence, resulting in molecules with multiple branches, but also the only non-templated sequence, being created via an interplay of dozens of specialized enzymes that depend intimately on the current state of the cell. All this makes glycans the most diverse biological sequence and also the most dynamic one, able to adjust sequences on the fly without genetic mutations. On top of all this, glycans have been implicated in virtually all human diseases, from inflammatory disorders to cancer, immediately hinting at their biomedical potential. Because of this enormous potential, we are committed to advancing this promising field of glycobiology by any means necessary, ranging from molecular biology and synthetic biology to computational approaches such as machine learning or bioinformatics. In fact, the very complexity of glycans that has hitherto prevented their comprehensive analysis and utilization makes them, among all biological sequences, ideally suited for state-of-the-art machine learning algorithms and their unique capabilities to extract insight and information from these sequences.
Bringing Light into the Dark Matter of Biology: Sequence-to-Function Analyses for Glycans
As biological sequences, glycans are akin to languages, with information in sequence motifs and order. To take a simple example, the exact sequence of the glycan attached to the antibody protein IgG modulates its stability, function, and activity. So while it is possible to tie sequence attributes to functional properties of glycans, at the moment this is only feasible through painstaking manual labor and a considerable investment of time and resources. We have begun to remedy this bottleneck with the development of dedicated deep learning methods that can rapidly analyze tens of thousands of glycans. By treating glycans as a biological language, we have trained a machine learning-based language model (Bojar et al., Cell Host Microbe 2020), which has enabled us to build sequence-to-function models for a plethora of applications. We have already demonstrated the utility of this platform by training classifiers to predict glycan immunogenicity, pathogenicity of bacterial strains, and taxonomic origins of glycans purely from glycan sequences. Beyond predicting glycan properties, this process can also be used to identify relevant glycan motifs, suggest modifications for glycoengineering, and provide insight into the investigated biological processes. We are currently engaged in improving these algorithms, developing new tools and platforms for glycan-focused machine learning, and developing sequence-to-function models for more applications in glycobiology. One example is our graph neural network SweetNet, which can quantitatively predict virus-glycan binding, important for the discovery of novel viral receptors, and cluster species according to phenotypic/environmental characteristics (Burkholz et al., 2021).
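To make the idea of glycans as a branched "language" more concrete, here is a minimal sketch of how a glycan string could be turned into tokens, a graph, and motif counts. It assumes glycans are written in IUPAC-condensed notation (linkages in parentheses, branches in square brackets, reducing end on the right). The function names are hypothetical and chosen for illustration only; this is not the code behind our published language model or SweetNet.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Assumption: IUPAC-condensed input, e.g. "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc".
# All helper names below are hypothetical, for illustration only.
TOKEN_PATTERN = re.compile(r"\[|\]|\([^)]*\)|[^\[\]()]+")

Edge = Tuple[int, int, Optional[str]]  # (child index, parent index, linkage)


def tokenize_glycan(glycan: str) -> List[str]:
    """Split a glycan string into monosaccharide, linkage, and bracket tokens."""
    return TOKEN_PATTERN.findall(glycan)


def glycan_to_graph(glycan: str) -> Tuple[List[str], List[Edge]]:
    """Parse a glycan into nodes (monosaccharides) and linkage-labeled edges.

    Tokens are read from the reducing end (right) to the left, so each new
    monosaccharide is attached to the residue that was read before it.
    """
    nodes: List[str] = []
    edges: List[Edge] = []
    branch_points: List[Optional[int]] = []  # residues to return to after a branch
    parent: Optional[int] = None
    linkage: Optional[str] = None
    for token in reversed(tokenize_glycan(glycan)):
        if token == "]":                 # a branch starts (when reading right to left)
            branch_points.append(parent)
        elif token == "[":               # the branch ends; go back to the branch point
            parent = branch_points.pop()
        elif token.startswith("("):      # linkage such as "(b1-4)"
            linkage = token[1:-1]
        else:                            # monosaccharide
            node = len(nodes)
            nodes.append(token)
            if parent is not None:
                edges.append((node, parent, linkage))
            parent, linkage = node, None
    return nodes, edges


def count_disaccharide_motifs(glycans: List[str]) -> Counter:
    """Count linkage-aware disaccharide motifs across a list of glycans."""
    counts: Counter = Counter()
    for glycan in glycans:
        nodes, edges = glycan_to_graph(glycan)
        for child, parent, linkage in edges:
            counts[f"{nodes[child]}({linkage}){nodes[parent]}"] += 1
    return counts


# A biantennary N-glycan, used here purely as an example input.
example = (
    "Gal(b1-4)GlcNAc(b1-2)Man(a1-3)"
    "[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]"
    "Man(b1-4)GlcNAc(b1-4)GlcNAc"
)
nodes, edges = glycan_to_graph(example)
print(len(nodes), "monosaccharides,", len(edges), "linkages")
print(count_disaccharide_motifs([example]).most_common())
```

Parsing the branches explicitly matters because a naive left-to-right reading would pair residues that merely sit next to each other in the string but are not bonded in the molecule. In a sketch like this, the motif counts could serve as simple input features for a classifier, while the node and edge lists are the kind of representation a graph neural network would operate on.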
From Genes to Glycans — Predicting Glycan Structures and Repertoires via Machine Learning
The key limitation in glycobiology today is the relative lack of sequences, which, in most cases, have to be acquired by low-throughput mass spectrometry. Lifting this restriction would fully unleash the potential of glycans and glycan-focused machine learning to elevate our understanding of molecular biology, by integrating them into standard systems biology workflows, and would help to develop the next generation of biomedical therapies. We are pursuing several strategies to convert this prospect into reality. These strategies encompass the development of machine learning algorithms to enhance the resolution and amount of information obtained by traditional measurement techniques, and they extend to projects that aim to predict the glycan repertoire of a biological system, uncoupling glycomics from mass spectrometry.
Leveraging Synthetic Glycobiology for Biomedical Applications
Glycans are targeted in several autoimmune diseases. Glycans aid cancer in evading the immune system. Glycans facilitate the cellular entry of most viruses. Modifying these glycans in a targeted manner, with enzymes coupled to glycan-binding lectins, could constitute a new form of biomedical therapy. Yet currently glycans can only be targeted in a more or less trial-and-error fashion, which is why we are working on a platform to predictively alter glycan sequences. With the help of sequence-to-function models, we can identify relevant glycan motifs that can then be modified with methods derived from synthetic biology. We are interested in pursuing these modifications both for studying the impact of glycans on natural biological systems and for therapeutic applications. Further, we envision the use of glycans as an additional layer of complexity in existing synthetic biology applications.
Selected Publications
03/2021 Burkholz, R., Quackenbush, J., and Bojar, D. Using Graph Convolutional Neural Networks to Learn a Representation for Glycans. bioRxiv, doi:10.1101/2021.03.01.433491.
10/2020 Bojar, D., Powers, R.K., Camacho, D.M., and Collins, J.J. Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions. Cell Host Microbe, 29(1):132-144.
04/2020 Bojar, D., Powers, R.K., Camacho, D.M., and Collins, J.J. SweetOrigins: Extracting Evolutionary Information from Glycans. bioRxiv, doi:10.1101/2020.04.08.031948.
01/2020 Bojar, D., Camacho, D.M., and Collins, J.J. Using Natural Language Processing to Learn the Grammar of Glycans. bioRxiv, doi:10.1101/2020.01.10.902114.
04/2019 Kim, H.*, Bojar, D.*, and Fussenegger, M. A CRISPR/Cas9-based central processing unit to program complex logic computation in human cells. Proc Natl Acad Sci USA, 116(15):7214-7219. Co-first authorship.
06/2018 Bojar, D., Scheller, L., Charpin-El Hamri, G., Xie, M., and Fussenegger, M. Caffeine-inducible gene switches controlling experimental diabetes. Nat Commun, 9:2318.
04/2018 Kojima, R.*, Bojar, D.*, Rizzi, G., Charpin-El Hamri, G., El Baba, M., Saxena, P., Auslaender, S., Tan, K.R., and Fussenegger, M. Designer exosomes produced by implanted cells intracerebrally deliver therapeutic cargo for Parkinson’s disease treatment. Nat Commun, 9:1305. Co-first authorship.
We are always looking for highly motivated & open-minded innovators at every career stage! We are interested in both experimentalists and computational specialists (bonus points if you are comfortable with both and/or willing to learn). Just send your CV and a cover letter describing your research interests and skill set to email@example.com. We are looking forward to your application!