Dr. Daniel Bojar is an Associate Senior Lecturer in Bioinformatics at the Department of Chemistry and Molecular Biology & the Wallenberg Centre for Molecular and Translational Medicine of the University of Gothenburg. He is also a recipient of the Branco Weiss Fellowship – Society in Science and was named to the Forbes 30 Under 30 Europe list.
Read about our research in the media:
Multiomics – A Multi-Layered Answer to Multi-Layered Questions
Revealing the world of carbohydrates – using AI
Researchers Read the Sugary ‘Language’ on Cell Surfaces
Learning the language of sugars
New AI model helps understand virus spread from animals to humans
Wet-lab + computational MSc thesis projects
ChatGPT’s view on glycans:
Glycans, oh glycans, such fascinating things,
Complex sugars that make the heart sing,
From the structure to the bonds they bring,
Their presence makes the cells take wing.
In every tissue, they play a key role,
With proteins they form a beautiful dole,
A dance of sorts, that makes the heart whole,
And keeps the body functioning on a roll.
Some are long, some are short, but all are sweet,
Their diversity is quite a feat,
In sickness and health, they can’t be beat,
Their importance can’t be overstated, complete.
So here’s to glycans, our sweetest friend,
In biology, they’ll always trend,
For they are the building blocks that extend,
The very essence of life till the very end.
Machine Learning & Systems Glycobiology
Welcome to the Bojar lab at the Department of Chemistry and Molecular Biology & the Wallenberg Centre for Molecular and Translational Medicine of the University of Gothenburg, Sweden. Our research lives at the intersection of machine learning, glycobiology, and synthetic biology. We developed the first applications of deep learning to glycobiology, using a language model to predict functional properties from glycan sequences (Bojar et al., Cell Host Microbe 2020). This work was covered by Quanta Magazine, and we have since applied it to, for instance, the analysis of protein-glycan interactions (Lundstrøm et al., Adv Sci 2021). Our group is developing and applying further machine learning and bioinformatics methods to analyze sequence-to-function relationships in glycans and to transform glycobiology into a true systems biology discipline. Additionally, we are interested in the effects of glycans on mammalian signaling pathways, in the potential of synthetic biology to alter glycome signatures, and in the biomedical applications of glycans, in work funded by the Branco Weiss Fellowship. Our group combines computational and experimental expertise across multiple model organisms. We value, and firmly believe in, interdisciplinarity within individuals, pure creativity, and a healthy dose of irreverence toward existing dogma.
James completed his master’s degree in molecular bioengineering at Imperial College London, during which he spent much of his time tinkering with, and learning about, how generative models can be applied to biology. His master’s dissertation investigated whether the language of S. cerevisiae DNA could be modelled well enough to computationally generate viable promoters.
During his PhD, James will seek to further understand the biological grammars that various molecules adhere to. He also hopes to make the relationships between these sequences across the genome, transcriptome, and glycome more apparent.
In his free time, James can be found trying to speak to people in their native language, going to the gym, and overusing the random article button on Wikipedia.
Jon earned his master’s degree in molecular biomedicine from the University of Copenhagen, Denmark, where he investigated immunosuppressive functions of regulatory T cells in multiple sclerosis and neuronal dysfunction in Parkinson’s disease. During his studies, he developed a special interest in high-throughput approaches and the integration of experimental and computational methods to elucidate molecular mechanisms of complex biological systems.
Jon is fascinated by the seemingly endless complexity of glycobiology and excited to help understand the sequence-to-function relationships of glycans as well as we understand those of DNA, RNA, and proteins, one experiment (or line of code) at a time.
When Jon is not working, you’ll find him obsessing over brewing the perfect cup of coffee or running ultra-long distances, preferably in steep & technical terrain.
In July 2022, Nadieh completed her Master’s degree in Bioscience at the Department of Physiology/Metabolic Physiology at the Sahlgrenska Academy, University of Gothenburg, with a focus on the biogenesis and secretion of synaptic-like microvesicles in pancreatic beta cells. She is interested in learning both wet-lab and computational techniques for studying glycans in our lab.
Luc completed his postdoc in the group, working as a bioinformatician to develop machine learning methods and apply them to the wide field of glycobiology. Among other projects, he worked on glycowork, milk glycan biosynthetic networks, and the analysis of fucose-containing glycans.
Viktoria was a Master’s student in the group from September 2021 to June 2022.
Emma completed her MSc thesis in the group and subsequently worked here as a research assistant until October 2022.
DNA, RNA, and proteins: three types of biological sequences that make life as we know it possible and that are intimately familiar to any life science researcher. Less familiar, though at least as important, are glycans or complex carbohydrates. These chains of sugars (monosaccharides) either occur on their own, for instance forming the capsules of bacteria or the cell walls of fungi and plants, or adorn other biomolecules such as proteins, lipids, or RNA. The specific glycan sequence attached to a protein fundamentally alters its properties and capabilities, fine-tuning stability, structure, and function. This results in a mélange of incredibly complex interactions, as glycans boast an alphabet of hundreds of monosaccharides, compared to the rather paltry 20 amino acids of proteins and four nucleotides of DNA.
Glycans are not only the sole nonlinear biological sequence, forming molecules with multiple branches, but also the sole non-templated one: they are built by dozens of specialized enzymes whose activity depends on the current cellular state. This also makes glycans the most dynamic biological sequence, able to change on the fly without any genetic mutation. Glycans have been implicated in virtually all human diseases, from inflammatory disorders to cancer, immediately hinting at their biomedical potential. Because of this enormous potential, we are committed to advancing the promising field of glycobiology by any means necessary, ranging from molecular and synthetic biology to computational approaches such as machine learning and bioinformatics. In fact, the very complexity of glycans that has so far prevented their comprehensive analysis and utilization makes them, among all biological sequences, ideally suited for state-of-the-art machine learning algorithms and their unique ability to extract insight and information from sequences. We are working towards democratizing access to these resources via our open-source Python package glycowork (Thomès et al., Glycobiology, 2021), which we have, for instance, used to analyze sequence properties of fucose-containing motifs across taxonomic kingdoms (Thomès et al., Front Mol Biosci, 2021).
Bringing Light into the Dark Matter of Biology: Sequence-to-Function Analyses for Glycans
As biological sequences, glycans are akin to language, with information encoded in sequence motifs and their order. To take a simple example, the exact sequence of the glycan attached to the antibody IgG modulates its stability, function, and activity. While it is possible to link sequence features to functional properties of glycans, this is currently feasible only through painstaking manual work and a considerable investment of time and resources. We have begun to remedy this bottleneck by developing dedicated deep learning methods that can rapidly analyze tens of thousands of glycans. By treating glycans as a biological language, we have trained a language model (Bojar et al., Cell Host Microbe 2020) that has enabled us to build sequence-to-function models for a plethora of applications. We have already demonstrated the utility of this platform by training classifiers to predict glycan immunogenicity, the pathogenicity of bacterial strains, and the taxonomic origin of glycans purely from glycan sequences. Beyond predicting glycan properties, this approach can also identify relevant glycan motifs, suggest modifications for glycoengineering, and provide insight into the biological processes under investigation. We are currently improving these algorithms, developing new tools and platforms for glycan-focused machine learning, and building sequence-to-function models for further applications in glycobiology. One example is our graph neural network SweetNet, which can quantitatively predict virus-glycan binding (important for discovering novel viral receptors) and cluster species according to phenotypic/environmental characteristics (Burkholz et al., Cell Reports, 2021). We have since generalized our work to predicting all kinds of protein-glycan interactions using machine learning (Bojar et al., ACS Chem Biol, 2022) as well as deep learning (Lundstrøm et al., Adv Sci, 2021).
Our deep learning model LectinOracle is trained on over half a million unique protein-glycan interactions, using information from both protein and glycan sequences, and can generalize to new proteins, new glycans, and new contexts. We have used this method to investigate varied biological contexts such as the microbiome or viral epidemics.
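To make the idea of treating glycans as a language more concrete, here is one possible first step, sketched in plain Python: splitting a glycan written in IUPAC-condensed notation into "words" (monosaccharides, linkages, and the square brackets that mark branches) and counting monosaccharides as the simplest possible bag-of-words features. This is a minimal, hypothetical illustration that assumes well-formed IUPAC-condensed input; the function names are invented for this example and are not part of glycowork or of the models described above.

```python
import re
from collections import Counter

# Hypothetical helpers, NOT the glycowork API. Assumes IUPAC-condensed input such as
# "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc", where branches sit in square brackets.
# A token is a branch bracket, a parenthesized linkage, or a monosaccharide name.
TOKEN_PATTERN = re.compile(r"\[|\]|\([^()]*\)|[^()\[\]]+")


def tokenize_glycan(glycan: str) -> list[str]:
    """Split an IUPAC-condensed glycan string into monosaccharide, linkage, and bracket tokens."""
    tokens = TOKEN_PATTERN.findall(glycan)
    # findall silently skips characters it cannot match (e.g. an unclosed "("),
    # so rejoining the tokens is a cheap check that nothing was dropped.
    if "".join(tokens) != glycan:
        raise ValueError(f"could not tokenize glycan: {glycan!r}")
    return tokens


def monosaccharide_counts(glycan: str) -> Counter:
    """Bag-of-words features: count monosaccharides, ignoring linkages and branch brackets."""
    return Counter(
        token
        for token in tokenize_glycan(glycan)
        if token not in {"[", "]"} and not token.startswith("(")
    )


if __name__ == "__main__":
    sialyl_lewis_x = "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc"
    print(tokenize_glycan(sialyl_lewis_x))
    # ['Neu5Ac', '(a2-3)', 'Gal', '(b1-4)', '[', 'Fuc', '(a1-3)', ']', 'GlcNAc']
    print(monosaccharide_counts(sialyl_lewis_x))
```

A bag-of-words count discards order and branching, which is precisely the information that sequence models such as those described above are designed to exploit; it is shown here only as the most basic feature set one could derive from the tokens.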
From Genes to Glycans — Predicting Glycan Structures and Repertoires via Machine Learning
The key limitation in glycobiology today is the relative scarcity of glycan sequences, which in most cases have to be acquired by low-throughput mass spectrometry. Lifting this restriction would fully unleash the potential of glycans and glycan-focused machine learning: it would deepen our understanding of molecular biology by integrating glycans into standard systems biology workflows and help develop the next generation of biomedical therapies. We are pursuing several strategies to turn this prospect into reality. These range from machine learning algorithms that increase the resolution and information content of traditional measurement techniques to projects that aim to predict the glycan repertoire of a biological system, uncoupling glycomics from mass spectrometry. Recent efforts in that direction, also described in a popular article, have shown that scRNA-seq data can be used to predict single-cell levels of glycan epitopes.
Leveraging Synthetic Glycobiology for Biomedical Applications
Glycans are targeted in several autoimmune diseases. Glycans help cancer cells evade the immune system. Glycans facilitate the cellular entry of most viruses. Modifying these glycans in a targeted manner, with enzymes coupled to glycan-binding lectins, could constitute a new form of biomedical therapy. Yet glycans can currently be targeted only in a more or less trial-and-error fashion, which is why we are working on a platform to alter glycan sequences predictively. With the help of sequence-to-function models, we can identify relevant glycan motifs that can then be modified with methods derived from synthetic biology. We are interested in pursuing these modifications both to study the impact of glycans on natural biological systems and for therapeutic applications. Further, we envision using glycans as an additional layer of complexity in existing synthetic biology applications.
Selected Publications (Full List)
05/2023 Lundstrøm, J., Urban, J., Thomès, L., and Bojar, D. GlycoDraw: A Python Implementation for Generating High-Quality Glycan Figures. bioRxiv, doi:10.1101/2023.05.20.541563.
03/2023 Joeres, R., Bojar, D., and Kalinina, O.V. GlyLES: Grammar-based Parsing of Glycans from IUPAC-condensed to SMILES. J Cheminform, doi:10.1186/s13321-023-00704-0.
02/2023 Thomès, L., Karlsson, V., Lundstrøm, J., and Bojar, D. Mammalian Milk Glycomes: Connecting the Dots between Evolutionary Conservation and Biosynthetic Pathways. bioRxiv, doi:10.1101/2023.02.04.527106.
01/2023 Jin, C., Lundstrøm, J., Korhonen, E., Luis, A.S., and Bojar, D. Breast Milk Oligosaccharides Contain Immunomodulatory Glucuronic Acid and LacdiNAc. bioRxiv, doi:10.1101/2023.01.16.524336.
09/2022 Qin, R., Mahal, L.K., and Bojar, D. Deep Learning Explains the Biology of Branched Glycans from Single-Cell Sequencing Data. iScience, doi:10.1016/j.isci.2022.105163.
08/2022 Bojar, D. and Lisacek, F. Glycoinformatics in the Artificial Intelligence Era. Chem Rev, doi:10.1021/acs.chemrev.2c00110.
02/2022 Lundstrøm, J. and Bojar, D. Structural insights into host–microbe glycointeractions. Curr Opin Struct Biol, doi:10.1016/j.sbi.2022.102337.
01/2022 Bojar, D., Meche, L., Meng, G., Eng, W., Smith, D.F., Cummings, R.D., and Mahal, L.K. A Useful Guide to Lectin Binding: Machine-Learning Directed Annotation of 57 Unique Lectin Specificities. ACS Chem Biol, doi:10.1021/acschembio.1c00689.
12/2021 Lundstrøm, J., Korhonen, E., Lisacek, F., and Bojar, D. LectinOracle – A Generalizable Deep Learning Model for Lectin-Glycan Binding Prediction. Adv Sci, 2103807, doi:10.1002/advs.202103807.
09/2021 Thomès, L. and Bojar, D. The role of fucose-containing glycan motifs across taxonomic kingdoms. Front Mol Biosci, 8:755577, doi:10.3389/fmolb.2021.755577.
06/2021 Thomès, L., Burkholz, R., and Bojar, D. Glycowork: A Python package for glycan data science and machine learning. Glycobiology, cwab067, doi:10.1093/glycob/cwab067.
06/2021 Burkholz, R., Quackenbush, J., and Bojar, D. Using Graph Convolutional Neural Networks to Learn a Representation for Glycans. Cell Rep, 35:109251.
10/2020 Bojar, D., Powers, R.K., Camacho, D.M., and Collins, J.J. Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions. Cell Host Microbe, 29(1):132-144.
04/2020 Bojar, D., Powers, R.K., Camacho, D.M., and Collins, J.J. SweetOrigins: Extracting Evolutionary Information from Glycans. bioRxiv, doi:10.1101/2020.04.08.031948.
01/2020 Bojar, D., Camacho, D.M., and Collins, J.J. Using Natural Language Processing to Learn the Grammar of Glycans. bioRxiv, doi:10.1101/2020.01.10.902114.
04/2019 Kim, H.*, Bojar, D.*, and Fussenegger, M. A CRISPR/Cas9-based central processing unit to program complex logic computation in human cells. Proc Natl Acad Sci USA, 116:7214-7219. Co-first authorship.
06/2018 Bojar, D., Scheller, L., Charpin-El Hamri, G., Xie, M., and Fussenegger, M. Caffeine-inducible gene switches controlling experimental diabetes. Nat Commun, 9:2318.
04/2018 Kojima, R.*, Bojar, D.*, Rizzi, G., Charpin-El Hamri, G., El Baba, M., Saxena, P., Auslaender, S., Tan, K.R., and Fussenegger, M. Designer exosomes produced by implanted cells intracerebrally deliver therapeutic cargo for Parkinson’s disease treatment. Nat Commun, 9:1305. Co-first authorship.
We are always looking for highly motivated & open-minded innovators at every career stage! We are interested in both experimentalists and computational specialists (bonus points if you are comfortable with both and/or willing to learn). Just send your CV and a cover letter describing your research interests and skill set to firstname.lastname@example.org. We look forward to your application!