Immunosequencing Algorithms Group




Bioinformatics · Immunogenetics · Immune repertoire profiling

Selected publications



Exploring the pre-immune landscape of antigen-specific T cells
Genome Medicine 2018

Our results suggest that the population frequencies of specific T cells are strikingly non-uniform across epitopes that are known to elicit immune responses. This inference leads to a new definition of epitope immunogenicity based on specific TCR frequencies, which can be estimated with a high degree of accuracy in silico, thereby providing a novel framework to integrate computational and experimental genomics with basic and translational research efforts in the field of T cell immunology.


VDJdb: a curated database of T-cell receptor sequences with known antigen specificity
Nucleic Acids Res 2017

The primary goal of VDJdb is to facilitate access to existing information on TCR antigen specificities, i.e. the ability to recognize known epitopes presented by known major histocompatibility complex (MHC) class I and II molecules. Our mission is to aggregate TCR specificity information on a continuous basis and establish a curated repository to store these data in the public domain.


VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires
PLoS Comp Biol 2015

Here we present VDJtools, a software framework that can analyze the output of the most commonly used TCR repertoire processing tools and allows applying a diverse set of post-analysis strategies. The main aims of our framework are: to ensure consistency of post-analysis methods and reproducibility of obtained results; to save the time of bioinformaticians analyzing TCR repertoire data by providing comprehensive tabular output and an open-source API; and to provide a command line tool simple enough that immunologists and biologists with little computational background can use it to generate publication-ready results.


Towards error-free profiling of immune repertoires
Nature Methods 2014

Deep profiling of antibody and T cell–receptor repertoires by means of high-throughput sequencing has become an attractive approach for adaptive immunity studies, but its power is substantially compromised by the accumulation of PCR and sequencing errors. Here we report MIGEC (molecular identifier groups–based error correction), a strategy for high-throughput sequencing data analysis. MIGEC allows for nearly absolute error correction while fully preserving the natural diversity of complex immune repertoires.
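The general idea behind molecular-identifier-based error correction can be illustrated with a minimal sketch. It assumes each read already carries its molecular identifier (UMI) and that reads are unaligned strings of equal length within a group. The function name `umi_consensus`, the `min_group_size` threshold, and the toy reads are hypothetical and chosen for illustration only. This is not the MIGEC implementation, which is considerably more involved.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_group_size=2):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    reads: iterable of (umi, sequence) tuples.
    min_group_size: hypothetical threshold; smaller UMI groups are dropped.
    """
    # Group reads by their molecular identifier.
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_group_size:
            continue  # too few reads to out-vote an error
        # Simplification: keep only reads of the most common length so that
        # positions line up without performing an alignment.
        common_len = Counter(len(s) for s in seqs).most_common(1)[0][0]
        same_len = [s for s in seqs if len(s) == common_len]
        # Majority vote at every position.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*same_len)
        )
    return consensus


# Toy, made-up reads: three copies of one molecule (one with an error)
# and a singleton UMI that is discarded.
reads = [
    ("AAGT", "TGTGCCAGC"),
    ("AAGT", "TGTGCCAGC"),
    ("AAGT", "TGTGACAGC"),
    ("CCTA", "TGTGCCTGG"),
]
print(umi_consensus(reads))  # {'AAGT': 'TGTGCCAGC'}
```

Because every read in a UMI group originates from the same starting molecule, an error present in a minority of reads is out-voted, while genuinely distinct molecules keep their own identifiers and are not merged.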




More publications: PubMed · Google Scholar

Ongoing projects



T-cell repertoire annotation

Exploring antigen specificities encoded in high-throughput T-cell receptor (TCR) sequencing data using a database of TCR sequences with known specificity. Discovering immune repertoire biomarkers using statistical approaches to TCR sequence motif inference. Linking specific TCR repertoire structure to the immunogenicity of cognate antigens.
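As a rough illustration of database-driven annotation, the sketch below matches CDR3 amino acid sequences from a sample against a list of sequences with known specificity, allowing a small number of substitutions. Everything here is an assumption made for the example: the `annotate` and `hamming` helpers, the `max_mismatches` parameter, and the toy records, which are made up and do not come from VDJdb or follow its schema.

```python
def hamming(a, b):
    """Number of mismatches between two strings, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def annotate(clonotypes, database, max_mismatches=1):
    """Map each sample CDR3 to the epitopes of database entries it resembles.

    clonotypes: iterable of CDR3 amino acid strings from a sample.
    database: iterable of (cdr3, epitope) pairs with known specificity.
    max_mismatches: hypothetical tolerance for substitutions (no indels).
    """
    hits = {}
    for cdr3 in clonotypes:
        epitopes = set()
        for db_cdr3, epitope in database:
            distance = hamming(cdr3, db_cdr3)
            if distance is not None and distance <= max_mismatches:
                epitopes.add(epitope)
        if epitopes:
            hits[cdr3] = sorted(epitopes)
    return hits


# Toy, made-up records used only to exercise the function.
database = [
    ("CASSAAAEQYF", "epitope_A"),
    ("CASSLGGTQYF", "epitope_B"),
]
sample = ["CASSAAAEQYF", "CASSLGGTEYF", "CASRPDRNEQFF"]
print(annotate(sample, database))
# {'CASSAAAEQYF': ['epitope_A'], 'CASSLGGTEYF': ['epitope_B']}
```

The all-against-all comparison is quadratic and only suitable for small inputs. A realistic search would typically also restrict matches by V/J segment, MHC context, and species, none of which is modeled here.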

Modeling TCR:pMHC complexes

In-silico modeling of TCR:peptide:MHC complex structures. Linking structural data and the organization of T-cell repertoire: CD4/CD8 T-cell differentiation and αβ chain pairing. Building statistical models of TCR:pMHC contacts with an ultimate goal of developing an efficient method for TCR:pMHC binding prediction.



Antibodyome analysis

Developing fast algorithms for antibody lineage tree inference and somatic hypermutation analysis. Implementing novel approaches to high-throughput antibody sequencing data analysis that focus on the clonal architecture instead of individual sequences/clonotypes. Exploring differences between B-cell subsets, B-cell memory and the structure of the B-cell repertoire in cancer patients.

Epitope immunogenicity


Software, databases, tutorials