
Prof. Fabrizio Silvestri, DIAG, Sapienza University of Rome, Dr. Nicola Tonellotto, DII, University of Pisa, "Neural Models and Techniques in Natural Language Processing and Information Retrieval", 7-11 February 2022

20 hours (5 credits)


Aula Riunioni del Dipartimento di Ingegneria dell’Informazione, Via G. Caruso 16, Pisa - Ground Floor

To register for the course, click here

Short Abstract:

Advances from the natural language processing community have recently sparked a renaissance in the task of ad-hoc search. In particular, large contextualized language models such as BERT have equipped ranking models with a far deeper understanding of language than previous bag-of-words models could offer. Applying these techniques to a new task is tricky, however, requiring knowledge of deep learning frameworks and significant scripting and data munging. In this course, we provide background on classical (e.g., bag-of-words) and modern (e.g., learning-to-rank) ranking approaches. We introduce students to the Transformer architecture, showing how it underpins modern large language models (e.g., BERT) as well as contemporary search ranking and re-ranking techniques. Going further, we detail and demonstrate how these techniques can easily be applied experimentally to new search tasks, using the declarative style of conducting experiments exemplified by the PyTerrier search toolkit.

Course Contents in brief:

  1. PyTorch
  2. Language Models
  3. Self-attention
  4. Transformers
  5. BERT and beyond
  6. HuggingFace Transformers
  7. PyTerrier
  8. Classical IR: bag of words and probabilistic ranking
  9. Modern IR: learning to rank
  10. Contemporary IR: neural models and techniques
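
As an illustration of the bag-of-words ranking covered in item 8, here is a minimal pure-Python sketch of BM25 scoring over pre-tokenized documents. This is an illustrative sketch only, not course material; the function name and the default parameters k1=1.2, b=0.75 are conventional choices.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query` with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each distinct query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if df.get(t, 0) == 0 or tf[t] == 0:
                continue
            # smoothed inverse document frequency (always positive)
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # term-frequency saturation with document-length normalization
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

A document matching both query terms outranks one matching a single term, and documents with no query terms score zero.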


  1. Day 1, 9:00–13:00 – Intro to PyTorch, Language Models, Implementing Word2Vec in PyTorch. Examples in Google Colab.
  2. Day 2, 9:00–13:00 – Self-attention, Transformers, BERT, and Beyond. HuggingFace Transformers. Examples in Google Colab.
  3. Day 3, 9:00–13:00 – Intro to Information Retrieval. Classical models and limitations. PyTerrier. Examples in Google Colab.
  4. Day 4, 9:00–13:00 – Neural Models for IR. Examples in Google Colab.
  5. Day 5, 9:00–13:00 – Exam.
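
The self-attention topic of Day 2 reduces to scaled dot-product attention, softmax(QK^T / sqrt(d)) V. The following is a minimal pure-Python sketch of that computation (illustrative only, not taken from the course materials; real implementations use batched tensor operations in PyTorch).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of row vectors (lists of floats)."""
    d = len(Q[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        # weighted average of the value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

With sharply peaked queries and keys, each query attends almost entirely to its matching key, so the output reproduces the corresponding value vector.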