Hours:
16 hours (4 credits)
Room:
Aula Riunioni del Dipartimento di Ingegneria dell’Informazione, Via G. Caruso 16, Pisa - Ground Floor
To register for the course, click here
Short Abstract:
This PhD course focuses on Web search and discusses the challenges in its three main areas: i) crawling, ii) indexing, and iii) query processing. The course introduces each area by discussing the state of the art in the field and presenting the open research questions. The emphasis of the course is on query processing, an area where machine learning makes an important contribution to advancing the state of the art. After an introduction to the different query processing techniques, the course i) introduces supervised techniques designed explicitly for the ranking problem, ii) discusses several efficiency/effectiveness trade-offs in query processing, and iii) analyses several related optimization techniques. The course will also provide an overview of query processing techniques that employ deep neural networks. Two hands-on sessions will cover indexing and query processing of public Web collections.
Course Contents in brief:
- Modern Web Search (4 hours)
  - The Web: history, peculiarities, and the importance of search
  - Anatomy of a modern Web search engine: crawling, indexing, query processing
  - Crawling: definition and application. Architecture of a modern crawler
  - Challenges in crawling the Web
- Fast Indexes for Web Search (4 hours)
  - Data structures for indexing Web documents
  - Modern techniques for efficient text retrieval
  - Data structures for efficient k-NN search and retrieval over learned representations
  - Challenges in indexing the Web
  - Hands On: Indexing and basic query processing on a public Web collection
- Machine Learning in Modern Query Processors (8 hours)
  - Machine learning approaches for IR: Learning to Rank
  - Efficiency/Effectiveness Trade-offs, Cascading Architectures
  - Neural information retrieval and the role of pre-trained large language models
  - Dense/Sparse retrieval
  - Hands On: Learning to Rank and Deep Neural Networks for efficient Web search
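To give a flavour of what "indexing and basic query processing" means in its simplest form, the sketch below builds a toy inverted index and answers a conjunctive (AND) query by intersecting sorted posting lists. It is not taken from the course materials: the sample documents and the function names `build_index`, `intersect`, and `and_query` are hypothetical, and real Web-scale indexes add compression, skipping, and ranking on top of this idea.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Naive tokenisation (lowercase + whitespace split); an assumption for brevity.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(a, b):
    """Intersect two sorted posting lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the shortest posting list to keep intermediate results small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


# Hypothetical three-document collection.
docs = [
    "web search engines crawl the web",
    "inverted index for web documents",
    "query processing over an inverted index",
]
index = build_index(docs)
```

Calling `and_query(index, "inverted index")` returns the ids of the documents that contain both terms. Processing the shortest posting list first is a common heuristic, since the intermediate result can never grow beyond it.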
Schedule:
- 18/06/2024: 9:00 - 13:00
- 19/06/2024: 9:00 - 13:00
- 20/06/2024: 9:00 - 13:00
- 21/06/2024: 9:00 - 13:00