Hours:
20 hours (5 credits)
Room:
Meeting Room (Aula Riunioni), Dipartimento di Ingegneria dell’Informazione, Via G. Caruso 16, Pisa - Ground Floor
6th Floor Meeting Room (Aula Riunioni del Piano 6), Dipartimento di Ingegneria dell’Informazione, Largo Lucio Lazzarino 1, Pisa
To register for the course, click here
Short Abstract:
Transformers have brought a significant shift to the AI field by introducing a novel learning and information-processing paradigm. Their attention mechanisms, including self-attention, allow them to capture intricate patterns and relationships in data, making them highly adaptable to a wide variety of complex tasks. Transformers can execute an extensive spectrum of AI tasks: Machine Translation, Text Generation, Sentiment Analysis, Named Entity Recognition, Text Classification, Image Generation and Processing, Image Captioning, Style Transfer, Object Detection, Multimodal AI, Multimodal Translation, Visual Question Answering, Text-to-Image Synthesis, Recommendation Systems, Time Series Analysis and Prediction, Speech Recognition and Synthesis, Graph-based Tasks, Molecular Structure Prediction, Conversational AI, Summarization, Question Answering, and Formulation of Robot Instructions.
First introduced by Google in 2017, Transformers are today the core of revolutionary technologies such as ChatGPT, Google Search, DALL-E, and Microsoft Copilot, overtaking the previously dominant deep learning architectures, such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs), across a wide range of applications.
In this series of lectures, participants will gain hands-on experience with the industrial frameworks and languages used to build transformer-based architectures for sequence and image modeling. The lectures focus on decoder and encoder architectures, fine-tuning techniques, alignment problems, vision models, multimodal models, text-to-image models, and large-scale inference.
Course Contents in brief:
- Introduction
  - The transformer architecture and its applications (BERT, GPT, ViT, DeTR, etc.) [1] [2] [3] [4]
  - Python and the Hugging Face framework (transformers, tokenizers, datasets, diffusers, etc.) (see the pipeline sketch after this list)
- Decoder-only architectures
  - GPT family [3]
  - Few-Shot Learning [1] [5]
- Encoder-only architectures
  - BERT family [4]
  - Clustering and Classification
- Encoder-decoder architectures
  - T5 family [6]
  - Summarization, translation, paraphrasing
- LLM Fine-Tuning
  - Parameter-efficient fine-tuning (PEFT, LoRA family, etc.) [7] (see the LoRA sketch after this list)
  - Quantization methods (GPTQ, GGML, etc.)
- Alignment problem
  - ChatGPT [8]
  - Instruction-following fine-tuning
  - Reinforcement Learning from Human Feedback (RLHF)
- Vision Models
  - Vision Transformers (ViT) [9]
  - Detection Transformers (DeTR) [10]
  - Bootstrapping Language-Image Pre-training (BLIP)
- Multimodal Models
  - Contrastive Language-Image Pretraining (CLIP) [11]
  - Large Language-and-Vision Assistant (LLaVA)
  - Visual Question Answering (VQA)
  - Document Question Answering (DQA)
- Text-to-Image Models
  - Image synthesis history (GANs)
  - Diffusion family [12] (see the diffusion sketch after this list)
- Large-Scale Inference
  - FastAPI
  - vLLM (see the vLLM sketch after this list)
  - Text Generation Inference
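To give a concrete flavour of the Hugging Face framework introduced in the first lectures, here is a minimal sketch of the high-level pipeline API, covering an encoder-only (BERT-family) classification task and a decoder-only (GPT-family) generation task. The model checkpoints are illustrative examples only, not necessarily the ones used in the course.

```python
# Minimal Hugging Face `transformers` pipeline sketch (checkpoints are illustrative).
from transformers import pipeline

# Encoder-only usage: sentiment classification with a distilled BERT-family model.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("Transformers make sequence modelling remarkably flexible."))

# Decoder-only usage: autoregressive text generation with a GPT-family model.
generator = pipeline("text-generation", model="gpt2")
print(generator("The transformer architecture", max_new_tokens=20)[0]["generated_text"])
```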
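For the parameter-efficient fine-tuning topic, the LoRA sketch below shows how a base causal language model can be wrapped with low-rank adapters using the peft library. The base checkpoint and the target module names are assumptions made for illustration; they depend on the architecture actually fine-tuned in the lectures.

```python
# LoRA sketch with Hugging Face `peft` (base model and target modules are assumptions).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the LoRA updates
    target_modules=["c_attn"],  # attention projection to adapt (GPT-2 naming)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```

Because only the adapter matrices are updated, the memory footprint of fine-tuning drops dramatically compared to updating all of the model's weights.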
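For the text-to-image topic, the diffusion sketch below loads a pretrained Stable Diffusion pipeline from the diffusers library and samples a single image. The checkpoint name and the GPU assumption are illustrative.

```python
# Diffusion sketch with Hugging Face `diffusers` (checkpoint and GPU use are assumptions).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

# Text prompt -> denoising loop -> decoded image.
image = pipe("a watercolour sketch of the Leaning Tower of Pisa").images[0]
image.save("tower.png")
```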
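For the large-scale inference topic, the vLLM sketch below runs offline batched generation; the same engine can also be exposed as an HTTP service (for example behind FastAPI) for serving. The model name is a placeholder.

```python
# vLLM sketch: offline batched generation (model name is a placeholder).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")        # engine with paged-attention KV caching
params = SamplingParams(temperature=0.8, max_tokens=64)

prompts = [
    "Explain self-attention in one sentence.",
    "Summarize the transformer architecture.",
]
for output in llm.generate(prompts, params):
    print(output.prompt, "->", output.outputs[0].text)
```

vLLM schedules requests with continuous batching, which is the main reason it is preferred over naive per-request generation when serving many users at scale.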
Schedule:
- 11/03/2024: 10.00 – 13.00, Lecture 1 (3h) - Via Caruso
- 12/03/2024: 15.00 – 18.00, Lecture 2 (3h) - Via Caruso
- 13/03/2024: 15.00 – 18.00, Lecture 3 (3h) - Largo L. Lazzarino
- 14/03/2024: 15.00 – 18.00, Lecture 4 (3h) - Largo L. Lazzarino
- 15/03/2024: 15.00 – 18.00, Lecture 5 (3h) - Via Caruso
- 18/03/2024: 15.00 – 17.00, Lecture 6 (2h) - Via Caruso
- 22/03/2024: 15.00 – 18.00, Lecture 7 (3h) - Via Caruso