Dr. Giulio Ermanno Pibiri, ISTI-CNR, Pisa, Italy, "Theory & Practice of Data Compression", 11-15 April 2022

20 hours (5 credits)


Aula Riunioni del Dipartimento di Ingegneria dell’Informazione, Via G. Caruso 16, Pisa - Ground Floor

Short Abstract:

The need to store data in compact form is increasingly important, given the ever-growing rate at which data is produced every day. To keep up with this data explosion, data compression is a mandatory step for delivering good quality of service in concrete applications. In this introductory course you will learn about fundamental data compression algorithms that are widely adopted in practice by tools we use every day, such as filesystems, computer networks, search engines, and databases. These algorithms have become indispensable knowledge across many fields in computing, including Information Retrieval, Machine Learning, Natural Language Processing, Applied Physics, and Bioinformatics. To better grasp the beauty behind data compression, we will also learn how to implement some of these algorithms in C++ through several "hands-on" sessions.

Course Contents in brief:

  1. Introduction
    1. What Is Data Compression, and Why?
    2. Motivations
    3. Technological Limitations: Memories and Hierarchies
    4. Applications
    5. Basic Notions: Entropy, Information Content, Data Redundancy, Compression Ratio
  2. Integer Codes
    1. Basic Notions: Distributions, Kraft-McMillan Inequality
    2. Run-Length Encoding, Gamma, Delta, Golomb, Rice, Zeta, Fibonacci, Variable-Byte
    3. Encoding/Decoding of Prefix-Free Codes
  3. Lab Session 1 on Integer Codes
  4. Sequence Compressors
    1. Basic Notions: Combinatorial Lower Bound
    2. Binary Packing, Simple, PForDelta, Elias-Fano, Interpolative, Directly-Addressable, Hybrid
    3. Inverted Indexes and Social Networks
  5. Lab Session 2 on Sequence Compressors
  6. Statistical Compressors
    1. Shannon-Fano, Huffman, Arithmetic Coding, Asymmetric Numeral Systems
  7. Dictionary-Based Compressors
    1. LZ77, LZ78, LZW, and variants: gzip, LZO, Zstd


11-15 April 2022, 14:00 - 18:00