Claudio Lucchese, Franco Maria Nardini, Nicola Tonellotto, High Performance Computing Lab - ISTI-CNR, Pisa, "Cloud Computing for Big Data Analysis", 8 - 22 June 2015

Hours:
20 h (5 credits)

Room:
Aula Riunioni del Dipartimento di Ingegneria dell'Informazione, Via G. Caruso 16, Pisa – Ground Floor

Short Abstract:
In this course, we will discuss the characteristics and benefits of cloud computing, the current technological trend for delivering on-demand computing resources over the Internet on a pay-for-use basis. We will also cover the MapReduce programming paradigm, used daily by large IT companies to process huge amounts of data on large-scale distributed platforms, together with the Apache Hadoop framework, its de facto standard open-source implementation. Furthermore, we will present and discuss some problems and solutions for cloud data management systems. To this end, we will introduce the consensus problem in asynchronous distributed platforms, presenting impossibility results from distributed systems theory, and we will discuss algorithms and solutions for data consistency, availability and fault tolerance. Finally, we will present big data analysis techniques, such as clustering, regression and graph analysis, as fundamental tools to model and extract knowledge from data, with a focus on information retrieval problems.

Course Contents in brief:

  • Introduction
  • Concepts and techniques for Cloud computing (2 hours)
    • Cloud characteristics and benefits
    • Designing applications for the Cloud
    • Virtualization mechanisms
  • Cloud Data Management problems and solutions (6 hours)
    • Availability, consistency and fault tolerance: impossibility results
    • Strong consistency: classical solutions
    • Weak consistency: Amazon solutions
  • Programming for Big Data problems (6 hours)
    • MapReduce programming and design patterns
    • Apache Hadoop and Pig frameworks
    • Streaming Data Analysis
  • Big Data Analysis Techniques (6 hours)
    • Clustering and regression
    • Graph analysis
    • Machine learning techniques for information retrieval

Schedule:

  • June, 08: 15:00-17:00
  • June, 09: 15:00-18:00
  • June, 10: 15:00-18:00
  • June, 11: 15:00-18:00
  • June, 12: 15:00-18:00
  • June, 18: 15:00-18:00
  • June, 22: 15:00-18:00