Classroom change for the course: Gianpaolo Coro, CNR, “Signal processing and mining of Big Data: biological data as case study”, 2-6 May 2016

20 hours (5 credits)

Day 1, Day 4: Aula Riunioni del Dipartimento di Ingegneria dell'Informazione, Largo Lucio Lazzarino, Pisa
Day 2, Day 3, Day 5: Aula Riunioni del Dipartimento di Ingegneria dell'Informazione, via G. Caruso 16, Pisa – Ground Floor

Short Abstract:
Big Data analytics is attracting considerable interest in both public and scientific agendas, because it has shown that valid information can be extracted from large amounts of noisy data and turned into valuable input for decision makers. Applications of Big Data analytics can be found in a wide variety of domains, including economics, physics, healthcare and biology. In this last domain, analytics has been used, for example, to predict the impact of climate change on species distributions, to monitor the effect of overfishing on the economy and on marine biodiversity, and to prevent ecosystem collapse.

In this course, practical applications of Big Data analytics will be shown, with a focus on several techniques based on signal processing and machine learning. The course will clarify the general concepts behind these techniques, with an educational approach that makes them accessible also to students with intermediate mathematical skills. The examples will cover real cases, especially in the biology domain, involving data that would have been impractical for humans to analyze and correct: time series forecasting, periodicity detection, comparison of geographical distribution maps, assessment of environmental similarities between different areas, and global-scale species distributions.

The above techniques are general purpose, and students will be able to use them in other domains too. Cloud computing, data sharing, experiment reproducibility, the use of data representation standards and most of the requirements of Big Data analytics systems will be explained and practiced. To run the experiments, students will use a distributed e-Infrastructure (D4Science) developed at ISTI-CNR and also used in the European Laboratory on Big Data Analytics and Social Mining (SoBigData). This web-based platform hides the complexity of implementing Big Data analytics processes from scratch, allowing students to concentrate on configuring experiments, evaluating their output and understanding the behaviour of the models. For this reason, the course does not require any programming skills and is suited for students in Computer Engineering, Informatics, Telecommunications Engineering, Statistics and Computational Biology.

Course Contents in brief:

  • Cloud and distributed computing
  • Big Data analysis
  • e-Infrastructures
  • Large time series forecasting
  • Automatic periodicity detection
  • Neural Networks
  • Large scale probabilistic GIS maps


Schedule, from 2 to 6 May:

  • Day 1 – Introduction and presentation of the tools: the D4Science e-Infrastructure, Cloud and distributed computing for community-provided processes – 9.00 – 13.00
  • Day 2 – Feature analysis: Clustering, Principal Component Analysis and applications – 9.00 – 13.00
  • Day 3 – Large time series analysis: Fourier Transform, Short-Time Fourier Transform, Singular Spectrum Analysis and applications – 9.00 – 13.00
  • Day 4 – Large time series forecasting: Caterpillar Singular Spectrum Analysis and applications – 9.00 – 13.00
  • Day 5 – Modeling: Neural Networks, Maximum Entropy, Geographical Distribution Maps and applications – 9.00 – 13.00