Dr. Gianpaolo Coro - Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” (ISTI) - CNR - Italy - "Big Data Analytics and Signal Processing: Biological Data as a Case Study", 8-12 June 2020

Hours:
20 hours (5 credits)

Room:
Aula Riunioni del Dipartimento di Ingegneria dell’Informazione, Via G. Caruso 16, Pisa – Ground Floor

Short Abstract:
Big Data analytics is attracting growing interest in both public and scientific agendas because it makes it possible to extract valid information from large amounts of noisy data and to produce valuable information for decision makers. Applications of Big Data analytics can be found in a wide variety of domains, including economics, physics, healthcare, and biology. For example, analytics has been used in biology to predict the impact of climate change on species’ distributions, to monitor the effect of overfishing on the economy and on marine biodiversity, and to prevent ecosystem collapse.
In this course, practical applications of Big Data analytics will be shown, with a focus on several signal processing and machine learning-based techniques. The course will clarify the general concepts behind these techniques, using an educational approach that makes them accessible also to students with intermediate mathematical skills. The examples will concern real cases involving data that could hardly have been analyzed and corrected by hand, especially in the domain of biology. The techniques covered will include: automatic periodicity detection, time series forecasting, Artificial Neural Networks, Support Vector Machines, Maximum Entropy, Markov Chain Monte Carlo, geographical map comparison, global-scale species distributions, and species invasion prediction.
The above techniques are general-purpose, and students will be able to use them in other domains too. Cloud computing, data sharing, reproducibility of experiments, the use of data representation standards, and most of the requirements of Big Data analytics systems will be explained and practiced in the context of the new Open Science paradigm. To practice with the experiments, students will use a distributed e-Infrastructure (D4Science) developed at ISTI-CNR and used in a number of international projects. This Web-based platform hides the complexity of implementing Big Data analytics processes from scratch and allows students to concentrate on configuring experiments, evaluating results, and understanding model behaviour. For this reason, the course does not require any programming skills and is suited for students in Computer Engineering, Informatics, Telecommunications Engineering, Mathematics, Statistics, and Computational Biology.

Course Contents in brief:

  • Distributed computing
  • Big Data analytics
  • e-Infrastructures
  • Time series forecasting and periodicity detection
  • Machine Learning-based methods
  • GIS maps

Schedule:

8-12 June 2020

  • Day1 – e-Infrastructures, Cloud and Distributed computing – 9.00 – 13.00
  • Day2 – Dimensionality reduction – 9.00 – 13.00
  • Day3 – Time series analysis and applications – 9.00 – 13.00
  • Day4 – Machine learning-based modelling and applications – 9.00 – 13.00
  • Day5 – Tools for Open Science – 9.00 – 13.00