
Dr. Gianpaolo Coro, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” (ISTI) - CNR - Italy, "Big Data Analytics: Marine Data as a Case Study", 6-9 May 2025

Hours:
16 hours (4 credits)

Room:

Aula Riunioni del Dipartimento di Ingegneria dell’Informazione, Via G. Caruso 16, Pisa - Ground Floor

To register for the course, click here

Short Abstract:

In this course, practical methodologies for marine data analysis and modelling will be presented.

The course will cover specific classes of problems in marine science and their corresponding solutions, which adopt state-of-the-art computer science technologies and methodologies. The techniques covered will include:

  1. Unsupervised approaches to discover patterns of habitat change and predict fishing vessel activity patterns: Principal Component Analysis and Maximum Entropy for feature selection; K-means, X-means, DBSCAN, and Local Outlier Factor cluster analysis; Singular Spectrum Analysis for time series forecasting;
  2. Supervised approaches for species distribution prediction and invasive species monitoring: Feed-Forward Artificial Neural Networks, Support Vector Machines, AquaMaps, Maximum Entropy;
  3. Bayesian models to predict fish stock availability in specific fishing areas.
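
As an illustration of the distance-based cluster analysis named in item 1, the following is a minimal K-means sketch in plain Python. It is not taken from the course material: the vessel observations (speed in knots, average turning angle in degrees) are synthetic values invented for this example, and the function name `kmeans` and its deterministic initialisation are assumptions made to keep the sketch short and repeatable.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain K-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Assumption: deterministic initialisation, taking k points spread
    # evenly through the input instead of the usual random choice.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return centroids, labels

# Synthetic (speed_knots, turn_degrees) observations, invented for illustration:
# slow, strongly turning tracks versus fast, straight tracks.
points = [
    (2.5, 40.0), (3.0, 55.0), (2.0, 35.0), (3.5, 50.0),
    (10.5, 3.0), (11.0, 5.0), (12.0, 2.0), (10.0, 4.0),
]

centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

On data like these, the two clusters could be read as fishing-like and steaming-like behaviour. With real vessel data the variables would normally be rescaled first, since K-means distances are dominated by the variable with the largest range.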

These methods will be applied to marine data such as vessel-transmitted data, species observation records, and catch and vessel time series, all of which fall into the Big Data category. These data are crucial to safeguarding food availability and economic welfare, which are fundamental to human life. For example, predicting the impact of climate change on species habitat distribution helps avoid economic and biodiversity collapse due to sudden ecosystem change. Likewise, monitoring the effect of overfishing on fish stocks and marine biodiversity helps prevent ecosystem and economic collapse.

The techniques will address real use cases of the United Nations (FAO, UNESCO, UNEP, and others) for marine food and ecosystem safety and illustrate new lines of research in this context. They are also general enough to be applied to Big Data from other domains. Indeed, the analysed data share the general characteristics of Big Data, such as constantly increasing volume, vast heterogeneity and complexity, and unreliable content. For this reason, the methodologies will be illustrated in the context of the Open Science paradigm, which is characterized by the repeatability, reproducibility, and cross-domain reuse of all experimental phases.

The course will be interactive and made up of practical exercises. Attendees will use online environments to parametrize the models, run the experiments, and potentially modify the models.

Course Contents in brief:

  1. Big data and marine data
  2. Geospatial data
  3. Parameter selection techniques for environmental variables
  4. Distance and density-based cluster analysis for habitat and vessel pattern recognition
  5. Artificial Neural Networks, Support Vector Machines, and Maximum Entropy models for species distribution modelling
  6. Techniques for time series forecasting applied to marine data
  7. Open Science approaches

Schedule:

  1. Day 1 – May 6, 2025 – h. 9.00-13.00: Introduction to marine data and Open Science methodologies
  2. Day 2 – May 7, 2025 – h. 9.00-13.00: Data selection techniques and pattern recognition
  3. Day 3 – May 8, 2025 – h. 9.00-13.00: Supervised modelling of species distributions and invasions
  4. Day 4 – May 9, 2025 – h. 9.00-13.00: Data mining techniques for extracting knowledge from biodiversity and vessel data