PHD PROGRAM IN INFORMATION ENGINEERING

Prof. E. Veronica BELMEGA, Université Gustave Eiffel (UGE), CNRS, LIGM – France, "Online optimization, reinforcement learning and their applications", 27-30 May 2024

Hours:
16 hours (4 credits)

Room:

Aula Riunioni, Department of Information Engineering (Dipartimento di Ingegneria dell’Informazione), Via G. Caruso 16, Pisa, ground floor

To register for the course, click here.

Short Abstract:

This graduate course focuses on online optimization and reinforcement learning applied to wireless and mobile networks (IoT, 5G). Such networks may vary rapidly over time, potentially in an unpredictable and non-stochastic way, because of ad-hoc user connectivity and behavior. Hence, traditional methods based on static (classic) or stochastic optimization and game theory are no longer suitable. Instead, online optimization can be exploited to derive efficient algorithms, with theoretical guarantees in terms of no regret, for optimization problems in which no assumptions can be made on the temporal dynamics governing the network and, hence, on the objective function to be optimized.
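Below is one common way to formalize the no-regret guarantee mentioned above, written for a minimization problem. The notation is our own assumption and may differ from the course's: X is the decision set, f_t is the loss revealed only after the decision x_t has been taken (strictly causal feedback), and T is the horizon.

```latex
\mathrm{Reg}(T) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x),
\qquad
\text{no regret:}\ \limsup_{T \to \infty} \frac{\mathrm{Reg}(T)}{T} \le 0 .
```

For example, for convex losses with bounded gradients on a bounded convex set, online gradient descent with step sizes proportional to 1/√t attains Reg(T) = O(√T), so its average regret vanishes as T grows.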

Course Contents in brief:

Iterative online process based on strictly causal feedback information
Regret measure: definition, no-regret property, intuition, links with static (classic) and stochastic optimal solutions
Link with multi-armed bandits from reinforcement learning (UCB, epsilon-greedy, EXP3 algorithms)
First-order online algorithms: online gradient descent, online mirror descent, and their theoretical guarantees in terms of no-regret and regret decay rates
Applications in wireless communications: beam alignment in mmWave networks, energy-efficient NOMA power allocation, resource optimization in IoT networks, etc.
Beyond wireless: online metric learning for multimedia indexing, online matrix completion for movie ratings, universal filtering, etc.
Trade-off between performance (regret decay) and required feedback information:
- Feedback reduction: imperfect gradient feedback (stochastic gradient estimation), zeroth-order methods (gradient estimation from a single value of the objective function)
- Second-order online descent methods
Lab practice (4 hours): implement and evaluate several multi-armed bandit algorithms to solve an outage minimization problem in a two-user adaptive NOMA system with no CSIT/CDIT (channel state or channel distribution information at the transmitter), relying solely on a single bit of feedback
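As a rough illustration of the kind of algorithm the lab session deals with, here is a minimal Python sketch of UCB1 and epsilon-greedy on a Bernoulli bandit where each round returns a single success/outage bit. The arm success probabilities, the horizon and the function names (`ucb1`, `epsilon_greedy`) are hypothetical. The sketch does not model the NOMA channel and is not the lab's code. EXP3, which targets adversarial rewards, is not shown.

```python
import math
import random


def ucb1(success_probs, horizon, seed=0):
    """UCB1 on a Bernoulli bandit: each pull returns 1 bit (1 = no outage)."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k       # number of times each arm was played
    means = [0.0] * k      # empirical success rate of each arm
    total_reward = 0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # play every arm once to initialise the estimates
        else:
            # optimism in the face of uncertainty: empirical mean + confidence bonus
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        bit = 1 if rng.random() < success_probs[arm] else 0  # single bit of feedback
        counts[arm] += 1
        means[arm] += (bit - means[arm]) / counts[arm]        # incremental mean update
        total_reward += bit
    return counts, total_reward


def epsilon_greedy(success_probs, horizon, epsilon=0.1, seed=0):
    """Epsilon-greedy: explore uniformly with prob. epsilon, otherwise exploit."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k
    means = [0.0] * k
    total_reward = 0
    for _ in range(horizon):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda i: means[i])   # exploit current best
        bit = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (bit - means[arm]) / counts[arm]
        total_reward += bit
    return counts, total_reward


if __name__ == "__main__":
    # Hypothetical per-arm success probabilities (1 - outage probability),
    # e.g. for a few candidate power-allocation levels; not a real NOMA model.
    probs = [0.55, 0.70, 0.90, 0.60]
    horizon = 5000
    for name, algo in [("UCB1", ucb1), ("eps-greedy", epsilon_greedy)]:
        counts, reward = algo(probs, horizon)
        # pseudo-regret w.r.t. always playing the arm with the best expected reward
        regret = horizon * max(probs) - sum(c * p for c, p in zip(counts, probs))
        print(f"{name}: pulls per arm = {counts}, successes = {reward}, "
              f"pseudo-regret = {regret:.1f}")
```

The only information either learner uses is the one-bit outcome of the arm it actually played, which is why both rely on running empirical means rather than on any channel knowledge.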