Kishor Trivedi, Duke University, “Reliability, Availability and Performance of Data Centers and Clouds”, 14 - 18 March 2016

Hours:
20 hours (5 credits)

Room:
Meeting Room (Aula Riunioni), Department of Information Engineering (Dipartimento di Ingegneria dell'Informazione), via G. Caruso 16, Pisa, Ground Floor

Short Abstract:
In this short course we will present methods used in practice for reliability, availability, performability and survivability modeling and analysis of systems. Non-state-space solution methods are often used to solve reliability block diagrams, fault trees and reliability graphs. Relatively efficient algorithms are known that handle systems with hundreds of components, and they have been implemented in many software packages. We will show the usage of these model types through practical examples and via the software package SHARPE. Nevertheless, many practical problems cannot be handled by such algorithms; in such cases bounding algorithms are used, as was done for a major subsystem of the Boeing 787.

Non-state-space methods derive their efficiency from the independence assumption, which is often violated in practice. State-space methods based on Markov chains, stochastic Petri nets, semi-Markov processes and Markov regenerative processes can capture various kinds of dependencies among system components. Markov models, Markov reward models and stochastic Petri nets will be illustrated through practical problems and using the SHARPE software package. However, the resulting state-space explosion severely restricts the size of the problems that can be solved.

Hierarchical and fixed-point iterative methods provide a scalable alternative that combines the strengths of state-space and non-state-space methods, and they have been used extensively to solve real-life problems. Their use will also be illustrated via large system examples and the SHARPE software package.

Course Contents in brief:

  • Reliability and Availability Modeling in Practice
  • Markov Chains and Stochastic Petri Nets in Performance and Reliability Modeling
  • Performance and Reliability of Clouds
  • Software Aging and Rejuvenation; Software Fault Tolerance via Environmental Diversity

Schedule:

  • Day 1: 8.30-13.30
    • covered subjects: Definitions, Reliability Block Diagrams, Fault Trees and Reliability Graphs, with applications and use of the SHARPE software package
  • Day 2: 8.30-13.30
    • covered subjects: Markov Chains and Stochastic Petri Nets in Performance and Reliability Modeling, with applications and use of the SHARPE software package
  • Day 3: 8.30-13.30
    • covered subjects: Performance, Availability and Power Modeling and Optimization of Clouds
  • Day 4: 8.30-13.30
    • covered subjects: Software Reliability; Software Aging and Rejuvenation; Software Fault Tolerance via Environmental Diversity
  • Day 5: 9.00-13.00
    • Final exam