Publications of Torsten Hoefler  
Torsten Hoefler:
 
Characterizing the Influence of System Noise on Large-Scale Parallel Applications
   (Presentation - presented in Aachen, Germany, Apr. 2011, Talk at RWTH Aachen University)
 
 Abstract: System noise is increasingly a concern as HPC systems continue
  to grow in scale. Good operating systems can minimize noise; however,
  some sources of asynchronous slowdowns, such as recoverable hardware
  errors, remain. Existing studies with artificial noise models provide
  only limited insight into application behavior under the influence of
  noise. This paper presents an in-depth analysis of the impact of
  system noise on large-scale parallel application performance in
  realistic settings. Our analytical model shows the particular
  circumstances under which noise is propagated or absorbed. The model
  shows that not only collective operations but also point-to-point
  communications influence the application's sensitivity to noise. We
  present a simulation toolchain that injects noise delays from traces
  gathered on four common large-scale architectures into a LogGPS
  simulation and allows new insights into the scaling of applications in
  noisy environments. Our simulation framework enables large-scale
  simulations up to 8 million processes with more than 1 million events
  per second. We investigate collective operations in noisy settings
  with up to 1 million processes and three applications (Sweep3D, AMG,
  and POP) with up to 32,000 processes. We show that the scale at which
  noise becomes a bottleneck is system-specific and depends on the
  structure of the noise.  Simulations with different network speeds
  show that a 10x faster network does not improve application
  scalability because noise becomes a bottleneck at scale. We quantify
  this noise bottleneck and conclude that our tools can be utilized to
  tune the noise signatures of a specific system for minimal noise
  propagation. For example, our simulations verify the long-standing
  conjecture that co-scheduling prevents significant application
  slowdown.
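
 The claim that noise becomes a bottleneck at scale can be illustrated with a toy model. The sketch below is not the LogGPS simulation toolchain described in the abstract; it assumes a purely bulk-synchronous loop in which every iteration ends in a barrier, so the slowest process sets the pace. The function names `bsp_runtime` and `sample_noise`, and the parameters (detour probability, detour length, seed), are hypothetical and chosen only for illustration.

```python
import random


def bsp_runtime(work, noise):
    """Runtime of a bulk-synchronous loop under noise (toy model).

    `noise` holds one list per iteration with the delay each process
    suffered. Every iteration ends in a barrier, so it costs the work
    time plus the largest delay on any process.
    """
    return sum(work + max(delays) for delays in noise)


def sample_noise(processes, iterations, probability, delay, seed=0):
    """Draw a hypothetical noise pattern: in each iteration, each process
    is hit by a detour of length `delay` with the given probability."""
    rng = random.Random(seed)
    return [
        [delay if rng.random() < probability else 0.0 for _ in range(processes)]
        for _ in range(iterations)
    ]


if __name__ == "__main__":
    # Same per-process noise rate, growing process counts: the chance that
    # at least one process is delayed in an iteration rises with scale.
    for processes in (1, 16, 256, 4096):
        noise = sample_noise(processes, iterations=100,
                             probability=0.001, delay=1.0)
        print(processes, bsp_runtime(10.0, noise))
```

 In this model a detour on a single process delays all processes at the next barrier, which is the propagation case. Absorption, where slack in point-to-point communication hides a delay, is not captured by this sketch.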
 
 Documents: download slides

 BibTeX:
 @misc{osnoise-talk-aachen,
   author={Torsten Hoefler},
   title={{Characterizing the Influence of System Noise on Large-Scale Parallel Applications}},
   year={2011},
   month={Apr.},
   location={Aachen, Germany},
   note={Talk at RWTH Aachen University},
   source={http://www.unixer.de/~htor/publications/},
 }