November 28, 2013: Transparent Replication as an Operating System Service

  • Speaker: Björn Döbel
  • Title: Transparent Replication as an Operating System Service
  • Abstract: Many existing approaches to building fault-tolerant systems either rely on custom-tailored hardware or use compiler-level techniques to generate fault-tolerant program code. While dedicated hardware solutions are generally applicable to all software, they are usually too expensive to be used in the mass market. In contrast, compiler techniques require the programs' source code to be available, which is an infeasible assumption in many use cases.

    In the context of the L4/Fiasco.OC microkernel we implemented fault tolerance as an operating system service. This service, Romain, transparently replicates execution of binary-only programs running on commercial-off-the-shelf hardware. If the underlying platform can accommodate the increased resource consumption, Romain is able to achieve very low overheads: triple-modular redundant execution of the SPEC INT 2006 benchmarks is achieved with a geometric mean overhead of 2.51%.

    In this talk I am going to introduce Romain and discuss its internals. I will especially explain how resources are managed in a replicated environment and how Romain deals with problems arising from replicating shared memory accesses as well as multithreaded applications.