Leveraging Modern Multi-core Processor Features to Efficiently Deal with Silent Errors
Abstract
Because current multi-core processors are more complex systems on a chip than previous generations, transient errors may occur that go undetected by the hardware and can corrupt the result of an expensive computation. Techniques such as Instruction Level Redundancy or checkpointing are therefore used to detect and correct these soft errors. However, these mechanisms are costly and add substantial resource overhead. Hardware Transactional Memory (HTM) offers a convenient and efficient way to revert the state of a core's cache, and this capability can serve as a recovery technique. We have built an experimental prototype that uses this feature to restore the previous state of the computation when a soft error is detected. Combining HTM with Hyper-Threading and Memory Protection Extensions may further improve the performance, applicability and confidence of our technique.
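The following small software model illustrates the recovery idea described in the abstract. It is a hedged sketch under stated assumptions, not the authors' prototype. Real HTM is replaced by explicit private copies of the state. On Intel hardware, the rollback would instead come from aborting a transaction, for example through the RTM intrinsics `_xbegin`, `_xend` and `_xabort`, and this sketch does not use them. Error detection is assumed to work by running the step twice and comparing the results. The names `run_protected`, `square_all` and `flip_bit_on_first_try` are hypothetical.

```python
import copy
from typing import Callable, List, Optional, Tuple

State = List[int]


class UnrecoverableSoftError(Exception):
    """Raised when redundant executions keep disagreeing after all retries."""


def run_protected(
    state: State,
    step: Callable[[State], State],
    max_retries: int = 3,
    fault: Optional[Callable[[int, State], State]] = None,
) -> Tuple[State, int]:
    """Apply `step` with detect-and-rollback semantics (software model only).

    Returns (new_state, retries_used). The committed `state` is never
    modified; it plays the role of the pre-transaction state that an HTM
    abort would restore.
    """
    for attempt in range(max_retries + 1):
        # "Begin transaction": compute only on private copies, never on `state`.
        primary = step(copy.deepcopy(state))
        shadow = step(copy.deepcopy(state))  # assumed duplicated execution for detection
        if fault is not None:
            # Hypothetical fault-injection hook that simulates a transient error.
            primary = fault(attempt, primary)
        if primary == shadow:
            # "Commit": both executions agree, so expose the new state.
            return primary, attempt
        # "Abort": discard both copies; `state` still holds the last good values,
        # so the loop simply retries from it.
    raise UnrecoverableSoftError(f"results still disagree after {max_retries} retries")


def square_all(s: State) -> State:
    """Example computation step (hypothetical)."""
    return [x * x for x in s]


def flip_bit_on_first_try(attempt: int, s: State) -> State:
    """Flip bit 0 of the first element, on the first attempt only."""
    if attempt == 0:
        s = list(s)
        s[0] ^= 1
    return s


if __name__ == "__main__":
    result, retries = run_protected([1, 2, 3], square_all, fault=flip_bit_on_first_try)
    print(result, retries)  # prints "[1, 4, 9] 1"
```

In this model, a detected mismatch costs only one re-execution of the step, because nothing reaches the committed state until both copies agree. The abstract attributes the same property to HTM-based rollback, in contrast to restoring a separately saved checkpoint.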