Early assessment of the vulnerability of microprocessor components to hardware faults can drive effective protection decisions. Microarchitecture-level simulators are employed for such early assessments and can deliver reliability reports for a large number of hardware structures, taking into consideration the masking effects of the entire stack of hardware and software layers. Statistical fault injection at the microarchitecture level is a very accurate approach, but it may suffer from low throughput when a statistically significant assessment is required.
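To give a sense of why statistical significance costs throughput, the following minimal sketch computes how many faults a campaign would need to inject, using the finite-population sample size formula commonly applied in the statistical fault injection literature. The function name, the structure size, the cycle count, and the chosen margin and confidence level are illustrative assumptions, not values taken from the tutorial.

```python
import math


def fault_sample_size(population, margin=0.01, t=2.576, p=0.5):
    """Faults to inject for a given error margin and confidence level.

    population: total number of possible faults (e.g. bits x cycles).
    margin:     acceptable error margin (0.01 means +/- 1%).
    t:          normal-distribution cut-off for the confidence level
                (2.576 for 99%, 1.96 for 95%).
    p:          assumed failure probability; 0.5 is the conservative choice
                because it maximizes the required sample.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    denominator = 1 + (margin ** 2) * (population - 1) / (t ** 2 * p * (1 - p))
    return math.ceil(population / denominator)


# Hypothetical example: a 32 KB structure observed over 100 million cycles,
# where every (bit, cycle) pair is a candidate single-bit transient fault.
bits = 32 * 1024 * 8
cycles = 100_000_000
injections = fault_sample_size(bits * cycles, margin=0.01, t=2.576)
print(f"injections needed per structure and workload: {injections}")
```

Under these assumptions each structure and workload pair needs on the order of tens of thousands of full simulation runs, which is why per-run speed and fault list size dominate campaign time.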
This tutorial focuses on recent advances delivered by the Computer Architecture Lab of the University of Athens in the area of microarchitecture-level reliability assessment using statistical fault injection. We present GeFIN (Gem5-based Fault Injector), a state-of-the-art microarchitecture-level fault injection framework built on the Gem5 simulator. GeFIN supports massive and fast injection campaigns for all different types of faults (transient, permanent, intermittent) on arbitrary combinations of several dozen microarchitectural components modeled in Gem5. We first present the baseline Gem5 engine as well as the AVF (Architectural Vulnerability Factor) and FIT (Failures in Time) measurements reported by the tool, which also reports fine-grained fault effect classifications.
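As an illustration of how AVF and FIT relate to classified injection outcomes, the sketch below derives both from a list of per-run results. It is a simplified model under stated assumptions: the outcome labels, the raw FIT rate per bit, and the structure size are hypothetical placeholders and do not reflect GeFIN's actual output format or any measured technology data.

```python
from collections import Counter

# Hypothetical outcome label for runs where the fault had no visible effect.
MASKED = "masked"


def avf(outcomes):
    """Fraction of injected faults that were not masked."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no injection outcomes provided")
    return (total - counts[MASKED]) / total


def fit(avf_value, raw_fit_per_bit, num_bits):
    """Structure FIT estimate: AVF x raw FIT rate per bit x number of bits."""
    return avf_value * raw_fit_per_bit * num_bits


# Hypothetical campaign: 1000 injections into one structure.
outcomes = ["masked"] * 900 + ["sdc"] * 60 + ["crash"] * 40

structure_avf = avf(outcomes)
# 1e-4 FIT per bit is a placeholder, not a measured technology value.
structure_fit = fit(structure_avf, raw_fit_per_bit=1e-4, num_bits=32 * 1024 * 8)

print(f"AVF = {structure_avf:.3f}")
print(f"FIT = {structure_fit:.3f}")
print("breakdown:", dict(Counter(outcomes)))
```

Keeping the per-class counts alongside the single AVF number preserves the fine-grained view, since two structures with the same AVF can differ widely in how many faults end in silent data corruption versus crashes.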
We also present two GeFIN add-ons designed to improve the throughput of injection campaigns while preserving the accuracy of the reliability measurements. The first add-on is a set of speed-up methods applied to individual GeFIN runs; the second is MeRLiN, a fault classification approach based on dynamic instruction profiling that prunes the number of faults in extremely large fault lists. Both add-ons deliver large throughput improvements (several orders of magnitude) for comprehensive (and thus statistically significant) fault injection campaigns while preserving the reported AVF measurements.
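The general idea of pruning a fault list with profiling data can be sketched as follows. This is a simplified illustration loosely based on the description above, not MeRLiN's actual algorithm: the access profile format, the `Fault` type, and the grouping key (the program counter of the next reading instruction) are all assumptions made for the example.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    entry: int  # index of the faulty entry in the structure
    cycle: int  # cycle at which the bit flip occurs


def next_access(profile, fault):
    """First profiled access to the entry at or after the fault cycle.

    Assumption: an access in the same cycle as the fault sees the fault.
    """
    accesses = profile.get(fault.entry, [])
    cycles = [cycle for cycle, _, _ in accesses]
    i = bisect.bisect_left(cycles, fault.cycle)
    return accesses[i] if i < len(accesses) else None


def prune(faults, profile):
    """Split faults into those masked by construction and groups to inject.

    A fault that is overwritten before being read, or never accessed again,
    is classified as masked without simulation. The remaining faults are
    grouped by the PC of the instruction that reads them first.
    """
    masked, groups = [], defaultdict(list)
    for fault in faults:
        access = next_access(profile, fault)
        if access is None or access[1] == "W":
            masked.append(fault)
        else:
            groups[access[2]].append(fault)
    return masked, dict(groups)


# Hypothetical profile: entry -> cycle-sorted list of (cycle, kind, pc).
profile = {
    0: [(10, "W", 0x100), (20, "R", 0x104), (30, "W", 0x100), (40, "R", 0x104)],
    1: [(15, "W", 0x108)],
}
faults = [Fault(0, 12), Fault(0, 25), Fault(0, 35), Fault(1, 5), Fault(1, 50)]

masked, groups = prune(faults, profile)
print(f"{len(faults)} faults -> {len(masked)} masked, {len(groups)} group(s) to inject")
```

In this toy case only one representative injection would be simulated for the two faults read by the same instruction, with its outcome attributed to the whole group. Whether that attribution preserves accuracy depends on how similar the grouped faults really are, which is the question such approaches must validate against full campaigns.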
The tutorial includes measurements for different microarchitectural configurations (corresponding to different CPU models), a discussion of ACE analysis and fault injection at the microarchitecture level, a discussion of CPU and GPU reliability assessment at the microarchitecture level, and a comparison between microarchitecture-level and register-transfer-level fault injection on a commercial CPU model.
Dimitris Gizopoulos (firstname.lastname@example.org) is Professor at the Department of Informatics & Telecommunications of the National & Kapodistrian University of Athens in Greece where he leads the Computer Architecture Laboratory. The group's research focuses on the area of Dependable and Energy-Efficient Computer Architecture, and in particular reliability assessment, fault/error tolerance, design correctness validation, design margins harnessing and their relation to performance and energy-efficiency for microprocessors. Gizopoulos has published more than 150 papers in top-tier conferences and journals, has served as Associate Editor for several IEEE Transactions and Magazines (TC, TVLSI, D&T, TSUSC) and as member of several Program, Organizing and Steering Committees of major IEEE and ACM technical conferences.
Gizopoulos is an IEEE Fellow, a Golden Core member of the IEEE Computer Society and a Senior ACM member.
Athanasios Chatzidimitriou (email@example.com) is a PhD student at the University of Athens working on methods and tools for microarchitecture level reliability assessment as well as energy-efficient computing. He holds a BSc in Computer Engineering and an MSc in Computer Science. He is the lead developer of GeFIN.
Manolis Kaliorakis (firstname.lastname@example.org) is a PhD student at the University of Athens working on methods and tools for microarchitecture level reliability assessment as well as energy-efficient computing. He holds a BSc in Electrical and Computer Engineering and an MSc in Microelectronics. He is the lead developer of MeRLiN.
ISCA 2017 - "MeRLiN: Exploiting Dynamic Instruction Behavior for Fast and Accurate Microarchitecture Level Reliability Assessment", M. Kaliorakis, D. Gizopoulos, R. Canal, A. Gonzalez, ACM/IEEE International Symposium on Computer Architecture (ISCA 2017), Toronto, Canada, June 2017.
DSN 2017 - "RT Level vs. Microarchitecture Level Reliability Assessment: Case Study on ARM Cortex-A9 CPU", A. Chatzidimitriou, M. Kaliorakis, D. Gizopoulos, M. Iacaruso, M. Pipponzi, R. Mariani, S. Di Carlo, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2017), Denver, CO, USA, June 2017.
VTS 2017 - "Performance-Aware Reliability Assessment of Heterogeneous Chips", A. Chatzidimitriou, M. Kaliorakis, S. Tselonis, D. Gizopoulos, IEEE VLSI Test Symposium (VTS 2017), Las Vegas, NV, USA, April 2017.
ISPASS 2016 - "Anatomy of Microarchitecture-Level Reliability Assessment: Throughput and Accuracy", A. Chatzidimitriou, D. Gizopoulos, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2016), Uppsala, Sweden, April 2016.
VTS 2016 - "Faults in Data Prefetchers: Performance Degradation and Variability", N. Foutris, A. Chatzidimitriou, D. Gizopoulos, J. Kalamatianos, V. Sridharan, IEEE VLSI Test Symposium (VTS 2016), Las Vegas, NV, USA, April 2016.
VTS 2016 - "Microprocessor Reliability-Performance Tradeoffs Assessment at the Microarchitecture Level", S. Tselonis, M. Kaliorakis, N. Foutris, G. Papadimitriou, D. Gizopoulos, IEEE VLSI Test Symposium (VTS 2016), Las Vegas, NV, USA, April 2016.
DFTS 2015 - "Accelerated Microarchitectural Fault Injection-based Reliability Assessment", M. Kaliorakis, S. Tselonis, A. Chatzidimitriou, D. Gizopoulos, IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS 2015), Amherst, MA, USA, October 2015.
IISWC 2015 - "Differential Fault Injection on Microarchitectural Simulators", M. Kaliorakis, S. Tselonis, A. Chatzidimitriou, N. Foutris, D. Gizopoulos, IEEE International Symposium on Workload Characterization (IISWC 2015), Atlanta, GA, USA, October 2015.