# SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations

SÖREN VAN DER WALL, TU Braunschweig, Germany ROLAND MEYER, TU Braunschweig, Germany

We address the problem of preserving non-interference across compiler transformations *under speculative semantics*. We develop a proof method that ensures the preservation uniformly across all source programs. The basis of our proof method is a new form of simulation relation. It operates over directives that model the attacker's control over the micro-architectural state, and it accounts for the fact that the compiler transformation may change the influence of the micro-architectural state on the execution (and hence the directives). Using our proof method, we show the correctness of dead code elimination. When we tried to prove register allocation correct, we identified a previously unknown weakness that introduces violations to non-interference. We have confirmed the weakness for a mainstream compiler on code from the libsodium cryptographic library. To reclaim security once more, we develop a novel static analysis that operates on a product of source program and register-allocated program. Using the analysis, we present an automated fix to existing register allocation implementations. We prove the correctness of the fixed register allocations with our proof method.

CCS Concepts: • Security and privacy  $\rightarrow$  Formal security models; Side-channel analysis and countermeasures; • Software and its engineering  $\rightarrow$  Compilers.

Additional Key Words and Phrases: speculative execution, compilation, verification, register allocation

## **ACM Reference Format:**

Sören van der Wall and Roland Meyer. 2025. SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations. *Proc. ACM Program. Lang.* 9, POPL, Article 51 (January 2025), 39 pages. https://doi.org/10.1145/3704887

## 1 Introduction

Cryptographic implementations must satisfy two conflicting requirements: They must compute highly performant to be of practical use and be absolutely secure for use in critical systems. Implementations optimize performance with knowledge about underlying micro-architectural hardware features such as memory access patterns that improve cache usage. Security, however, is threatened by side-channel attacks that exploit precisely these hardware features to leak sensitive information [Brumley and Boneh 2005]. To mitigate side-channel attacks, leakage of sensitive data needs to be eliminated. This confronts the programmer with two challenges: First, semantics of source-level languages do not model leaks produced by side-channels [Vu et al. 2021]. And second, even if the source-level code is secure, incautious implementation of compiler optimizations can insert new leakage, rendering efforts to secure the source program useless [Simon et al. 2018; Barthe, Blazy, Grégoire, et al. 2019]. Developers address these problems with coding guidelines such as constant time programming and disabling compiler optimizations. But following the guidelines is non-trivial and overlooked mistakes corrupt the guarantee for security [Al Fardan and Paterson

Authors' Contact Information: Sören van der Wall, TU Braunschweig, Braunschweig, Germany, s.van-der-wall@tu-bs.de; Roland Meyer, TU Braunschweig, Braunschweig, Germany, roland.meyer@tu-bs.de.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

© 2025 Copyright held by the owner/author(s).

ACM 2475-1421/2025/1-ART51

https://doi.org/10.1145/3704887

```
chacha20(..., uint8* sec, uint8 bytes) {
                                                     Code Stack (-SP)
   uint8 buf[8];
                                                            0x48
   stk[0] = bytes;
                                                            0x40
                                                                  buf
   for (i = 0; i < 8; i++)
                                                           Heap
     { buf[i] = sec[i]; }
                                                            0xE0
                                                                          sensitive
  bytes = stk[0];
                                                            0xC8
                                                                   sec
                                                                          data
   if (bytes < 64) {...} ... }
                                                            0xC0
```

Code 1. Spectre-PHT. Registers are orange and memory variables teal. The stack contents are shifted by the stack pointer to appear constant. Framed instructions were inserted by register allocation.

2013]. At the same time, disabling compiler optimizations is dissatisfactory. Formal methods have shown to help with the challenges: The first challenge is overcome with novel leakage semantics that model side-channel leakage within the programming language's semantics [Molnar et al. 2006; Barthe, Betarte, et al. 2014]. For the second, novel proof methods for compilers under leakage semantics provide a guarantee that side-channel security of the source program carries over to the executable [Barthe, Grégoire, and Laporte 2018; Barthe, Blazy, Grégoire, et al. 2019; Barthe, Blazy, Hutin, et al. 2021; Barthe, Grégoire, Laporte, and Priya 2021]. Sadly, the recent discovery of Spectre attacks [Canella et al. 2019; Kocher et al. 2019] again presents a hardware feature that leakage semantics fall short of: *Speculative execution* produces side-channel leakages not captured by leakage semantics. This means both challenges were open again, and the verification community was quick to address the first: The development of speculative execution semantics has already taken place [Cauligi, Disselkoen, Gleissenthall, et al. 2020; Guarnieri, Köpf, Reineke, et al. 2021], and formal tools find speculative side-channel leakages or even prove their absence (cf. Section 7). Provably correct compilation under speculative execution semantics, however, remains an unsolved challenge that we address in this paper. It is the challenge that we address in this paper.

# 1.1 Background

Before we detail our contributions, we position our paper in the field of *formally verified cryptogra*phy and provide background on *Spectre attacks* and *mitigations*. We outline *speculative execution semantics*, *non-interference* as the property that guarantees a program's side-channel security even under speculation, and *provably correct compilation* for leakage semantics without speculation.

Formally verified cryptography. The field of formally verified cryptography aims to provide cryptographic implementations that are secured not only by trust in the developer but in a machine-checkable proof of correctness and security. In order to achieve this goal, implementations are carefully crafted and three main areas of research pursue different subgoals [Barbosa et al. 2021]:

(i) Cryptographic protocol design aims to provide proofs that the cryptographic protocol in itself does not reveal secrets to adversarial protocol participants, among other properties. (ii) Correct and performant implementation of the protocol aims to prove the implementation functionally correct. (iii) Implementation security investigates the compilation of implementations and the execution of binaries on real hardware in order to prove the absence of attacks that stem from the discrepancy between idealized program semantics and actual hardware semantics. This paper belongs to (iii): We assume that source programs correctly implement formally verified protocols, i.e. steps (i) and (ii) are completed. We investigate whether compiler transformations preserve side-channel security. To that end, our formal semantics models side-channel leakages and speculative execution, the micro-architectural hardware components that enable the recently discovered Spectre attacks.

Spectre attacks and mitigations. Spectre attacks observe side-channel leakages that are produced during speculative execution. Speculative execution allows the processor to speculatively execute instructions from the pipeline even though they still have unevaluated instruction parameters. When the processor detects a misspeculation, i.e. it assumed incorrect values for the unevaluated parameters, it rolls back execution to erase its effect. Rollbacks are invisible to typical source-level semantics, but parts of the micro-architecture such as the cache-state are not reverted. This creates side-channel leakage observable to an attacker during the speculative execution of instructions. Spectre attacks target this in the following way: (i) Train some micro-architectural component to speculatively execute a code fragment that (ii) under misspeculation brings sensitive data into a processor's register, which (iii) is leaked through side-channels. The prominent example is Spectre-PHT [Kocher et al. 2019], whose source of speculation is the processor's branch prediction unit (Prediction History Table), and the side-channel leakage happens via the cache [Yarom and Falkner 2014; Liu et al. 2015] or the program counter [Molnar et al. 2006]. Code 1 demonstrates the attack: The code is intended to load an 8-byte chunk from sec, which points into a stream of sensitive data, and to store it into a stack-local buffer buf in order to later perform computation on it. (i) The attacker might train the branch predictor in a way that it speculates the for-conditional in Line 4 and executes Line 5 an additional time even though i = 8. Line 5 then stores sensitive data from &sec + 8, say v, to &buf + 8 which aliases with &buf + 8 = &stk. The speculative execution might continue with Line 7, where (ii) v is loaded into bytes. (iii) The register is then used in Line 8, where the branching condition is leaked, disclosing to the attacker whether v < 64.

The de facto approach to avoid Spectre attacks are hardware and software mitigations. In hardware, a simple mitigation is to disable specific speculation sources using control registers. This penalizes performance as it disables the optimization for the whole program, even when other parts of the program do not operate on sensitive data. Software mitigations have received more attention, especially for branch prediction (Spectre-PHT, cf. Section 7, Tools), because disabling branch prediction has severe impact on performance [Vassena et al. 2021, Evaluation]. Spectre-PHT has two known software mitigations: Speculation fence insertion and speculative load hardening [Zhang et al. 2023; Carruth 2024]. Speculation fences sfence instruct the processor to stop speculation and wait until all instruction's unevaluated parameters are resolved before either continuing computation in case of a correct prediction or rolling back in case of a misprediction. This prevents instructions following sfence to be executed speculatively altogether.<sup>2</sup> The mitigation is applicable to all known kinds of speculation sources. Speculative load hardening slh a is a mitigation unique to branch prediction. Executing slh a wipes the contents of register a in case of a branch misprediction, but does not stop speculative execution. In case of correct prediction or non-speculative execution, it leaves the register contents unchanged. In the binary, this semantics is achieved by tying the contents of a to a previous branching condition cond via a data dependency (in the sense of a := cond ? a := 0). This forces the processor to evaluate cond before assigning a value to a. The processor is not guaranteed to stop speculation immediately upon learning the correct value for cond, but the value in a is now safe to be leaked. For other speculation sources (Section 7, Speculation Sources), speculative load hardening does not work because no similar data dependency is known. The Spectre attack from Code 1 is mitigated by inserting either sfence or slh bytes between Line 7 and 8.

 $<sup>^{1}</sup>$ The attack on this code is unlikely to execute on actual hardware because the specific speculation patterns would be hard to train. We chose it because it also demonstrates a new vulnerability in register allocation that we present in this paper.  $^{2}$ This is idealized: In ×86, for example, the instruction is realized with a memory fence LFENCE, which only executes after all loaded parameters to instructions are resolved - stopping the so far known speculation sources.

<sup>&</sup>lt;sup>3</sup>In x86, a cmov instruction is used which does not introduce control-flow branching, so branch prediction will not speculate.

Speculative execution semantics and Non-interference. Speculative execution semantics extend leakage semantics by speculation. The achievement of leakage semantics is to incorporate a model leakage observable to the attacker in the semantics. The observable side-channel leakage depends on the leakage model. Common is the constant-time model which exposes the addresses of memory accesses and the program counter as leakage to the attacker [Barthe, Betarte, et al. 2014; Guarnieri, Köpf, Reineke, et al. 2021]. Transitions in leakage semantics (without speculation) are of the form  $s \to t$ . They prompt a transition from  $s \to t$  while capturing attacker-visible effects on the microarchitectural state in the *leakage observation*  $\lambda$ . Side-channel security can now be formulated as a property on the program's leakage semantics. For that, a relation declares initial states as indistinguishable to the attacker when they differ only w.r.t. sensitive data unknown to the attacker. The property is non-interference, which requires that the executions from indistinguishable initial states produce equal leakages. Non-interference guarantees side-channel security: leakages cannot depend on sensitive data in any way. Without speculation, the running example Code 1 satisfies non-interference under the constant-time leakage model: The control flow is not dependent on the secret sec and the addresses of memory accesses (Lines 3, 5, and 7) are independent as well.

The extension to speculative execution semantics came with a new challenge: Non-determinism. Whether the processor mispredicts and when it detects misprediction is highly hardware dependent and potentially even under the attacker's influence. As a result, there is not a single execution but instead a set of possible executions, each with a different sequence of leakages. A transition in speculative execution semantics is of the form  $s \xrightarrow{\delta:\lambda} t$ . Again,  $\lambda$  is the attacker-observable leakage. What is new is the *directive*  $\delta$  that models the attacker's control over speculation [Cauligi, Disselkoen, Gleissenthall, et al. 2020; Barthe, Cauligi, et al. 2021]. The directives determine the program's speculation behavior. They provide an abstraction of the micro-architecture that the attacker can use to steer the execution whenever it depends on the micro-architectural state. In our example, the attacker steers speculation with the following sequence of directives in order to lead execution to the leakage of sensitive data:

$$\bullet \qquad . \qquad \left( \text{BR} . \bullet \right)^8 \qquad . \qquad \text{SP} \qquad . \qquad \text{SU stk 0} \qquad . \qquad \text{BR} . \bullet . \bullet \qquad . \\ \text{Line 3} \qquad \qquad \text{Line 4} \qquad \qquad \text{Line 5} \qquad \qquad \text{Lines 4, 7, and 8}$$

The intuition is the following. The first instruction is a memory access which cannot be influenced by the attacker, denoted by the directive •. Then, the attacker steers execution so that the correct branch is taken 8 times: Directive BR executes the correct branch and • executes the memory assignment inside the loop. The attacker then chooses to begin a misspeculation with directive SP which enters the loop once more. This leads to an unsafe memory access during the additional loop iteration, where we let the attacker choose actual memory location with SU Stk 0. The remaining sequence leads the execution to the leaking instruction.

In order to phrase non-interference on speculative execution semantics, the idea is to compare executions where the sequence of directives along the executions are equal. Then, the attacker trained the hardware in the same way and can be sure that observed differences in leakage are due to sensitive data. We define our speculative execution semantics (Section 2) and non-interference property (Section 3) in this spirit.

Secure compilation. The goal of secure compilation is to prove that a compiler pass preserves non-interference from source to target program. The current methods for leakage semantics without speculation draw from classical methods for compiler correctness which utilize *simulation* in order to argue that the target program's executions can be found in the source program's semantics [McCarthy and Painter 1967; Leroy 2009]. A traditional simulation is a relation  $\prec$  between the target program's states and the source program's states. Whenever a target state t is simulated by a source state s,  $t \prec s$ , and has a transition  $t \stackrel{o}{\longrightarrow} v$ , where o is an observable environment





Fig. 1. Constant-time cubes. Left: leakage transforming; Right: directive transforming (◄).

interaction, then s has to have a next transition  $s \xrightarrow{o} u$  so that  $v \prec u$ . Simulation ensures that the target program's execution produces the same observable environment interactions as the source program. For leakage semantics, a notion of simulation needs more: Compilers aim to preserve observable environment interactions such as system-calls, but they regularly modify side-channel leakage which creates a difference in leakage between source and target program. Leakage transformation [Costanzo et al. 2016; Barthe, Grégoire, and Laporte 2018; Barthe, Grégoire, Laporte, and Priya 2021] solves this issue: Given the leakages along a source program's execution, the simulation also provides a way to transform the observable leakage into the observable leakage of the corresponding target program's execution. In order to preserve non-interference, a simulation with leakage transformation needs to satisfy the constant-time cube diagram. The cube diagram can be seen in Figure 1 on the left. It looks at two pairs of states related by simulation  $t_1 \prec s_1$  and  $t_2 \prec s_2$ . Then, if  $s_1$ 's next transitions leak a sequence k and so do  $s_2$ 's next transitions (black) then the next transitions' leakage from  $t_1$  and  $t_2$  must coincide as well (purple). The target leakage l does not need to be equal to the source leakage k. Table 1 (left) lists compilers employing simulations with leakage transformation (no speculative semantics) that satisfy the constant-time cube diagram.

In the speculative execution setting, compilers so far try to avoid Spectre attacks by running compiler passes that insert the mitigations discussed above. These passes are among the last passes in the compiler chain in order to avoid the removal of mitigations by other passes. They aim to eliminate speculative leakage by inserting mitigations conservatively, which entails significant performance overhead. Even worse: Efforts to improve performance were flawed, again leading to insecure executables [Patrignani and Guarnieri 2021]. Passes that insert mitigations are desirable because they free the developer from having to think about speculation: Before the compiler runs the mitigation pass, the semantics can be considered speculation-free. However, recent research suggests that in order to obtain minimal performance overhead, the developer needs an interface to control inserted mitigations [Shivakumar et al. 2023]. This means a new proof method for compilation is needed that works when both source and target program operate under speculative execution semantics.

To the best of our knowledge, Patrignani and Guarnieri [2021] is the only work so far to (dis-)prove compiler correctness under speculative execution semantics. The authors target specifically compiler passes that insert mitigations and discovered the aforementioned flaws in fence insertion and speculative load hardening. Being tailored towards mitigations, they employ assumptions on the setting that do not hold in general and that we overcome in our development. We detail the differences to our work in Table 1 (right): **Property:** The first difference lies in the property ensuring side-channel security. While speculative non-interference preservation (SNiP) is the goal,

Table 1. Left: Compilers that preserve side-channel security under leakage semantics; Compcert [Leroy 2009], Jasmin [Barthe, Grégoire, Laporte, and Priya 2021]. Right: Proof methods for compilers with speculative execution semantics; Ex. Spectres [Patrignani and Guarnieri 2021]. Green parameters are more expressive.

|               | Compcert | Jasmin | Ex. Spectres    | This Paper |
|---------------|----------|--------|-----------------|------------|
| Property      | NiP      | NiP    | STsP            | SNiP       |
| Simulation    | LT-Sim   | LT-Sim | LO-Sim          | DT-Sim     |
| Speculation   | _        | _      | SW, M, TO       | US, ST     |
| Non-Det       | No       | No     | No              | Yes        |
| Spec Source   | _        | _      | PHT             | PHT        |
| Memory Safety | S        | S      | U               | S          |
| Passes        | Full     | Full   | SLH & Fence Ins | DC & RA    |

their proof method preserves speculative taint safety (STsP). Taint safety is a safety property that soundly approximates non-interference. While STsP is not an approximation of SNiP, the method is appropriate for analyzing mitigation passes. Our method is designed to prove SNiP, instead. Speculation and Non-Det: A bigger difference is the speculative execution semantics: They assume a speculation window (SW) that limits the number of steps speculatively executed after a misprediction before a rollback occurs. Speculation windows are a restriction of the speculative execution semantics. While the restriction is reasonably chosen with respect to current hardware, it presents an under-approximation of the speculative execution semantics. They further assume that the semantics always mispredicts branches (M) to maximize speculative execution. Mispredict semantics are no further restriction of the semantics, as maximizing speculative execution also maximizes the side-channel leakages produced. Together, these assumptions form a deterministic restriction of the full speculative execution semantics. The focus on compiler mitigations also led them to the assumption that the source language is speculation-free, meaning the speculative semantics are target program only (TO). In this paper, we deal with full, unbounded speculative execution semantics (US) and the induced non-determinism in both source and target semantics (ST). Memory: Our work is presented for structured memory (S) and the assumption that source programs are memory safe when executed under speculation-free semantics. Memory safety is a common assumption for compilers, as unsafe memory accesses are usually considered undefined behavior in source semantics. Our proof method also works with unstructured memory (U), but we also present a static analysis whose presentation immensely benefits from structured memory. This led us to present all of our work with structured memory as the concepts behind the proof method stay the same. Simulation: Mitigations insert speculation barriers and do not change the code otherwise. As the source program in Patrignani and Guarnieri [2021] does not speculate, the leakages of the source program will still be fully present and unchanged in the target program. Because the target program is executed with speculative execution semantics, there can, however, be additional leakages present in the target program. This leads their work to consider leave-out simulations (LO-Sim), where source leakages are equal to target leakages with additional speculative leakages. For general compiler transformations and unbounded speculation, we introduce the more general directive transforming simulations (DT-Sim). Passes: Their work targets the compiler passes that insert the software mitigations against spectre from above: Speculative load hardening and fence insertion. Our work targets two general-purpose compiler transformations: Dead code elimination (DC) and transformations from the register allocation phase (RA).

### 1.2 Contributions

In this paper, we present snippy simulations, a novel proof method for preservation of noninterference under speculative execution semantics. The main challenge to overcome with speculation is the non-determinism in directive semantics. First, the definition of simulations becomes more involved: Deterministic semantics have the advantage that simulations are always bi-simulations [Milner 1971]. A simulation for deterministic semantics synchronizes the (singular) execution of the source program and the execution of the target program. For non-deterministic semantics such as speculative execution semantics, each of the target program's executions must be synchronized with a source program's execution. Second, similar to how compilers do not preserve leakage, they also modify where the attacker can steer the computation: A sequence of directives to steer execution on the source program may be unfit to steer any execution on the target program (the compiler may change instructions and with them the available directives to steer execution also change). We address this issue by introducing the new concept of directive transformations. Directive transformations match every executable sequence of directives in the target program with a sequence of directives executable in the source program. We then embed directive transformations into a new constant-time cube for speculative execution semantics (Figure 1, right). It is our contribution to make the constant-time cube applicable for speculative semantics.

We demonstrate our proof method on two compiler transformations: Dead code elimination and register allocation. This is the first time that these compiler passes have been formally analyzed under speculative execution semantics and to our surprise, we found a serious vulnerability in the transformations performed during the register allocation phase. The register allocation phase is located in the compiler chain where a hardware-independent intermediate representation is replaced by a concrete ISA. It transforms virtual registers into hardware registers and has to spill excess registers to the stack: In Code 1, the framed instructions constitute a spill of the register bytes. The program before register allocation (without Lines 3 and 7) has no side-channel leakage of sensitive data under speculative execution semantics. The program after register allocation (with Lines 3 and 7) is vulnerable to the Spectre attack presented above. This vulnerability is not unique to a singular register allocator, but more generally stems from the spilling transformation performed in this phase. In order to fix the transformations performed, we present a novel static analysis on a product of source program (before register allocation) and target program (after register allocation) that finds problematic speculative leakages introduced by spilling transformations. We then fix the problematic transformations by inserting as few mitigations as possible. The fix is automated and applies to every existing register allocator. We then show that the fixed transformations are secure by once more applying our proof method.

In short, we address the problem of non-interference preservation for compiler passes under speculative execution semantics. We make the following contributions:

- ▶ We develop a proof method for non-interference preservation *under speculative execution semantics* based on simulation relations. Technically, we address non-determinism from speculation with *directive transformations*.
- ▶ We *demonstrate* our proof method on dead code elimination.
- ► We show that *register allocation does not preserve non-interference* under speculative execution semantics. We confirm this for all register allocators of the LLVM compiler on code from the widely used libsodium cryptographic library.
- ▶ We propose a static analysis that *finds and automatically fixes* the vulnerabilities introduced by any register allocator. We apply our proof method to show that the fixed transformation preserves speculative non-interference.

Outline. Section 2 introduces our formulation of speculative execution semantics with leakages and directives. Section 3 defines speculative non-interference preservation (SNiP). Our proof method is presented in Section 4, and we apply it in Section 5 to prove that dead code elimination preserves non-interference. We then analyze the vulnerability we found in the register allocation transformations and present our fix in Section 6. We finish with related works in Section 7 and discuss future prospects in Section 8.

# 2 Language Model

We introduce our programming language and its speculative execution semantics. A program is a mapping  $P: PC_P \to Inst$  from program counters to instructions. The initial program counter is init  $\in PC_P$ . Instructions  $i \in Inst$  are of the following form. We denote registers by  $a, b, c, d \in Reg$  and memory variables by  $\mathbf{a}, \mathbf{b} \in Var$ . The subscripts  $\mathbf{sc} \in PC_P$  are the successors of i. We may also call  $\mathbf{sc}$  a successor of  $\mathbf{pc}$  constant, if it is a successor of  $\mathbf{pc}$  pc.

```
i \in \mathit{Inst} \ ::= \ \mathsf{ret} \ | \ \mathsf{nop}_{\mathsf{ssc}} \ | \ a := b \oplus c_{\mathsf{ssc}} \ | \ a := \mathbf{a}[b]_{\mathsf{ssc}} \ | \ \mathbf{a}[b] := c_{\mathsf{ssc}} \ | \ \mathsf{br} \ b_{\mathsf{ssc}_\mathsf{t}, \mathsf{scf}} \ | \ \mathsf{sfence}_{\mathsf{ssc}} \ | \ \mathsf{s1h} \ a_{\mathsf{ssc}} \ | \ \mathsf{s1h} \ a_{\mathsf{s1h}} \ | \ \mathsf{s1h} \ a_{\mathsf{s1h}} \ | \ \mathsf{s1h} \ a_{\mathsf{s1h}} \ | \ \mathsf{s1h} \ | \ \mathsf{s1h} \ a_{\mathsf{s1h}} \ | \ \mathsf{s1h} \ |
```

The instructions are return (or exit), no-op, assignment, load, store, conditional branching, and the software mitigations for Spectre, speculation fences and speculative load hardening.

Semantics. We introduce two semantics: Speculation-free  $s \xrightarrow{\delta:\lambda} t$  (Rules 2.1) and speculative  $s \xrightarrow{\delta:\lambda} T$  (Rules 2.2). The transitions are labelled by leakage  $\lambda \in Leak$  and directives  $\delta \in Direct$ . Our leakages stem from the constant-time leakage model which leaks the addresses of memory accesses as well as branching conditions. Directives resolve non-determinism for the speculative semantics, i.e. when speculation starts and ends or where unsafe memory accesses (out-of-bounds) actually access memory. Directives are considered under the attacker's control.

A speculation-free state is a tuple  $(pc, \rho, \mu) \in State$  that tracks the program counter  $pc \in PC$ , register contents  $\rho : Reg \to Val$ , and memory  $\mu : Mem \to Val$ . The semantics is given in Rules 2.1 and is fairly standard. Memory is structured and without dynamic allocation. Each variable **a** has static size and for an offset address  $n \in Adr \subseteq Val$ , we write  $n \in |\mathbf{a}|$  to indicate that n lies within  $\mathbf{a}$ 's size. The memory is  $Mem = \{(\mathbf{a}, \mathbf{n}) \mid \mathbf{n} \in |\mathbf{a}|\}$ . Following the leakage model, loads (LOAD and LOAD-UNSAFE) and stores (STORE and STORE-UNSAFE) leak the accessed address used via LD  $\mathbf{n}$  and ST  $\mathbf{n}$ , and branching (BRANCH) leaks its condition with BR  $\mathbf{b}$ . The directives LU  $\mathbf{b}$   $\mathbf{m}$  (LOAD-UNSAFE) and SU  $\mathbf{b}$   $\mathbf{m}$  (STORE-UNSAFE) let the attacker control the address for unsafe memory accesses.

The source of speculation are  $br\ b$  instructions triggering branch-prediction (PHT). Our semantics models speculation only for misspeculated branches. The reason for this is that a correctly predicted branch will later commit and the resulting architectural state and the observable leakages will coincide with an execution that did not speculate in the first place. With no difference in correct speculation and speculation-free execution there is no need to model correctly speculated branches separately, and we will use the terms speculation and misspeculation interchangeably. Speculation thus starts with a branch misprediction, and later ends with a rollback to the state before speculation. A speculating state tracks all active mispredictions in a stack of states  $s, r, \ldots \in s$  state s. The semantics  $s \xrightarrow{\delta:\lambda} T$  is provided in Rules 2.2. With s0, we access s1 instruction s1 pc, when s2 (pc, s2, s3). The mitigation instructions sfence and s3 are speculation sensitive, as their semantics depend on whether the current state is speculating. Their semantics is according to our explanation in Section 1: A speculation fence sfence disallows speculation, so sfence only executes in states currently not speculating. SLH performs speculative load hardening s3 h s4, which wipes a register only if the state is currently speculating. The remaining instructions

<sup>&</sup>lt;sup>4</sup>This means our semantics allows for another speculation immediately after rollback. This could be avoided with an additional flag to store whether a state has already been mispredicted.

RULES 2.1: SPECULATION-FREE SEMANTICS

$$\begin{array}{c}
NOP \\
P \text{ pc} = \text{nop,}_{sc} \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \bullet} (sc, \rho, \mu)
\end{array}$$

$$\begin{array}{c}
ASGN \\
P \text{ pc} = a := b \oplus c,_{sc} \quad v = \rho b \oplus \rho c \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \bullet} (sc, \rho, \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad v = \rho b \oplus \rho c \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \bullet} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STORE}} (sc, \rho, \mu [(a, n) \mapsto v])
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)
\end{array}$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho [a \mapsto v], \mu)$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho, \mu)$$

$$\begin{array}{c}
DOAD \\
P \text{ pc} = a := a [b],_{sc} \quad n = \rho b \in |a| \quad v = \mu a n \\
\hline
(pc, \rho, \mu) \xrightarrow{\bullet : \text{STD}} (sc, \rho, \mu) \xrightarrow{\bullet : \text{STD}}$$

are *speculation insensitive*. STEP executes them on the currently speculating state, i.e. the top-most state in the stack of states. The directives that determine whether a misprediction happens or not are SP and BR. SP demands misprediction performed by SPEC. A copy of the current state is pushed on top of the current state and the program counter is set to the incorrect branch. Otherwise, BRANCH executes on directive BR for a correct branching. ROLLBACK rolls back execution to before the last misprediction. It can be triggered with a RB directive in any state that is currently speculating. There is no bound on the length of a speculation.

We write  $S \xrightarrow{d:l}^* T$  for finite executions and  $S \xrightarrow{d:l} \infty$  for diverging executions. We use l and d for both finite and infinite sequences, i.e.  $l \in Leak^* \cup Leak^\infty$  and  $d \in Direct^* \cup Direct^\infty$ . We call any (init,  $\rho$ ,  $\mu$ ) initial and (pc,  $\rho$ ,  $\mu$ ) with P pc = ret final. The behavior of a program consists of the directives and events along any execution from an initial state. The speculation-free semantics is deterministic, so its behavior is a single execution; the speculative behavior is non-deterministic and its behavior forms a set of executions.

$$\mathsf{Beh}\,P\,s\ \triangleq\ \begin{cases} (d\!:\!l) & s\xrightarrow{d\!:\!l}^*t,t \text{ final} \\ (d\!:\!l) & s\xrightarrow{d\!:\!l}^*\infty \end{cases} \qquad \mathsf{SBeh}\,P\,S\ \triangleq\ \begin{cases} (d\!:\!l)\mid S\xrightarrow{d\!:\!l}^*T,T \text{ final} \rbrace \\ \cup \{(d\!:\!l)\mid S\xrightarrow{d\!:\!l}^*\infty \} \end{cases}$$

We call P safe if no memory access is unsafe, i.e. for every initial state s no directives Lu a n, su a n occur in the speculation-free behavior Beh P s. For the remaining paper we assume safe programs. Note that this does not mean that speculating memory accesses need to be safe. As seen in Code 1, Spectre Attacks utilize the fact that safe programs are not safe under speculative semantics.

Code 2 contains a simplified version of Code 1. The secret is already in a register  $\rho$  secret = v and to be stored to buf at offset b. Assume the offset is out of bounds,  $\rho b = 8 \notin |buf|$ . From a state  $s = (2, \rho, \mu)$ , where the first instruction was already executed, i.e.  $\rho = f \neq 0$ , the following transitions are available: First, a speculation is started with SPEC and s is copied with program counter set to the incorrect branch 3. Next, Store-unsafe executes on directive su stk 0,  $\mu' = \mu[(stk, 0) \mapsto v]$ . The LOAD then brings

```
a = (b < buf_size)
  br (a)_{3,4}
      buf[b] = secret
  bytes = stk[0]
  br (bytes)_{6.6}
  ret
```

Code 2. Simplified Code 1

the secret to a register,  $\rho' = \rho$  [bytes  $\mapsto$  v]. Finally, Branch leaks whether the secret is 0,

```
s \xrightarrow{\text{SP:BR} f} s.(3, \rho, \mu) \xrightarrow{\text{SU} \text{ stk } 0:\text{ST } 8} s.(4, \rho, \mu') \xrightarrow{\bullet: \text{LD } 0} s.(5, \rho', \mu') \xrightarrow{\text{BR:BR} \text{ v=0}} s.(6, \rho', \mu').
```

Speculative semantics exhibit two important properties: First, due to the constant-time leakage model, speculative semantics reveal the program counter to the attacker: The program counter can be deduced from the leakage of conditionals in rules Branch and Spec. Second, directives resolve all non-determinism introduced by speculation. To express the first property, we write  $S \equiv T$  to mean that S and T are at the same program point. For speculation-free states,  $s \equiv pc$  means that s is at program counter pc,  $s = (pc, \rho, \mu)$ . Then,  $s \equiv t$  means that s and t share the program counter,  $s \equiv pc \equiv t$ . For speculating states, we write  $S \equiv T$  if each pair of configurations in their speculation stack is at the same program point. Formally,  $\varepsilon \equiv \varepsilon$ , and  $S.s \equiv T.t$  if  $S \equiv T$  and  $s \equiv t$ . The following two lemmas express the properties.

Lemma 1 (Program-Counter-Leakage). If two same-point states  $S_1 \equiv S_2$  execute with the same directives and leakages,  $S_1 \stackrel{d:l}{\longrightarrow} T_1$ ,  $S_2 \stackrel{d:l}{\longrightarrow} T_2$ , then the resulting states are also same-point,  $T_1 \equiv T_2$ .

Lemma 2 (Directive-Determinism). For all S and  $\delta$  there exist at most one T and  $\lambda$  with  $S \xrightarrow{\delta : \lambda} T$ .

*Notation.* Similar to how we access the current instruction with P s, we write f s instead of f pc when  $s = (pc, \rho, \mu)$  for any  $f : PC \to A$ . Further, we extend not only the  $\equiv$ -relation to speculating states, but any relation  $R \subseteq State \times State$  is extended to a relation on SState in the obvious way:  $\varepsilon$  R  $\varepsilon$  and S.s R T.t, if S R T and s R t.

## **Non-intereference Properties**

We define non-interference and non-interference preservation for our speculative semantics. We require the initial state's memory to be partitioned into public and sensitive data through a security level assignment sec:  $Var \rightarrow SecLvl$  to a lattice  $SecLvl = (\{L, H\}, L \leq H)$ . Sensitive data (H) is considered unknown to the attacker, and we say that initial states are indistinguishable to the attacker, (init,  $\rho$ ,  $\mu$ ) =<sub>sec</sub> (init,  $\rho$ ,  $\mu'$ ), when the memory coincides on all variables considered public. That is, for all  $\mathbf{a} \in Var$  with  $\sec \mathbf{a} = \mathsf{L}$ ,  $\mu \mathbf{a} = \mu' \mathbf{a}$ .

Our formulation of speculative non-interference (SNi, Definition 1) requires indistinguishable initial states  $s_1 = s_{\text{sec}} s_2$  to produce equal behavior. That means that for both initial states (i) the sequences of executable directives are the same, and (ii) for each sequence of executable directives, the observable leakage is the same. Let us explain the necessity for the first condition. As long as the second condition is satisfied, any two executions  $s_1 \xrightarrow{d:l} T_1$  and  $s_2 \xrightarrow{d:l} T_2$  will stay in the same program point  $T_1 \equiv T_2$  (Lemma 1) and synchronously execute the same instructions. If at  $T_1$ the set of executable directives is different to those in  $T_2$ , then the instruction has to be a memory access. All other instructions have the same set of executable directives, independent of register and memory contents. This means that one state executes an unsafe memory accesses (LOAD-UNSAFE or STORE-UNSAFE) while the other executes a safe memory accesses (LOAD or STORE). However, unsafe and safe memory accesses both leak the address used. A difference in executable directive thus amounts to different leakage.

DEFINITION 1 (SNi). A program is speculatively non-interferent,  $P \models SNi$ , if all indistinguishable initial states  $S_1 =_{sec} S_2$  have the same behavior SBeh  $P S_1 = SBeh P S_2$ .

Our goal is to prove preservation of non-interference for compiler transformations. We model compiler transformations [.] that map a source program P to the transformed target program [P]. Transformations may modify the structure of initial states from source program P to target program [P]. For example, a pass that realizes the architecture's calling convention relocates function parameters to specific registers. We require each pass to come with a relation  $\prec$  on initial states that identifies the initial states of [P] with the initial states of P. In order to define preservation of non-interference, the relation has to respect sec in the following sense:

DEFINITION 2. A relation  $\prec \subseteq SState_{[P]} \times SState_P$  respects sec if every initial  $t \in SState_{[P]}$  is mapped to an initial  $s \in SState_P$  with  $t \prec s$ , and for all pairs of initial states  $t_1 \prec s_1$  and  $t_2 \prec s_2$ :  $t_1 =_{sec} t_2$  if and only if  $s_1 =_{sec} s_2$ .

Speculative non-interference preservation for a transformation [.] asks whether for all source programs P,  $P \models SNi$  entails  $[P] \models SNi$ . However, defining preservation in this way leads to potentially surprising outcomes. Even if the source program P fails to be SNi, it can have some indistinguishable initial states which produce equal leakage. One would expect that a speculative non-interference preserving compiler transformation preserves this equal leakage to the target program [P]. But the above definition gives no such guarantee: If P fails to be SNi, there are no guarantees for [P] at all. To counteract that, our definition of speculative non-interference preservation is more precise [Patrignani and Guarnieri 2021]. It requires preservation of equal leakage for every pair of source program's and target program's initial states individually. In particular, this definition entails that if  $P \models SNi$  then also  $[P] \models SNi$ .

DEFINITION 3 (SNiP). A program translation [.] with sec-respecting mapping  $\prec$  is SNi-preserving, [.]  $\models$  SNiP, if all initial states  $t_1 =_{\text{sec}} t_2$  of [P] with initial source states  $t_1 \prec s_1$  and  $t_2 \prec s_2$  of equal behavior SBeh  $P s_1 = \text{SBeh } P s_2$  also have equal target behavior, SBeh [P]  $t_1 = \text{SBeh } [P] t_2$ .

# 4 Proving Speculative Non-interference Preservation

We present our proof method for speculative non-interference preservation. We introduce *snippy simulations* which ensure that a code transformation preserves speculative non-interference:

THEOREM 2. If for all P there is a snippy simulation  $(\prec, \lhd)$  between [P] and P, then [.]  $\models$  SNiP.

In order to reach that goal, we first define simulations that transform directives to cope with the fact that compilers do not preserve executable sequences of directives. We then introduce the constraints a snippy simulation needs to additionally satisfy and finally prove Theorem 2. This reduces proving a transformation SNiP to proving that it has a snippy simulation for each program P. In Section 5, we show how to craft a snippy simulation that is parametric in P, reducing proof effort to a once-and-for-all proof.

## 4.1 Simulation with Directive Transformation

The new feature in our work is *directive transformation*. Conceptually, a simulation between the target program [P] and the source program P shall replay any execution of [P] in P. A directive sequence d selects a single execution in [P] (Lemma 2). Our simulation wants to select a corresponding execution in P. However, the directives d may not be executable in P, or it might select an inappropriate execution. Instead, a different sequence of directives may be necessary on the source program, since transformations [.] are not designed to preserve them.

<sup>&</sup>lt;sup>5</sup>One could also have a second security assignment on the target program, but for simplicity we assume they are the same.

Code 3. Example code transformation from dead code elimination.

Example 3. We accompany our formal development with the example transformation in Code 3, where an unnecessary  $\mathbf{a} := \mathsf{buf}[\mathbf{i}]$  instruction is replaced by a nop. Consider an initial target state  $t = (\mathbf{a}, \rho, \mu)$  where  $\rho \ \mathbf{i} \notin |\mathsf{buf}|$ , and the source state  $s = (1, \rho, \mu)$ . The directives  $\mathsf{SP}.\bullet.\bullet$  are executable from t. But the same sequence cannot be executed from s: An unsafe load necessitates a directive  $\mathsf{LU}\ \mathbf{a}\ \mathsf{m}$  for any  $\mathbf{a}$  and  $\mathsf{m}$ . Thus, a transformed sequence of directives  $\mathsf{SP}.\mathsf{LU}\ \mathbf{a}\ \mathsf{m}.\bullet$  is executed.

A simulation with directive transformation (dt-sim) is a relation on states  $T \prec S$  where a target state  $T \in SState_{[P]}$  is related to a source state  $S \in SState_P$ . The directive transformation is a family of relations  $\triangleleft_{T \prec S} \subseteq Direct^* \times Direct^*$ . We characterize dt-sim in Rules 4.1. Consider any states  $T \in SState_{P}$  and  $S \in SState_{P}$  with  $T \prec S$ . To express that T is simulated by S means the following: Either both states are final (FINAL), or we have to explore all sequences of executable directives  $T \xrightarrow{d:l} V$  in [P] up to some bound (TGT). For each explored sequence of directives we apply the directive transformation  $\triangleleft_{T \triangleleft S}$ . Then, we need to replay the execution with a sequence of executable directives  $S \xrightarrow{e:k} U$  in P (SRC), so that  $V \prec U$ . Formally, we write  $\langle \prec, \triangleleft \vdash T \prec_t S : d \rangle$  to express that we are exploring executions in [P], have already seen a sequence of directives d and arrived at target state T. We can now either bound the exploration with DIRECT-TF, or continue exploration via TGT. With DIRECT-TF we look up a directive transformation for the explored d and swap to  $\langle \prec, \triangleleft \vdash V \prec_s S : e \rangle$ . This states that we are seeking to replay the explored sequence with its transformation e from S. If e is executable  $S = e^{i \cdot k}$ , U in P, COIND checks that the states reached from exploration in [P] and replay in P are again related,  $V \prec U$ . The notions of  $\lceil \prec \rceil$  and  $\lceil \prec \rceil$  make sure that both exploration in [P] and replaying in P take at least one execution step. The guarded version | requires at least one application of TGT or SRC to become the unguarded version | | Only then, DIRECT-TF and COIND become applicable. We write  $[\prec]$  for any of  $[\prec]$  or  $[\prec]$ .

DEFINITION 4 (dt-sim). A simulation with directive transformation  $(\prec, \lhd)$  consists of a relation  $\prec \subseteq SState_{[P]} \times SState_P$  and a family  $\lhd = (\lhd_{T \prec S})_{(T,S) \in \prec}$  so that for all initial  $t \in SState_{[P]}$ , there is an initial  $s \in SState_P$  with  $t \prec s$ , and for all  $T \prec S$ ,  $\langle \overrightarrow{| \prec} | \lhd_{T \prec S} \vdash T \prec_t S : \varepsilon \rangle$  can be proven in Rules 4.1.<sup>6</sup>

Example 4. Consider once again the transformation in Code 3 and the initial states t and s from Example 3. Further, let  $u=(4,\rho',\mu)$  and  $v=(d,\rho',\mu)$  with  $\rho'=\rho[a\mapsto 0]$ . We want to prove that  $t\prec s$  is justified, i.e. we need to construct  $\triangleleft_{t\prec s}$  so that  $\langle {} \checkmark, \triangleleft_{t\prec s} \vdash t \prec_t s : \epsilon \rangle$  is derivable. We drop the subscript and just write  $\triangleleft$ . Exploration via TGT yields (among others) two sequences of directives executable from t in  $[P]: sp. \bullet. \bullet$  as in Example 3 and Br. The corresponding execution takes us to  $t \xrightarrow{sp. \bullet. \bullet: BR f. \bullet. \bullet, *} t.v$  and  $t \xrightarrow{BR:BR}, v$ , respectively. After exploration with TGT, we are thus left to prove  $\langle {} \checkmark, {} \lor t.v \prec_t s : sp. \bullet. \bullet \rangle$  and  $\langle {} \checkmark, {} \lor t.v \prec_t s : sr. \bullet. \bullet \rangle$ . For the first sequence, we transform the directives as in Example 3: sp.lu sec  $0.\bullet$   $\triangleleft$  sp.  $\bullet$ . For the other case, we do not need a transformation, so Br  $\triangleleft$  Br. With direct-tr, we are left with deriving  $\langle {} \checkmark, {} \lor t.v \prec_s s : sp.lu sec 0.\bullet \rangle$  and  $\langle {} \checkmark, {} \lor t.v \prec_s s : sp.lu sec 0.\bullet \rangle$  and  $\langle {} \checkmark, {} \lor t.v \prec_s s : sp.lu sec 0.\bullet \rangle$  and  $\langle {} \checkmark, {} \lor t.v \prec_s s : sp.lu sec 0.\bullet \rangle$  and  $\langle {} \lor, {} \lor t.v \prec_s s : sp.lu sec 0.\bullet \rangle$  and  $\langle {} \lor, {} \lor,$ 

 $<sup>^6</sup>$ Our way to define simulations is inspired by recent work to unify stuttering [Cho et al. 2023].

$$\langle T \prec_{\mathsf{t}} S \rangle \xleftarrow{d} \langle V \prec_{\mathsf{t}} S \rangle \xleftarrow{e \mathrel{\triangleleft} d} \langle V \prec_{\mathsf{s}} S \rangle \xleftarrow{e} \langle V \prec_{\mathsf{s}} U \rangle$$

to state that the proof tree contains the nodes  $\langle \begin{bmatrix} -1 \\ -1 \end{bmatrix} \triangleleft_{S \prec T} \vdash V \prec_t S : d \rangle$ ,  $\langle \begin{bmatrix} -1 \\ -1 \end{bmatrix} \triangleleft_{S \prec T} \vdash V \prec_s S : e \rangle$ , and  $\langle \begin{bmatrix} -1 \\ -1 \end{bmatrix} \triangleleft_{S \prec T} \vdash V \prec_s U : \varepsilon \rangle$  on one path. In particular, this means that  $T \xrightarrow{d:l} V$  and  $S \xrightarrow{e:k} U$  for appropriate l, k, and  $e \triangleleft_{T \prec S} d$ . We define a synchronized transition relation that executes both in a single step.

DEFINITION 5. Given a simulation ( $\prec$ ,  $\triangleleft$ ), its simulation interval transition is defined by Rule SYNC.

$$\hookrightarrow$$
  $\hookrightarrow$   $\prec \times Direct^* \times Leak^* \times Direct^* \times Leak^* \times \prec$ 

SYNC further defines the transition relations  $\vdots_s$  and  $\vdots_t$  as the projection of simulation intervals to source and target program. Transitive closures of the transition relations are defined as usual. We say that a simulation is lock-step if simulation intervals are single step:  $\vdots_t, \vdots_s \subseteq \vdots$ .

*Example 5.* The simulation intervals resulting from Example 4 for  $t \prec s$  are:

$$(s,t) \xrightarrow{\text{SP.LU secret } 0. \bullet : \text{BR f.LD } n. \bullet \triangleleft \text{ Sp.} \bullet \bullet : \text{BR f.} \bullet \bullet} (s.u,t.v) \qquad (s,t) \xrightarrow{\text{BR : BR f.} \triangleleft \text{ BR : BR f.}} (u,v)$$

The following lemma states that our formulation of simulations is sound. That is, we find all of [P]'s behavior in the projection of the simulation interval transition relation  $\vdots$ <sub>t</sub>. This lets us perform (co-)induction on SBeh [P] S with  $\vdots$ <sub>t</sub> rather than  $\vdots$ <sub>t</sub>, which we will utilize in our proof of Theorem 2. The same is not true for the source program's behavior and  $\vdots$ <sub>s</sub>.

$$\text{Lemma 3. } \textit{If $T$ occurs in} \prec, \text{SBeh } [P] \ T = \{(d:l) \mid T \xrightarrow{d:l} {}^*_{t} X, X \textit{ final}\} \ \cup \ \{(d:l) \mid T \xrightarrow{d:l} {}^*_{t} \infty\}.$$

## 4.2 Snippy Simulations

So far, simulations are very liberal: Simulation merely require that a sequence of directives in [P] can be transformed via  $\triangleleft$  into a sequence of directives in P. The length and contained directives can change when applying  $\triangleleft$  and there are no restrictions on how the leakage changes when applying  $\triangleleft$ . In this section, we establish snippy simulations, a constant-time cube constraint [Barthe, Grégoire, and Laporte 2018] on simulations for speculative semantics that entails SNiP when satisfied.

The intuition for snippy simulations can be explained as follows. In order to prove that a transformation [.] satisfies SNiP, we are given a source program P and the target program [P], as well as four initial states: Two target initial states  $t_1 \equiv t_2$  and two simulating source states  $t_1 \prec s_1$  and  $t_2 \prec s_2$ , so that SBeh  $P s_1 = \text{SBeh } P s_2$ . The goal is to prove that the equality of behavior carries over to the target program. Given a simulation  $(\prec, \lhd)$  we so far know how to replay any sequence of directives



Fig. 2. Whenever the black conditions are met, a snippy simulation ≺ also explores the purple execution and simulates it by the source execution. Gray conditions follow from the semantics.

so far know how to replay any sequence of directives d from  $t_1$  transformed on  $s_1$  (Lemma 3). Consider a simulation interval  $(s_1, t_1)$   $\xrightarrow{e:k \land d:l} (U_1, V_1)$ . Due to same behavior of  $s_1$  and  $s_2$ , the source directives can also be executed from  $s_2$ ,  $s_2$   $\xrightarrow{e:k} U_2$ . Snippy simulations now state that, in such a situation, the simulation interval for  $s_2$  and  $t_2$  also contains  $(s_2, t_2)$   $\xrightarrow{e:k \land d:l} (U_2, V_2)$ . That means,  $t_2$  can also execute d and produce the same leakage. And further,  $(\prec, \lhd)$  does also explore d from  $t_2$ , not a longer or shorter sequence. Figure 2 demonstrates the constraint in the general case, where states need not be initial. The simulation interval of  $S_1$  and  $S_2$  and the ability for another source state  $S_2$  to mimic the behavior are in black. The constraint is in purple:  $(\prec, \lhd)$  has to also provide the same simulation interval for any other state  $S_2$  at the same program point as  $S_2$ . With snippy simulations defined, we conclude the section with the proof of Theorem 2.

DEFINITION 6. A snippy simulation  $(\prec, \lhd)$  is sec-respecting and satisfies the diagram in Figure 2. That is, for all  $S_1 \equiv S_2$  and  $T_1 \equiv T_2$  with  $T_1 \prec S_1$ ,  $T_2 \prec S_2$ , and  $(S_1, T_1) \xrightarrow{e:k \lhd d:l} (U_1, V_1)$ ,

$$S_2 \xrightarrow{e:k*} U_2$$
 implies the existence of  $V_2$  with  $(S_2, T_2) \xrightarrow{e:k \triangleleft d:l} (U_2, V_2)$ .

PROOF OF THEOREM 2. Consider a program P and a snippy simulation  $(\prec, \lhd)$ . We need to prove the following: For all initial  $t_1 =_{\text{sec}} t_2$  with  $t_1 \prec s_1$  and  $t_2 \prec s_2$ : When SBeh [P]  $S_1 = \text{SBeh}$  [P]  $S_2$ , then also SBeh [P]  $S_1 = \text{SBeh}$  [P]  $S_2 = \text{SBeh}$  [P] [P

This is sufficient: Consider initial target states  $t_1 =_{sec} t_2$  as well as source states  $t_1 \prec s_1$  and  $t_2 \prec s_2$  with SBeh  $P s_1 = SBeh P s_2$ . Initial states are all at the same program point, so the requirements of the claim are satisfied and SBeh  $[P] t_1 \subseteq SBeh [P] t_2$  holds. By symmetry, SBeh  $[P] t_1 = SBeh [P] t_2$ .

We prove our claim coinductively on SBeh [P]  $T_1$  split into simulation intervals (Lemma 3). The case of a final  $T_1$ , i.e. SBeh [P]  $T_1 = \{(\varepsilon : \varepsilon)\}$ ,  $T_2 \equiv T_1$  is final, too, and thus  $(\varepsilon : \varepsilon) \in SBeh$  [P]  $T_2$ . In the (co-)inductive case, let  $(e.d:k.l) \in SBeh$  [P]  $T_1$  with  $T_1 \xrightarrow{e:k} V_1$  from a simulation interval  $(S_1, T_1) \xrightarrow{f:m \cdot e:k} (U_1, V_1)$ . From SBeh P  $S_1 = SBeh$  P  $S_2$  follows  $S_2 \xrightarrow{f:m \cdot *} U_2$ . Lemma 1 gives  $U_1 \equiv U_2$  and same behavior of  $S_1$  and  $S_2$  entails SBeh P  $U_1 = SBeh$  P  $U_2$ . Snippyness then yields the simulation interval  $(S_2, T_2) \xrightarrow{f:m \cdot e:k} (U_2, V_2)$ , i.e.  $T_2 \xrightarrow{e:k \cdot *} V_2$ . Lemma 1 gives  $V_1 \equiv V_2$ . We can now apply (co-)induction hypothesis for SBeh [P]  $V_1 \subseteq SBeh$  [P]  $V_2$ , which implies  $(e.d:k.l) \in SBeh$  [P]  $V_2$ . Together with  $T_2 \xrightarrow{e:k \cdot *} V_2$ , we arrive at  $(e.d:k.l) \in SBeh$  [P]  $T_2$ .

## 5 Case Study: Dead Code Elimination

In this section, we prove that dead code elimination [.]<sup>dc</sup> satisfies SNiP as a demonstration of the proof method and a warm-up for the next section, where we tackle register allocation transformations. We first give a short rundown on the transformation of [.]<sup>dc</sup> before crafting a snippy simulation that is parametric in the source program P. Dead code elimination is responsible for removing instructions from P whose computed values are not utilized anywhere. It is the result of a data flow analysis that finds removable instructions. Typically, the analysis follows constant propagation in order to identify as many aliasing memory accesses as possible. To support that, we assume that load and store instructions  $a := \mathbf{a}[n]$  and  $\mathbf{a}[n] := c$  can hold a constant address  $n \in Adr$  instead of a register, where we require  $n \in |\mathbf{a}|$ . For the remaining section, fix an arbitrary program P with entry point init.

Flow analysis. The static analysis for dead code elimination is a Liveness analysis. Since we will later define our own flow analysis when fixing the weaknesses in register allocation, we recall flow analysis in a more general setting. A flow analysis searches for a fixed-point solution to a system of flow inequalities in order to obtain approximate knowledge about all executions of a program. Formally, a forward/backward flow analysis finds a solution  $\mathcal X$  to the inequalities (fwd)/(bwd), where pc and sc range over program counters so that pc is a predecessor to sc.

$$\begin{array}{ll} \mathcal{X}\,\mathrm{sc} \geq F_{\mathrm{pc}}\left(\mathcal{X}\,\mathrm{pc}\right) & \mathcal{X}\,\mathrm{pc} \geq F_{\mathrm{sc}}\left(\mathcal{X}\,\mathrm{sc}\right) \\ \mathcal{X}\,\mathrm{init} \geq \mathcal{X}_{0} & \mathcal{Y}\,\mathrm{pc} \geq \mathcal{X}_{0} & \mathit{P}\,\mathrm{pc} = \mathsf{ret} \end{array} \tag{bwd}$$

Flow values stem from a semi-lattice  $(L, \leq)$  and a solution  $X: PC \to L$  finds a flow value for each program point. In case of a forward analysis, X pc denotes the flow value at pc before the execution of P pc. For a backward analysis X pc denotes the flow value at pc after the execution of P pc. The flow value  $X_0$  is the initial flow value of entry / exit points of the program. The functions  $F_{pc}: L \to L$  are monotonic and constitute the *transfer* of the flow values along instructions.

A flow analysis can have *additional constraints*. Additional constraints are of shape X pc  $\leq l$ . They additionally require flow values of certain program counters pc not to exceed a bound  $l \in L$ . If the least solution to the flow analysis does not satisfy the additional constraints, no solution does.

Dead Code Elimination. Liveness analysis is a backward flow analysis. The flow values are the sets of registers and memory locations that are live at any given program point in that their current value could be used later. The flow lattice is  $L = \mathcal{P}(Reg \cup Mem)$ , and the initial flow value at any exit point is  $X_0 = Reg \cup Mem$ , but can be different dependent on calling conventions. The transfer functions are folklore, so we instead formulate the guarantee that comes with a solution. Note that the guarantee holds for speculative semantics, too, because the analysis is branch-independent.

PROPOSITION 6. Whenever  $s \xrightarrow{d:l}^* U.u$  in P, if Pu uses a register b, then  $b \in F_u(Xu)$ , and if Pu loads a memory location a with offset n, then  $(a, n) \in F_u(Xu)$ .

The transformation [.] dc uses a Liveness analysis solution X of the backward flow inequalities (bwd) to remove unnecessary instructions. A function  $dc: L \to Inst \to Inst$  inspects the flow value at a given program point and removes an instruction if it writes a register or memory location that is not live. The transformation of P is then defined per program point with  $[P]^{dc}$  pc = dc (X pc) (P pc).

not live. The transformation of 
$$P$$
 is then defined per program point with  $[P]^{ac}$  pc  $= dc (X \text{ pc}) (P \text{ pc})$ 

$$dc \ l \ (a \coloneqq b \oplus c_{>sc}) = \begin{cases} \mathsf{nop}_{>sc} & a \notin l \\ a \coloneqq b \oplus c_{>sc} & a \in l \end{cases}$$

$$dc \ l \ (a \coloneqq \mathbf{a}[x]_{>sc}) = \begin{cases} \mathsf{nop}_{>sc} & a \notin l \\ a \coloneqq \mathbf{a}[x]_{>sc} & a \in l \end{cases}$$

$$dc \ l \ (a \coloneqq \mathbf{a}[x] \coloneqq c_{>sc}) = \begin{cases} \mathsf{nop}_{>sc} & x = \mathsf{n}, (\mathbf{a}, \mathsf{n}) \notin l \\ \mathbf{a}[x] \coloneqq c_{>sc} & \text{otherwise} \end{cases}$$

$$dc \ l \ i = i \quad \text{for other } i$$

$$(\mathrm{pc},\rho,\mu) \prec (\mathrm{pc},\rho',\mu') \qquad \iff \qquad \begin{aligned} \forall a \in F_{\mathrm{pc}}\left(X\,\mathrm{pc}\right).\,\rho\,\,a = \rho'\,\,a \\ \land \quad \forall (\mathbf{a},\mathsf{n}) \in F_{\mathrm{pc}}\left(X\,\mathrm{pc}\right).\,\mu\,\,\mathbf{a}\,\,\mathsf{n} = \mu'\,\,\mathbf{a}\,\,\mathsf{n} \end{aligned}$$

In order to define the directive transformation  $\triangleleft_{t \prec s}$ , we need to think about the shape of simulation intervals. Because  $[.]^{dc}$  leaves the control flow fully intact, we can choose to create a lockstep simulation. With that, we can choose  $\triangleleft_{t \prec s}$  to be the identity relation on *Direct* and add transformations where  $[.]^{dc}$  replaced an instruction with nop. Let i = Ps and  $i' = [P]^{dc}t$ . We set:

For speculating states, we simply lift  $\prec$  by setting  $T.t \prec S.s$  if  $T \prec S$  and  $t \prec s$  (and  $\varepsilon \prec \varepsilon$  for the base case). For the directive transformation, we delegate to the executing states  $\triangleleft_{T.t \prec S.s} = \triangleleft_{t \prec s}$ .

THEOREM 7.  $(\prec, \lhd)$  is a snippy lockstep simulation.

PROOF. We need to prove that  $(\prec, \lhd)$  (i) is a simulation (Definition 4), (ii) is snippy (Definition 6), and (iii) respects sec (Definition 2). The first part (i) is considerably easier than in the general case, because the simulation is lockstep, i.e. both P and  $[P]^{dc}$  only perform a single step before finding new states in  $\prec$ . Consider  $T_1.t_1 \prec S_1.s_1$ ,  $T_2.t_2 \prec S_2.s_2$ ,  $T_1.t_1 \equiv T_2.t_2$   $S_1.s_1 \equiv S_2.s_2$  and  $T_1.t_1 \xrightarrow{\delta:\lambda} V_1$ , where  $t_1 \equiv t_2 \equiv s_1 \equiv s_2 \equiv \text{pc}$ . We need to show that there is  $\gamma \triangleleft_{t_1 \prec s_1} \delta$  so that  $S_1.s_1 \xrightarrow{f:\lambda} U_1$  and  $V_1 \prec U_1$ . Second we prove snippyness (ii): We additionally consider the existence of  $S_2.s_2 \xrightarrow{f:\lambda} V_2$ . We then need to show that  $T_2.t_2 \xrightarrow{\delta:\lambda} V_2$  and  $\gamma \triangleleft_{t_2 \prec s_2} \delta$  (which implies  $(s_2, t_2) \xrightarrow{\delta:\lambda \dashv \gamma:\kappa} (u_2, v_2)$ ).

We first split off the case  $\delta = \gamma = \text{RB}$ , where  $|T_1| > 0$ . For (i), we need to prove  $S_1.s_1 \xrightarrow{\text{RB}:\text{RB}} S_1$  because  $\text{RB} \triangleleft_{t_1 \triangleleft s_1}$  RB. But  $T_1 \triangleleft S_1$  implies  $|T_1| = |S_1| > 0$  which meets the premise for the transition. For (ii), additionally consider  $S_2.s_2 \xrightarrow{\text{RB}:\text{RB}} S_2$ . We need to show  $T_2.t_2 \xrightarrow{\text{RB}:\text{RB}} T_2$ . With the same argument as before,  $|T_1| = |S_1| = |S_2| = |T_2| > 0$  satisfies the premise for the transition. Proving (i) and (ii) in the case of  $\delta \neq \text{RB} \neq \gamma$  is a large case distinction on  $\langle P \text{ pc}, [P]^{\text{dc}} \text{ pc} \rangle$ .

▶  $\langle a := \mathbf{a}[b]_{ssc}, \mathsf{nop}_{ssc} \rangle$  There are two subcases:  $\rho_{s_1} b = \mathsf{n}$  is within  $|\mathbf{a}|$  or not. We present the subcase  $\mathsf{n} \notin |\mathbf{a}|$ . For (i), consider  $t_1 \xrightarrow{\bullet : \bullet} v_1$ . We need to show that  $\mathsf{LU} \, \mathbf{b} \, \mathsf{m}$  is executable in  $s_1$ , because  $\mathsf{LU} \, \mathbf{b} \, \mathsf{m} \, \triangleleft_{t_1 \prec s_1} \bullet$ . Indeed, we have  $s_1 \xrightarrow{\mathsf{LU} \, \mathsf{b} \, \mathsf{m} : \mathsf{LD} \, \mathsf{n}} u_1$  because  $\rho_{s_1} b = \mathsf{n} \notin |\mathbf{a}|$ . We also need to show that  $v_1 \prec u_1$ . From the definition of dc, we know  $a \notin X$  pc because the instruction was replaced by nop. The transfer  $F_{\mathsf{pc}} (X \, \mathsf{pc})$  is again  $X \, \mathsf{pc}$  because a load to a dead register makes no registers or memory locations live. Together with (bwd), we get  $F_{\mathsf{pc}} (X \, \mathsf{pc}) = X \, \mathsf{pc} \supseteq F_{\mathsf{sc}} (X \, \mathsf{sc})$ . Registers and memory of  $t_1$  and  $t_1$  are equal (nop) and  $t_1$  and  $t_2$  and  $t_3$  implies  $t_1 \prec t_2$  because  $t_1 \prec t_3$  implies  $t_2 \prec t_3$  because  $t_3 \not \in X \, \mathsf{pc}$  because  $t_3 \not \in X \, \mathsf{pc}$  because  $t_4 \not \in X \, \mathsf{pc}$  because  $t_5 \not \in X \, \mathsf{pc}$  because  $t_7 \not$ 

For (iii), we need to restrict our attention to Liveness analysis where all registers and memory locations are initially live. Every solution  $\mathcal{X}$  is easily modified to satisfy this restriction. The proof that  $\prec$  respects sec is then straightforward: For initial states t and s,  $t \prec s$  if and only if t = s.  $\Box$ 

 $<sup>^7\</sup>mathrm{A}$  weaker formalization of sec-respecting relations would lift this restriction, but incur more presentational overhead.

## 6 Fixing Weaknesses in Register Allocation

Register allocation happens in the compilation phase that moves from an IR to the hardware instructions of the target architecture. It moves from the unbounded number of virtual registers occurring in the IR version of the program to the finite set of hardware registers. In order to do so, register allocation performs a relocation of register contents. For each program point, a subset of the virtual registers is selected to be kept as hardware registers and the remaining virtual registers are spilled to the stack. The literature describes various approaches towards selecting the set of registers to be spilled [Chaitin et al. 1981; Traub et al. 1998; Poletto and Sarkar 1999; Tichadou and Rastello 2022], leading to different transformations depending on the chosen algorithm. In this paper, we express the transformations from register allocation more generally as a set of constraints. A transformation constitutes a viable register allocation if it satisfies the constraints. Practical allocation algorithms produce viable allocations making our results apply to all of them.

We first present the constraints a register allocation transformation needs to satisfy. We then demonstrate how the transformation violates speculative non-interference preservation. We continue to develop a static analysis that finds potential violations. Finally, we fix the violations and craft a snippy simulation for the fixed transformation and prove that it satisfies SNiP.

## 6.1 Register Allocation

Register allocation transforms a source program *P* by inserting *shuffle instructions*.

$$si \in Shuffle\ Inst ::= a := b_{>sc} \mid a := stk[I]_{>sc} \mid stk[I] := b_{>sc} \mid slh\ a_{>sc} \mid sfence_{>sc}$$

The instructions extend Inst and are inserted in between the existing instructions in order to maintain the register relocation with the instructions move a := b, fill a := stk[I], and spill stk[I] := b. We also include slh a and sfence because we need them later to fix the transformation. The semantics are as expected: A move a := b relocates contents from b to a, a := stk[I] reloads a spilled register from a constant address  $l \in Adr$  in the stack, and stk[I] := b spills a register to the stack. We model the stack frame's section used for spilled registers with a fresh memory variable stk not occurring in p. We assume that stk has appropriate size to fit all spilled registers and is typed scc stk = L. Shuffle code is always straight line, so for a shuffle sequence  $sh \in Shuffle Inst^*$  we introduce the notion p pc  $sh_{ssc}$  to express that p executes sh and then ends in sc.

Constraints for a valid transformation. Register allocation inserts shuffle code between existing instructions to realize the register relocation. Formally, a target program  $[P]^{\mathrm{ra}}: PC_{[P]^{\mathrm{ra}}} \to Inst$  is a register allocation if there exist functions  $\Phi$  and  $\Psi$ . The first is an injection  $\Phi: PC_P \to PC_{[P]^{\mathrm{ra}}}$  of the original instructions of P to their counterparts in  $[P]^{\mathrm{ra}}$ . The second function is a relocation mapping  $\Psi: PC_{[P]^{\mathrm{ra}}} \to Reg \to (Reg \cup Stk)$ , where  $Stk = \{(\mathbf{stk}, \mathsf{I}) \mid \mathsf{I} \in |\mathbf{stk}|\} \subseteq Mem$  is the stack frame for spilled variables. At each program counter  $\mathsf{pc}' \in PC_{[P]^{\mathrm{ra}}}$ ,  $\Psi$   $\mathsf{pc}'$  a is the relocation of the virtual register a from P to the hardware register or stack location in  $[P]^{\mathrm{ra}}$ . The functions are subject to the following conditions. Instruction matching: The  $\Phi$ -injected instructions must operate on the same registers up to relocation by  $\Psi$ . Shuffle conformity: The relocation  $\Psi$  must conform to the (shuffle) instructions in  $[P]^{\mathrm{ra}}$ . Obeying Liveness: Every live register in P is mapped under  $\Psi$  and no location is doubly allocated. To state the conditions formally, we introduce notation for the defined and used registers of an instruction. A register b is used by b if it is written in that rule. For example,  $a := \mathbf{a}[b]$  uses  $\{b\}$  and defines  $\{a\}$ .

$$uses_i = \{b \mid b \in Reg \text{ is used by } i\}$$
  $def_i = \{a \mid b \in Reg \text{ is defined by } i\}$ 

<sup>&</sup>lt;sup>8</sup>In general, the content of a register *a* could be relocated to multiple locations. For simplicity of presentation, we forbid that.

Code 4. An Example register allocation. Lines b and e are the inserted shuffle code.

Instruction Matching requires that source instructions from P reappear in  $[P]^{ra}$ : For every  $pc \in PC_P$  and  $pc' = \Phi$  pc, i = P pc has to match with  $i' = [P]^{ra}$  pc'. To match, the instruction i' must be the same, but registers  $b \in uses_i$  are replaced with  $\Psi$  pc'  $b \in Reg$ . Similarly, registers  $a \in def_i$  are replaced with  $\Psi$  sc'  $a \in Reg$ , where sc' is the successor of pc'. Defined registers are found in the successor's relocation, because they are live only after executing i'. All other instructions at program points  $pc' \notin img(\Phi)$  must be shuffle instructions. Given  $s \in State_P$  and  $t \in State_{[P]^{ra}}$ , we write  $s \Phi t$  when  $s \equiv pc$  and  $\Phi pc \equiv t$ . We extend the notation to  $s \Phi T$  in the expected way.

Shuffle Conformity requires that the relocation  $\Psi$  is upheld by the instructions in  $[P]^{\text{ra}}$ . Consider an instruction  $i' = [P]^{\text{ra}}$  pc' at pc'  $\in PC_{[P]^{\text{ra}}}$  and let sc' be a successor. First, registers and stack locations untouched by i' must stay at the same location in  $\Psi$ : For any register d with  $\Psi$  pc'  $d \notin \text{uses}_{i'}$  and  $\Psi$  sc'  $d \notin \text{def}_{i'}$ ,  $\Psi$  pc'  $d = \Psi$  sc' d. Second, if i' is a shuffle instruction there are additional requirements: Shuffle instructions move one source register's content from one location in  $[P]^{\text{ra}}$  to another. As a consequence, if i' is a shuffle instruction, there must be a source register a being moved and the location moved to must be free. We write free $_{r'}$  d' and free $_{r'}$  (stk, l) for d', (stk, l)  $\notin$  img( $\Psi$  r'). Dependent on i', we add the following constraints, (a is the source register):

Obeying Liveness means that all live variables in P must be allocated. There must be a Liveness solution  $\mathcal X$  (Proposition 6) for P so that for all locations  $\operatorname{pc} \in PC_P$ , all registers  $a \in \mathcal X$   $\operatorname{pc}$  live at  $\operatorname{pc}$  are allocated, i.e.  $\Psi$  ( $\Phi$   $\operatorname{pc}$ )  $a \neq \bot$ . Further, a location cannot be allocated twice, i.e. for all  $\operatorname{pc} \in PC_{[P]^{\operatorname{ra}}}$ ,  $\Psi$   $\operatorname{pc}$  forms an injection on the live registers at  $\operatorname{pc}$ .

Definition 7. A transformation from P to  $[P]^{ra}$  is a register allocation if there are instruction matching, shuffle conform, and Liveness obeying  $(\Phi, \Psi)$ .

Example 8. Code 4 contains an example register allocation. It is a simplified version of Code 1, which still exhibits the weakness of register allocation. The left program starts with a secret value in register secret and public values in b and bytes. It stores the secret into buf when the address b is in bounds of |buf|. Then, it leaks bytes at 4. The right program is after register allocation. Register allocation has inserted the spill and fill instructions b and e. The instruction injection  $\Phi$  can be seen from side by side alignment. It is the mapping  $\{1 \mapsto b, 2 \mapsto c, 3 \mapsto d, 4 \mapsto f, 5 \mapsto g\}$ . The register mapping  $\Psi$  at a and b makes no relocations:  $\Psi$  a =  $\Psi$  b =  $id_{Reg}$ . The spill instruction at b relocates bytes to the stack. It remains spilled from c to e:  $\Psi$  c bytes = ... =  $\Psi$  e bytes = (stk, 0). Further, because a is not live after 2 it is not allocated from c to g:  $\Psi$  c a = ... =  $\Psi$  g a =  $\bot$ . Instead, the register allocation reuses a in f: The fill instruction at e relocates bytes to a:  $\Psi$  f bytes = a.

## 6.2 Register Allocation is not SNiP

Transformations that satisfy the constraints for a valid register allocation are known to be noninterference preserving, provided that the semantics are speculation-free [Barthe, Blazy, Grégoire, et al. 2019; Barthe, Grégoire, Laporte, and Priya 2021]. We first assumed that the same is true for speculative semantics as well, because register allocation only inserts shuffle instructions. While shuffle instructions produce leakages, they are constant address loads and stores which means they only leak constant values. To our surprise, when we tried to form a snippy simulation relation for register allocation, we realized that it was not SNiP. Code 4 demonstrates a minimalistic example for how register allocation introduces weaknesses into  $[P]^{ra}$ . The left program is SNi, because the leakage at Line 4 depends on the register bytes that holds a public value untouched since the start of the execution. The right program, however, is susceptible to the same attack as described in Section 1 for Code 1: Speculatively executing c when b holds a value out of |buf| can store the secret value from secret to stk[0]. That value is then loaded at e into a and leaked at f. The fault for this attack is not with the constant leakage of shuffle instructions. The weakness occurs from spilling itself, i.e. from the relocation of registers to memory: Conceptually, speculative execution can access memory everywhere because it performs unsafe memory accesses. But it cannot read or write register contents. Register allocation moves registers into memory, effectively granting unsafe memory operations access to spilled registers.

We were able to reproduce the weakness with a real compiler. Code 4 is inspired by Code 1 which is an excerpt of the libsodium function chacha20\_encrypt\_bytes with slight modifications. The function is responsible for encryption of data using the Chacha20 stream cipher and, similar to Code 1, first copies the data into a stack-local buffer before executing the encryption algorithm on it. Our tests were done on LLVM 17, and we tested each of LLVM's register allocators (i.e. basic, greedy, fast, and pbqp). They *all* insert the instructions b and e (Code 4) when compiling our program without other optimizations. This means the vulnerability is inherent to register allocation, and cannot avoided by just opting towards a particular register allocator.

# 6.3 Poison-Tracking Product

We develop a static analysis that reveals weaknesses like the one in Section 6.2. The analysis operates on a product construction between source program and register-allocated program. To motivate the construction, we inspect the simulation for register allocation under speculation-free semantics, and explain what fails under speculative semantics. Under speculation-free semantics, a simulation for register allocation matches a state s of the source program P to a state t of the target program  $[P]^{ra}$  whenever the instruction is matched,  $s \Phi t$ , and the values of all registers and memory locations in s and t are equal up to relocation  $\Psi$ . When P is memory safe, one can then prove that any instruction executed from t can be replayed from s. The simulation preserves non-interference: Apart from the leakages inserted by shuffle instructions (which are constant addresses), the leakage that arises from execution in t is equal to the leakage from s. The previous section showed that it is not enough to just extend this approach to speculating states: Speculation introduces unsafe memory operations that access the  $\Psi$ -mapped stack locations in  $[P]^{ra}$ . Such an access cannot be simulated by the source program P, as it has the accessed value in a register where the memory operation cannot access it. As a result, if we executed P and  $[P]^{ra}$  in parallel we will eventually see speculating states S and T with different values in registers of S and their  $\Psi$ -mapped location in *T*. Further execution propagates the differences to other locations.

The goal in fixing register allocation is to make sure that differences in value do not lead to leakage of sensitive data. An immediate mitigation would be to insert an sfence instruction before every

<sup>&</sup>lt;sup>9</sup>Targeting x86-64. We used the opt passes mem2reg, simplify-cfg, module-inline before register allocation with llc.

load and store operation. This would eliminate speculating unsafe memory accesses altogether. However, this introduces many sfence instructions and reduces performance more than necessary. Instead, we construct a product of P and  $[P]^{\rm ra}$  that tracks differences in values between S and T. We call registers and memory locations that hold different values *poisoned*, because leaking them in  $[P]^{\rm ra}$  might be unsafe: If S and T execute an instruction that leaks the value of a poisoned register, we cannot rely on P being SNi to justify that the leakage from T is safe. Leakage of healthy (not poisoned) registers, however, is safe for that reason. As a result, we only need to protect instructions that leak a poisoned register's value.

*Poison types.* Poison types represent the registers and memory locations where P and  $[P]^{ra}$  can have different values. Consider states  $s \in State_P$  and  $t \in State_{\lceil P \rceil^{ra}}$ . A poison type ps  $\in Ptype$  is a function  $(Reg \cup Mem \setminus Stk) \rightarrow Poison$ . It assigns a poison value to each register and memory location of s. The poison values *Poison* = {p, wp, h} have the following meaning: Healthy registers and memory locations (h) are equal between s and t up to relocation by  $\Psi$ . Poisoned registers and memory locations (p) can differ between s and t up to relocation by  $\Psi$ . Finally, registers and memory locations can be weakly poisoned (wp). This poison value is introduced because of slh a. If slh a occurs as a shuffle instruction, executing it speculatively sets a to 0. But because it is a shuffle instruction it does not occur in P. That makes a's value different between P and  $[P]^{ra}$ . However, every execution in  $[P]^{ra}$  that executes this shuffle instruction speculatively also sets a to 0. This makes it safe to leak in  $[P]^{ra}$ , even though the value differs between P and  $[P]^{ra}$ . We thus type registers and memory locations that hold a 0 due to an slh a instruction weakly poisoned (wp). Formally, we express that states  $s \in State_P$  and  $t \in State_{P_1^{ra}}$  are equal up to a poison-type ps and relocation  $\Psi$  with the notation  $\Psi^{ps}$  t. We then extend the notation to speculating states  $S \in SState_P$  and  $T \in SState_{[P]^{ra}}$  of equal speculation depth |S| = |T|. For that, we have sequences of poison types  $P \in Ptype^*$ , one poison type for each level of speculation |S| = |T| = |P|. To formalize the notation  $S \Psi^P T$ , write ps[pv] for the set of all registers and memory locations typed pv by ps,  $ps[pv] = \{a, (a, n) \mid ps \ a = pv, ps \ a \ n = pv\}$ . Further, we write  $[a]_t = \rho_t \ a$  and  $[(a, n)]_t = \mu_t \ a$  n. Let  $s = (pc, \rho, \mu) \in State_P$  and  $t = (pc', \rho', \mu') \in State_{[P]^{ra}}$  range over states of P and  $[P]^{ra}$ . We define:

$$\forall a \in \operatorname{ps}[h]. \ \llbracket \Psi \operatorname{pc}' a \rrbracket_t = \rho \ a$$
 
$$s \ \Psi^{\operatorname{ps}} \ t \qquad \land \qquad \forall a \in \operatorname{ps}[\operatorname{wp}]. \ \llbracket \Psi \operatorname{pc}' a \rrbracket_t = 0 \qquad \qquad S.s \ \Psi^{\operatorname{p.ps}} \ T.t \qquad S \ \Psi^{\operatorname{p}} \ T$$
 
$$if \ and \ only \ if \qquad \land \quad \forall (\mathbf{a}, \mathbf{n}) \in \operatorname{ps}[h]. \ \mu' \ \mathbf{a} \ \mathbf{n} = \mu \ \mathbf{a} \ \mathbf{n} \qquad \qquad if \ and \ only \ if \qquad \land s \ \Psi^{\operatorname{ps}} \ t$$
 
$$\land \ \forall (\mathbf{a}, \mathbf{n}) \in \operatorname{ps}[\operatorname{wp}]. \ \mu' \ \mathbf{a} \ \mathbf{n} = 0$$

The definition of  $s \Psi^{ps} t$  meets the intuition: Healthy registers and memory locations need to coincide between s and t and weakly poisoned locations must be 0 in t. For poisoned locations, there are no requirements. For speculating states, the definition is applied at each speculation level. We write h, wp, and p for the poison types that assign the respective poison value everywhere.

*Example 9.* We observe the poisoned values in the attack from Section 6.2 in Code 4 when running *P* and  $[P]^{ra}$  side by side. The register allocation  $(\Phi, \Psi)$  is described in Example 8. The initial states are  $s = (1, \rho, \mu)$  and  $t = (a, \rho, \mu)$ , with  $\rho$  b ∉ |buf| and  $\rho$  secret = v ≠  $\rho$  bytes. We also assume that already  $\rho$  a = f (so that the updates in 1 and a have no effect). After executing the store instruction at 3 with the directive su buf 0, P's state is  $u = (4, \rho, \mu[(buf, 0) \mapsto v])$ . Similarly, after executing the store instruction at d with the directive su stk 0,  $[P]^{ra}$ 's state is  $v = (e, \rho, \mu[(stk, 0) \mapsto v])$ . We see a poisoned value for bytes: At e, it is still located in  $\Psi$  e bytes = (stk, 0) due to the spill at b. But the values differ with  $\rho_v$  bytes ≠  $v = \mu'$  stk 0. Further,  $\mu_u$  buf 0 ≠  $\mu_v$  buf 0 holds a poisoned value. With a poison type that is fully healthy except on bytes and (buf, 0), ps = h[bytes,  $(buf, 0) \mapsto \rho$ ], we have u  $\Psi^{ps}$  v.

```
Rules 6.1: Stack
                                                                                                                                                                           POISON-STORE-STKUNSAFE
                          POISON-LOAD-STKUNSAFE
                                             P s = a := \mathbf{a}[b]_{\text{sc}}
ps b = h \quad ps' = ps[a \mapsto p]
s \stackrel{\text{LU} \mathbf{b}|:\text{LD} \mathbf{n}}{\longrightarrow} u \quad t \stackrel{\text{LU} \mathbf{stk} \mathbf{m}:\text{LD} \mathbf{n}}{\longrightarrow} v
                                                                Ps = a := \mathbf{a}[b]_{sc}
                                                                                                                                                                            P s = \mathbf{a}[b] := c_{\mathsf{SSC}} \quad \Psi v d = (\mathbf{stk}, \mathsf{m})
                                                                                                                                                                                 ps b = h \quad ps' = ps[d, (\mathbf{a}, l) \mapsto p]
                                                                                                                                                                                  s \xrightarrow{\text{su a l:st n}} u \quad t \xrightarrow{\text{su stk m:st n}} v
                              (s, t, ps) \xrightarrow{\text{LU} \, \mathbf{b} \, | : \text{LD} \, \mathbf{n} : : \text{LU} \, \mathbf{stk} \, \mathbf{m} : \text{LD} \, \mathbf{n}} (u, v, ps') (s, t, ps) \xrightarrow{\text{SU} \, \mathbf{a} \, | : \text{ST} \, \mathbf{n} : : \text{SU} \, \mathbf{stk} \, \mathbf{m} : \text{ST} \, \mathbf{n}} (u, v, ps')
-Rules 6.2 Loads -
                                                                                                                                                                                                                                                             Ps = a := \mathbf{a}[b]_{sc}
                   OISON-LOAD-SAFE
ps \ b = wp \quad s \xrightarrow{\bullet: \text{LD } k :: \bullet: \text{LD } 0} u \quad t \xrightarrow{\bullet: \text{LD } 0} v
(s, t, ps) \xrightarrow{\bullet: \text{LD } k :: \bullet: \text{LD } 0} (u, v, ps[a \mapsto p])
poison-load-unsafe
ps \ b = wp \quad s \xrightarrow{\text{LU } \mathbf{a} \mid: \text{LD } k} u \quad t \xrightarrow{\bullet: \text{LD } 0} v
(s, t, ps) \xrightarrow{\text{LU } \mathbf{a} \mid: \text{LD } k :: \bullet: \text{LD } 0} (u, v, ps[a \mapsto p])
               POISON-LOAD-SAFE
                                                                                                                                                                        HEALTHY-LOAD-UNSAFE
                                                                                                                                                                                       ps b = h \quad ps' = ps[a \mapsto ps b m]
            HEALTHY-LOAD-SAFE
                ps b = h \quad s \xrightarrow{\bullet: LD n} u \quad t \xrightarrow{\bullet: LD n} v \qquad s \xrightarrow{LUb m: LD n} u \quad t \xrightarrow{LUb m: LD n} v \quad b \neq stk
              (s, t, ps) \xrightarrow{\bullet : LD \, n :: \bullet : LD \, n} (u, v, ps[a \mapsto ps \, a \, n]) \qquad (s, t, ps) \xrightarrow{LU \, b \, m : LD \, n :: LU \, b \, m : LD \, n} (u, v, ps')
 -Rules 6.3 Stores-
                                                                                                                                                                                                                                                             Ps = \mathbf{a}[b] := c_{\mathsf{sc}}
       POISON-STORE-SAFE
          \begin{array}{c} \operatorname{ps'} = \operatorname{ps}[(\mathbf{a}, \mathbf{k}), (\mathbf{a}, 0) \mapsto \operatorname{p}] \\ \operatorname{ps} b = \operatorname{wp} \quad s \xrightarrow{\bullet:\operatorname{sT}\mathbf{k}} u \quad t \xrightarrow{\bullet:\operatorname{sT}0} v \\ \hline (s, t, \operatorname{ps}) \xrightarrow{\bullet:\operatorname{sT}\mathbf{k}::\bullet:\operatorname{sT}0} (u, v, \operatorname{ps'}) \end{array} \qquad \begin{array}{c} \operatorname{poison-store-unsafe} \\ \operatorname{ps} b = \operatorname{wp} \quad s \xrightarrow{\operatorname{sual:sT}\mathbf{k}} u \quad t \xrightarrow{\bullet:\operatorname{sT}0} v \\ \hline (s, t, \operatorname{ps}) \xrightarrow{\operatorname{sual:sT}\mathbf{k}::\bullet:\operatorname{sT}0} (u, v, \operatorname{ps}[(\mathbf{a}, \mathbf{l}), (\mathbf{a}, 0) \mapsto \operatorname{ps} c]) \end{array} 
        HEALTHY-STORE-SAFE
           ps' = ps[(\mathbf{a}, \mathbf{n}) \mapsto ps c]
ps b = h \quad s \xrightarrow{\bullet: \text{ST} \mathbf{n}} u \quad t \xrightarrow{\bullet: \text{ST} \mathbf{n}} v \quad ps b = h \quad s \xrightarrow{\bullet: \text{ST} \mathbf{n} :: \bullet : \text{ST} \mathbf{n}} (u, v, ps')
(s, t, ps) \xrightarrow{\bullet: \text{ST} \mathbf{n} :: \bullet : \text{ST} \mathbf{n}} (u, v, ps')
HEALTHY-STORE-UNSAFE
ps b = h \quad s \xrightarrow{\text{SU} \mathbf{b} \text{m} : \text{ST} \mathbf{n}} u \quad t \xrightarrow{\text{SU} \mathbf{b} \text{m} : \text{ST} \mathbf{n}} v \quad \mathbf{b} \neq \mathbf{stk}
(s, t, ps) \xrightarrow{\text{SU} \mathbf{b} \text{m} : \text{ST} \mathbf{n} :: \text{SU} \mathbf{b} \text{m} : \text{ST} \mathbf{n}} (u, v, ps[(\mathbf{b}, \mathbf{m}) \mapsto ps c])
 -Rules 6.4: Shuffles -
 POISON-FILL
                                                                                                                POISON-SPILL
  \frac{Pt = a' := \mathbf{stk}[1]_{>sc} \quad t \xrightarrow{\bullet : \text{LD} 1} v}{(s, t, \text{ps}) \xrightarrow{\varepsilon : \varepsilon : : \bullet : \text{LD} 1} (s, v, \text{ps})} \frac{Pt = \mathbf{stk}[1] := b'_{>sc} \quad t \xrightarrow{\bullet : \text{sT} 1} v}{(s, t, \text{ps}) \xrightarrow{\varepsilon : \varepsilon : : \bullet : \text{ST} 1} (s, v, \text{ps})} \frac{Pt = a' := b'_{>sc} \quad t \xrightarrow{\bullet : \bullet} v}{(s, t, \text{ps}) \xrightarrow{\varepsilon : \varepsilon : : \bullet : \bullet} (s, v, \text{ps})}
                                                                                            POISON-SHUFFLE-SLH
  POISON-SHUFFLE-SFENCE
    Pt = \text{sfence}_{sc} \quad t \xrightarrow{\bullet:\bullet} v \qquad PT = \text{slh } a'_{sc} \quad \Psi T = a' \quad T \xrightarrow{\bullet:\bullet} V \quad pv = |P| = 0 ? ps a : wp
                                                                                                                                      (S, T, P.ps) \xrightarrow{\mathcal{E}:\mathcal{E}::\bullet:\bullet} (S, V, P.ps[a \mapsto pv])
     (s, t, ps) \xrightarrow{\varepsilon:\varepsilon::\bullet:\bullet} (s, v, ps)
 -Rules 6.5: Speculation Sensitive ---
                             COISON-STEP (s, t, ps) \xrightarrow{\delta:\lambda::\gamma:\kappa} (u, v, ps') POISON-ROLLBACK |S| = |T| = |P| \ge 1 (S.s, T.t, P.ps) \xrightarrow{\delta:\lambda::\gamma:\kappa} (S.u, T.v, P.ps') (S.s, T.t, P.ps) \xrightarrow{RB:RB::RB:RB:RB} (S, T, P)
                    POISON-STEP
  HEALTHY-SPEC
                                                                                                                                                                                              POISON-SFENCE
     \frac{PS = \text{br} b_{\text{sct, scf}}}{PS = \text{br} b_{\text{sct, scf}}} \text{ ps } b = \text{h} \quad S \xrightarrow{\text{SP:BR} b} S.u \quad T \xrightarrow{\text{SP:BR} b} T.v \qquad PS = \text{sfence}_{\text{sc}} \quad S \xrightarrow{\bullet:\bullet} u \quad t \xrightarrow{\bullet:\bullet} v
                           (S, T, P.ps) \xrightarrow{SP:BRb::SP:BRb} (S.u, T.v, P.ps.ps)
                                                                                                                                                                                                       (s, t, ps) \xrightarrow{\bullet : \bullet : : \bullet : \bullet} (u, v, ps)
                                                          POISON-SLH
                                                             PS = \sinh a_{\operatorname{sc}} \quad S \xrightarrow{\bullet : \bullet} U \quad T \xrightarrow{\bullet : \bullet} V \quad \operatorname{pv} = |P| = 0 ? \operatorname{ps} a : h
                                                                                            (S, T, P.ps) \xrightarrow{\bullet : \bullet : : \bullet : : \bullet} (U, V, P.ps[a \mapsto pv])
```

Poison product. The poison product  $P::[P]^{\mathrm{ra}}$  tracks how poison types are updated when executing P and  $[P]^{\mathrm{ra}}$  side by side. We design the side by side execution so that transitions in  $[P]^{\mathrm{ra}}$  are replayed by P. We later want to craft a simulation from the product, so every transition of  $[P]^{\mathrm{ra}}$  must be included in  $P::[P]^{\mathrm{ra}}$ . However, when a transition of  $[P]^{\mathrm{ra}}$  can be replayed by multiple transitions in P, which happens when P performs an unsafe memory access, we choose a memory location for the access so that the transition poisons as few registers and memory locations as possible. The states of  $P::[P]^{\mathrm{ra}}$  take the shape (S,T,P) with  $S \in SState_P$ ,  $T \in SState_{[P]^{\mathrm{ra}}}$ ,  $P \in Ptype^*$ , and  $S \Psi^P T$  (which already implies |S| = |T| = |P|). There are two types of states: Instruction-matched states and shuffling states. Instruction-matched states satisfy  $S \Phi T$ . Shuffling states have T at a shuffle sequence after which the states would be instruction-matched again. That is, S = S'.s and T = T'.t, so that  $[P]^{\mathrm{ra}} t = sh_{,\mathrm{sc}}$  with  $\Phi s = \mathrm{sc}$ . Each type of state has transitions in  $P::[P]^{\mathrm{ra}}$ :

$$(S, T, P) \xrightarrow{Y:K::\delta:\lambda} (U, V, Q)$$
  $(S, T, P) \xrightarrow{\varepsilon:\varepsilon::\delta:\lambda} (S, V, Q)$ 

The first type of transition is enabled only for instruction-matched  $S \Phi T$ . It executes a transition simultaneously in both programs via  $S \xrightarrow{Y:K} U$  and  $T \xrightarrow{\delta:\lambda} V$ . The second type is enabled only for shuffling states. In that case, S stutters while T progresses through the shuffle sequence. In both cases, the transition needs to update the poison values depending on the transition rules taken in P and  $[P]^{\mathrm{ra}}$ . We present how  $P::[P]^{\mathrm{ra}}$  updates poison values in Rules 6.1 to 6.5. We mirror the approach in the semantics and provide the updates in two steps. In a first step, we define the updates for instruction-matched, speculation-free states in Rules 6.1 to 6.3. They represent the transitions for the speculation-free semantics (Rules 2.1). In the second step we lift the speculation-free updates with Rules 6.5 to define the transitions of the product on the speculative semantics (Rules 2.2). Rules 6.4 defines the transitions for shuffling states. The instruction-matched transitions for nop,  $a:=b\oplus c$  and br b can be found in the appendix. We explain the transitions in detail.

The transitions Poison-load-stkunsafe and Poison-store-stkunsafe are the source of poison values. They represent the situation from above, where we argued that values between P and  $[P]^{ra}$ can differ: In POISON-LOAD-STKUNSAFE a target program's speculating unsafe load (recognizable from the directive LU stk m) loads from a spilled register's stack location. The source program cannot perform that load and thus has to perform any other unsafe load. This leads to different loaded values and poisons the register loaded to. While we could choose the location of the unsafe access in P, the register is poisoned in either case, so we allow arbitrary unsafe loads in P. Similarly, in Poison-store-strunsafe a speculating unsafe store poisons the source program's register d by overwriting it on the stack. Again, the source program cannot replay that store as d is a register and needs to store somewhere else. In this case, we always let the source program store to a instead, because this reduces the number of poisoned memory locations for static analysis. The overwritten stack-allocated register d and the memory location in a are both poisoned. Notice how both rules require the source program to access memory unsafely, as well. This is because the register holding the offset address must be healthy, ps b = h, and thus has the same value in s and t. The address must be healthy, because a poisonous address could be unsafe to leak. A weakly poisoned address, on the other hand, would be 0 in  $[P]^{ra}$  which is always a safe access. This is because we chose slh a to wipe a register to 0. Any other constant would work the same way, but the number of rules in the product would increase.

Rules 6.2 and 6.3 for instruction-matched transitions just propagate already poisoned registers and memory locations. For example, Healthy-load-safe considers the case where the addressing register is healthy. That means that both source and target program load from the same memory location. We can recognize from the premise that the memory access is safe: Source and target program execute the load with the directive •, so they execute with load. The rule then propagates the poison value from the memory location loaded from to the register loaded to. As before,

the product transition can only be taken when the leaked content is not poisoned as that could represent an unsafe leakage. The case of a weakly poisoned address is separately handled by POISON-LOAD-SAFE and POISON-LOAD-UNSAFE. Rules 6.3 propagate poison values for stores.

Rules 6.5 provides the poison updates for instruction-matched states that execute speculation sensitive instructions. Rule Poison-step lifts the speculation-free transitions to the speculative setting. Healthy-spec forbids weakly poisoned branching conditions even though they are safe to leak. This is to avoid that P and  $[P]^{\rm ra}$  arrive at different program points (up to  $\Phi$ ). Poison-slh creates healthy values when the product is speculating, because source and target program both wipe the register to 0. If not speculating, the poison values are untouched.

Rules 6.4 provides the second type of transitions, for shuffling states, in  $P := [P]^{\text{ra}}$ . The speculation insensitive shuffle instructions are again brought to speculating states with Poison-Step. The insensitive shuffle instructions only move values and don't modify them. There is no need to update the poison type because the relocation is already included in  $\Psi$  with shuffle conformity. Thus, the poison type is untouched. Only Poison-shuffle-slh can be a new source of poisoned values. As discussed earlier, when speculatively executed, it wipes the register's value to 0, which makes it weakly poisoned. On speculation-free states, sfence and slh a' do not update the poison type.

The transitions of  $P :: [P]^{ra}$  are well-defined in the sense that executing a transition leads to a state that again satisfies the constraints of being a state (Lemma 4). Also, the speculation-free states never become poisoned, because POISON-LOAD-STKUNSAFE and POISON-STORE-STKUNSAFE cannot happen: P is memory safe under speculation-free semantics. Further, the only other rule to introduce poison values is POISON-SHUFFLE-SLH, but it introduces no poison value in speculation-free states.

LEMMA 4. Given (S, T, P) and a transition  $(S, T, P) \xrightarrow{Y:K::\delta:\lambda} (U, V, Q)$ , then  $U \Psi^Q V$ .

Lemma 5. Speculation-free states are never poisoned. I.e.  $(s, t, h) \xrightarrow{e:k::d:l} (u, v, ps)$  implies ps = h.

Example 10. Consider the execution of  $P :: [P]^{\rm ra}$  in Figure 3. It depicts the side by side execution from Example 9 in  $P :: [P]^{\rm ra}$ . Each state of  $P :: [P]^{\rm ra}$  is depicted as a small table listing the program counters of P and  $[P]^{\rm ra}$ , the values of the registers, and the values of relevant memory locations (we use t, f for Boolean typed values). The right column contains the poison type associated with the pair of states. The first transition simultaneously executes the instruction-matched assignment at 1 and a. The second transition executes the shuffle instruction that spills bytes to  ${\tt stk}[0]$  at b, while P waits. The third transition speculates. Notice that HEALTHY-SPEC only admits this transition because  ${\tt a}$  is  ${\tt h}$ . The next transition is the unsafe store to  ${\tt stk}$  in  $[P]^{\rm ra}$ , overwriting the spilled bytes. Notice that POISON-STORE-STKUNSAFE has changed the poison value for bytes to  ${\tt p}$ , because the memory location written to was  ${\tt P}$  e bytes = ( ${\tt stk}$ , 0). The rule also poisons the memory location where P writes to. The last transition is the shuffle instruction that relocates the poisoned bytes to  ${\tt a}$ . The product is now stuck: HEALTHY-SPEC is disabled because the poison value for bytes is  ${\tt p}$  and thus not safe to leak in  $[P]^{\rm ra}$ . Indeed,  ${\tt P}$  f bytes =  ${\tt a}$  holds the secret value 42.

# 6.4 Static Poison Analysis of $P := [P]^{ra}$

The poison product finds weaknesses: Whenever a poisoned register's value would be leaked in  $[P]^{ra}$ , the product cannot execute the transition. To find the program points where a poisoned register can be leaked, we design a static analysis that over-approximates the poison values any execution could produce. The analysis constructs, for each program point pair (pc, pc'), an approximate poison type ps. This poison type is approximate in that a statically h-typed register is always healthy in any execution that reaches (pc, pc') in a speculating state. Similarly, if it is statically typed wp it is

 $<sup>^{10}</sup>$ Branching on weakly poisoned registers could be supported but further complicates the product definition.



Fig. 3. An execution of  $P :: [P]^{ra}$  on Code 4. Updated values are highlighted. Dead registers are gray.

always weakly poisoned in a speculating state. However, if it is statically typed p it might also be healthy or weakly poisoned. Formally, the analysis constructs a function  $X : PC_{P :: [P]^{ra}} \to Ptype$ , where  $PC_{P :: [P]^{ra}}$  are the program points of  $P :: [P]^{ra}$ ,

$$PC_{P::[P]^{ra}} \triangleq \{(pc, pc') \mid \exists (S.s, T.t, P.ps). pc \equiv s \land pc' \equiv t\}$$
$$= \{(pc, pc'), (pc, r') \mid \Phi pc = pc' \land [P]^{ra} r' = sh_{pc'}\}.$$

The pairs (pc, pc') are instruction-matched program counters and (pc, r') are from shuffling states. We design our analysis as a forward flow analysis (Equation (fwd)). For that, we need to define a flow lattice L, transfer functions, and the initial flow value  $X_0$ . The flow analysis then yields X as a solution. The lattice is constructed on *Ptype*. We create an ordering h < p and wp < p on *Poison*. In order to arrive at a lattice, we further extend it by an artificial least element  $\bot \in Poison.$  We then lift the ordering point-wise to poison types for the flow lattice  $L = (Ptype, \leq)$ . The ordering is chosen with the intention that when ps  $\leq$  ps', then  $s \Psi^{ps} t$  implies  $s \Psi^{ps'} t$  (which would not be the case if we had set h < wp). The initial value is set to healthy,  $\chi_0 = h$ , because the initial states of P and  $[P]^{ra}$  are fully equal up to  $\Psi$ . The transfer functions  $F_{(pc,pc')}: Ptype \to Ptype$  need to approximate the poison types in  $P :: [P]^{ra}$ . Conceptually, they update the current poison type by simultaneously executing all updates that  $P := [P]^{ra}$  could do. For instruction-matched  $\Phi pc = pc'$ that means to look at the instruction i = P pc which is the same as  $[P]^{ra}$  pc' up to  $\Psi$ . Then, we poison all registers and memory locations that a rule for i from Rules 6.1 to 6.3 and 6.5 could poison. This means the transfer function is solely dependent on the instruction i = P pc, and we define it via  $F_{(DC,DC')} = F_i$  below. For a shuffling product state,  $P :: [P]^{ra}$  offers for each instruction  $i' = [P]^{ra}$  r' only one update which we find in Rules 6.4. We again define transfer solely dependent on i' via  $F_{(pc,r')} = G_{i'}$ . For a shuffling  $i' = \sinh a'$  we assume that the source register for a' is a, i.e.  $\Psi r' a = a'$ . We only present the interesting cases of  $F_i$  and  $G_{i'}$ . The initial poison type for (init, init') is healthy.

$$F_{a := a[b]_{sc}} \text{ ps} = \text{ps}[a \mapsto p]$$

$$F_{\text{slh } a_{sc}} \text{ ps} = \text{ps}[a \mapsto h]$$

$$G_{\text{slh } a', \text{sc}} \text{ ps} = \text{ps}[a \mapsto wp]$$

$$G_{\text{sfence}, \text{sc}} \text{ ps} = h$$

$$F_{a := b \oplus c, \text{sc}} \text{ ps} = \begin{cases} \text{ps}[a \mapsto h] & \text{ps } b = \text{ps } c = h \\ \text{ps}[a \mapsto p] & \text{otherwise} \end{cases}$$

$$F_{a[b] := c, \text{sc}} \text{ ps} = \text{ps}[Mem \mapsto ps \ c][Reg, a \mapsto p]$$

All transfer functions are monotonic. All but one transfer function are easily defined in order to approximate the rules of  $P::[P]^{ra}$ . The exception is for  $\mathbf{a}[b] := c$  instructions which we explain. The first substitution  $ps[Mem \stackrel{\sqcup}{\mapsto} ps c]$  sets all memory locations  $(\mathbf{b}, \mathbf{n})$  to  $ps(\mathbf{b}, \mathbf{n}) \sqcup ps c$ . This approximates Healthy-load-safe and Healthy-load-unsafe, because both P and P are to

 $<sup>^{11}</sup>$ This is a standard construction. In the remainder, we assume that transfer functions preserve  $\perp$ .

the same location which will have poison value ps c. All other locations maintain their poison value. To approximate this behavior statically, we take the join on the two poison values. The second substitution sets all of  $\mathbf{a}$  and all registers Reg to  $\mathbf{p}$ . This approximates Poison-store-stkunsafe: Reg needs to be poisoned because a store to the stack overwrites the contents of a register in the source program. At the same time, the rule overwrites  $\mathbf{a}$  in the source program P, which leads to  $\mathbf{a}$  being poisoned as well. Additionally, Poison-store-safe and Poison-store-unsafe write to  $\mathbf{a}$  only, meaning the poisoning of  $\mathbf{a}$  already approximates them as well.

In order to formally express how a solution X approximates the poison values of all reachable states in  $P :: [P]^{ra}$ , we introduce shorthand notations for sequences of poison types fully consisting of poison types from X. Fix a solution X to the flow equations. We define:

$$\begin{aligned} & \mathcal{X}_{s,t} = \mathsf{h} & \mathcal{X}_{S,s,T,t} = \mathcal{X}_{S,T}\mathcal{X}\left(s,t\right) \\ & S \ \Psi_{\scriptscriptstyle X} \ T \Leftrightarrow S \ \Psi^{\mathcal{X}_{S,T}} \ T & \Leftrightarrow S.s \ \Psi^{\mathcal{X}_{S,T},\mathsf{ps}} \ T.t & \Leftrightarrow S.s \ \Psi^{\mathcal{X}_{S,T},\mathsf{ps}} \ T.t & \Leftrightarrow S.s \ \mathcal{Y}_{\scriptscriptstyle X} \ T.t, \ \mathcal{Y}_{S,T},\mathsf{ps} \end{aligned}$$

The notation  $X_{S,T}$ , where S, T are speculating states, stands for a sequence of poison types of length  $|X_{S,T}| = |S| = |T|$ . Intuitively,  $X_{S,T}$  consists of the poison types of X, applied to the program counters of S and T at every level. The only exception is the lowest level in the speculating states. They are always set to h, because we know from Lemma 5 that those states are reachable speculation-free and can never have poisoned registers or memory locations. The notation  $S \Psi_X T$ ,  $S \Psi_X^{ps} T$ , and (S.s, T.t, ps) are shorthand notations to avoid a lot of repeating  $X_{S,T}$ .

LEMMA 6. Whenever 
$$(S, T, X_{S,T}) \xrightarrow{Y:K::\delta:\lambda} (U, V, Q)$$
 then  $Q \leq X_{U,V}$ .

# 6.5 Fixing Register Allocation

We can use our static analysis solution X to identify whether  $[P]^{ra}$  has weaknesses: If X guarantees that leakages are never poisonous, then  $[P]^{ra}$  has no register allocation induced weaknesses.

Definition 8. A register allocation  $(\Phi, \Psi)$  between P and  $[P]^{ra}$  is poison-typable, if there is a solution X to (fwd) which for every (pc, pc') with  $\Phi$  pc = pc' satisfies the additional constraints

$$\begin{split} \operatorname{ps} b &= \operatorname{wp} \vee \operatorname{ps} b = \operatorname{h} & if & P\operatorname{pc} &= \operatorname{a}[b]_{\operatorname{>sc}} \vee P\operatorname{pc} &= \operatorname{a}[b] \coloneqq c_{\operatorname{>sc}} \,, \\ \operatorname{ps} b &= \operatorname{h} & if & P\operatorname{pc} &= \operatorname{br} b_{\operatorname{>sc}_{\mathsf{t}},\operatorname{sc}_{\mathsf{f}}} \,. \end{split}$$

Our fix is applicable to any register allocation  $(\Phi, \Psi)$  between P and  $[P]^{ra}$ : We check if  $(\Phi, \Psi)$  is poison-typable. For that, we solve the flow analysis (e.g. [Kildall 1973]) to obtain a static poison assignment X and check the additional constraints. If it is not poison-typable, then an additional constraint is violated for some program point (pc, pc') and register b' of  $P: [P]^{ra}$ . We insert an sfence or slh b' instruction into  $[P]^{ra}$  at the end of the shuffle sequence right before pc' and obtain a new register allocation where that additional constraint is now satisfied. We then repeat the process until we obtain a poison-typable register allocation.

## 6.6 Poison-typable Register Allocation is SNiP

We now prove that making register allocation poison-typable already makes it SNiP. With our proof method from Theorem 2, this reduces to crafting a snippy simulation. Again, the crafted simulation is parametric, so that it works for all poison-typable register allocations.

THEOREM 11. If  $(\Phi, \Psi)$  is a poison-typeable register allocation between P and  $[P]^{ra}$ , then there exists a snippy simulation  $(\prec, \lhd)$  between  $[P]^{ra}$  and P.

For the remainder of the section, fix a poison-typable register allocation  $(\Psi, \Phi)$  between P and  $[P]^{ra}$ , and the static poison assignment X. Further, let init be the entry point for P and init' for  $[P]^{ra}$ .

<sup>&</sup>lt;sup>12</sup>We could be more precise and poison only those registers spilled at the current program point.



Fig. 4. The shape of  $\Phi$ -intervals. Teal and red paths form separate intervals.

Defining  $(\prec, \lhd)$ . We define  $\prec \subseteq SState_{[P]^{ra}} \times SState_P$  similar to the simulation for register allocation without speculative semantics: Source state S and target state T are instruction-matched and coincide in values up to relocation by  $\Psi$ . The difference is that the states do not need to coincide on registers and memory locations poisoned by X:

$$T \prec S$$
 if and only if  $S \Phi T \wedge S \Psi_X T$ .

For the directive transformation  $\triangleleft_{T \prec S}$ , we rely on  $P :: [P]^{\text{ra}}$ 's transitions. We say that a transition sequence  $(S, T, ps) \xrightarrow{e:K::d:!}^* (U, V, ps')$  is a  $\Phi$ -interval if  $S \oplus T$ ,  $U \oplus V$ , and no intermediary state pairs are instruction matched. The intent is that  $\Phi$ -intervals are precisely the simulation intervals once we have proven that our defined relation is a simulation. Figure 4 depicts the shape of  $\Phi$ -intervals: The product first executes the instruction-matched instruction on both S and T. Then, the target program can perform any number of shuffle steps, until either the shuffle sequence is fully executed (the pair of red paths), or a rollback happened before that (any pair of teal paths). In that case,  $P :: [P]^{\text{ra}}$  rolls back on U as well. We use the  $\Phi$ -intervals as directive transformations,

$$\triangleleft_{T \prec S} \triangleq \{(e,d) \mid (S,T,ps) \xrightarrow{e:k::d:l} (U,V,ps') \text{ is a } \Phi\text{-interval}\}.$$

LEMMA 7.  $(\prec, \lhd)$  is a snippy simulation between  $[P]^{ra}$  and P.

## 7 Related Work

We already discussed the closely related work in the context of compiler correctness. Here, we give a broader picture and elaborate on methods for proving non-interference for single programs. Note the difference: when reasoning about compiler passes, we reason over all programs. For a broader overview, we defer the reader to a recent survey [Cauligi, Disselkoen, Moghimi, et al. 2022].

Speculation Sources. Since its discovery in 2018, Spectre attacks have been rediscovered in multiple variants. The main difference between the variants lies in the hardware feature that is trained in order to trigger a misspeculation. We call the respective feature the source of speculation. The first version of the attack trains the Prediction History Table (PHT) of the processor, in order to inflict a mispredicted branching speculation [Kocher et al. 2019]. Other variants train the Branch Target Buffer (BTB) to mispredict indirect branching instructions [Kocher et al. 2019], the Return Stack Buffer (RSB) to mispredict return points [Koruyeh et al. 2018], both of which highjack the speculative control flow to execute leaking gadgets speculatively. They can be mitigated in software with a retpoline gadget [Turner 2018]. The Speculative Store Bypass (SSB) mechanism, also called Store-To-Load Forwarding (STL), reads from memory even though pending stores have unresolved addresses [Horn 2018], and the Predictive Store Forwarding (PSF) mechanism forwards pending stores to loads with unresolved address [Guanciale et al. 2020]. These mechanisms speculatively load values that either should have been overwritten in the meantime, or should never arrive in memory at the loaded address, creating further potential for unwanted information-flow. Memory speculation sources can be disabled in hardware with mediocre performance penalty. The Meltdown attack introduces speculation through an out-of-order read from elevated-permission memory regions, racing against the MMU to detect the violation before the read memory can be leaked [Lipp

Table 3. Tools that check a program against speculative non-interference: Pitchfork [Cauligi, Disselkoen, Gleissenthall, et al. 2020], Spectector [Guarnieri, Köpf, Morales, et al. 2020], RelSE [Daniel et al. 2021], Blade [Vassena et al. 2021], Jasmin SCT [Barthe, Cauligi, et al. 2021], Typing V1 [Shivakumar et al. 2023].

|               | Pitchfork | Spectector | RelSE    | Blade | Jasmin SCT | Typing V1 |
|---------------|-----------|------------|----------|-------|------------|-----------|
| Source        | PHT, SSB  | PHT        | PHT, SSB | PHT   | PHT        | PHT       |
| Property      | TS        | NI         | NI       | TS    | TS         | TS        |
| Method        | EX        | SE         | SE       | SA    | SA         | SA        |
| Speculation   | SW        | SW         | SW / SB  | US    | US         | US        |
| Directives    | Y         | Y (Oracle) | N        | Y     | Y          | Y         |
| Memory Safety | U         | U          | U        | U     | SS         | S         |
| Bug / Proof   | Bug       | Bug        | Bug      | Proof | Proof      | Proof     |

et al. 2018]. While the speculation sources are varying, all attack variants leak the secrets through the side-channels covered by the constant-time leakage model [Guarnieri, Köpf, Reineke, et al. 2021]. In this paper, we consider only the PHT speculation source. We believe that our notion of simulation also holds for other speculation sources, but we are less sure about what compiler passes satisfy SNiP when they are considered.

*Properties.* The constant time programming guideline requires no influence of sensitive data towards the leakage observable by the attacker. Formally, *non-interference* [Goguen and Meseguer 1982] (Definition 1) expresses this property. Non-interference is a hyper-property [Clarkson and Schneider 2010]: It requires to reason about two executions of the program. Hyper-properties tend to be harder to verify than single trace properties due to synchronization issues with the traces compared. Non-interference for side-channel leakage evades these issues: The constant time leakage model exposes the program counter to the attacker. Thus, when two traces from attacker-indistinguishable initial states do not coincide in their control flow, the program can immediately be rejected as insecure. Such a high degree of synchronization allows for a sound approximation we call taint safety [A. C. Myers 1999; Sabelfeld and A. Myers 2003].

Tools. Existing tools check single implementations for non-interference. Our work on compiler transformations complements this line of work. It is now possible to check source programs for noninterference, and rely on the compiler that guarantees to preserve non-interference to the executed program for any source program. The tools deal with two main challenges: Non-determinism from speculation and the two executions required for non-interference. Table 3 presents a list of tools and classifies their approaches. The **Source** lists the considered speculation sources. **Property** is the formal property checked: NI is non-interference under speculative semantics. TS is taint safety. Method is either SA for static analysis such as flow- or type-systems, SE for symbolic execution, or EX for state space exploration. **Speculation** lists how the tool models and copes with speculation based non-determinism. SW means that the semantics have a bounded speculation window, i.e. a bound to the number of speculatively executed instructions. SB means that the semantics have a bounded store buffer that limits the number of speculatively executed store instructions. US means unbounded speculation, i.e. the semantics speculate arbitrarily long. Mem Safety describes the memory model and memory requirements on the source program. U stands for unstructured memory, S for structured memory and memory safety under speculation-free semantics, and SS is memory safety even under speculative semantics.

Jasmin SCT (Table 3) comes along with the Jasmin compiler (Table 1), that is proven to preserve Ni under speculation-free semantics. In [Barthe, Cauligi, et al. 2021], the authors suggest an additional

requirement to cope with speculative semantics: Source programs need to be memory safe even when executed under speculative semantics (SS). However, there is no proof that the compiler preserves non-interference under this requirement. SS is also a performance breaking requirement: They report that SS-implementations are ~20% slower than the insecure reference implementations; in contrast to ~1% overhead reported for recently protected implementations without SS.

Similarities and differences to a simultaneously developed proof technique. Another proof method for preserving speculative side-channel security through compilers has been developed independently of our work [Arranz Olmos et al. 2025]. There are subtle differences in the definition of semantics, security property, and simulation: First, their semantics comes without rollbacks. The idea is that a difference in leakage can always be obtained by an execution that contains no rollback. This result has been obtained previously [Barthe, Cauligi, et al. 2021]. Second, their security property, SCT, is the same as SNi from this work. However, the authors employ a slightly modified semantics which allows them to phrase the property differently. Finally, our simulation constraint is a constant-time cube (Figure 2). The constraint formulated by Arranz Olmos et al. [2025] instead requires the existence of two functions: One back-translates directives similar to how our simulation finds a sequence of source directives to replay the target directives. The other forward-translates leakage from source leakage to target leakage. The existence of both functions already guarantees that our constant-time cube is satisfied. The bigger difference is in the application of our proof methods. Our work spots a weakness in register allocation, and we develop a static analysis to protect against security-threatening spills. Arranz Olmos et al. [2025] target the Jasmin compiler with their work. Jasmin's source language already requires the programmer to tag variables as register or stack variable. Therefore, the Jasmin compiler performs no spilling, avoiding the weakness by demanding the programmer to appropriately choose register-allocated variables. Their work instead focuses on proving nine other passes secure by extending preservation proofs from leakage semantics to speculative semantics.

## 8 Conclusion

We have developed a method for proving that compiler transformations preserve non-interference from source to target programs *under speculative semantics*. When experimenting with our method, we found that it worked well on simple transformations like dead code elimination, but we had trouble applying it to register allocation. As it turned out, the fault was not on our side but register allocation is actually insecure. Our method led us to discover a new vulnerability introduced by register allocation. We have confirmed the existence of this vulnerability in the mainstream compiler LLVM on code from libsodium, a modern cryptographic library. Interestingly, our proof method also guided us towards a fix: We have presented a new static analysis to identify weaknesses introduced by register allocation, and an automated procedure to fix them. With these additions, we have been able to prove that register allocation preserves non-interference.

As future work, we would like to integrate our proof method with certified compilers, investigate transformations that we left out so far, and consider further sources of speculation in the semantics.

## References

John McCarthy and James Painter. 1967. "Correctness of a Compiler for Arithmetic Expressions." *Proc. Sympos. Appl. Math., Vol. XIX.* Amer. Math. Soc., Providence, RI, 33–41.

Robin Milner. 1971. "An Algebraic Definition of Simulation between Programs." *IJCAI.* William Kaufmann, 481–489. Gary A. Kildall. 1973. "A Unified Approach to Global Program Optimization." *POPL.* ACM, 194–206. https://doi.org/10.1145/512927.512945.

 $<sup>^{13}\</sup>mathrm{To}$  be appearing in the same conference.

- Gregory J. Chaitin, Marc A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein. 1981. "Register Allocation via Coloring." *Computer Languages*, 47–57. https://doi.org/10.1016/0096-0551(81)90048-5.
- Joseph A. Goguen and José Meseguer. 1982. "Security Policies and Security Models." S&P, 11–11. https://doi.org/10.1109/SP.1982.10014.
- Omri Traub, Glenn Holloway, and Michael D. Smith. 1998. "Quality and Speed in Linear-Scan Register Allocation." *PLDI*. ACM, 142–151. https://doi.org/10.1145/277650.277714.
- Andrew C. Myers. 1999. "JFlow: Practical Mostly-Static Information Flow Control." POPL. ACM, 228–241. https://doi.org/10.1145/292540.292561.
- Massimiliano Poletto and Vivek Sarkar. 1999. "Linear Scan Register Allocation." ACM Trans. Program. Lang. Syst., 895–913. https://doi.org/10.1145/330249.330250.
- A. Sabelfeld and A.C. Myers. 2003. "Language-Based Information-Flow Security." *IEEE Journal on Selected Areas in Communications*, 5–19. https://doi.org/10.1109/JSAC.2002.806121.
- David Brumley and Dan Boneh. 2005. "Remote Timing Attacks Are Practical." *Computer Networks*, 701–716. https://doi.org/10.1016/j.comnet.2005.01.010.
- David Molnar, Matt Piotrowski, David Schultz, and David Wagner. 2006. "The Program Counter Security Model: Automatic Detection and Removal of Control-Flow Side Channel Attacks." *ICISC*. Springer, 156–168. https://doi.org/10.1007/1173472 7 14.
- Xavier Leroy. 2009. "A Formally Verified Compiler Back-end." Journal of Automated Reasoning, 363–446. https://doi.org/10.1007/s10817-009-9155-4.
- Michael R. Clarkson and Fred B. Schneider. 2010. "Hyperproperties." *Journal of Computer Security*, 1157–1210. https://doi.org/10.3233/JCS-2009-0393.
- Nadhem J. Al Fardan and Kenneth G. Paterson. 2013. "Lucky Thirteen: Breaking the TLS and DTLS Record Protocols." S&P, 526–540. https://doi.org/10.1109/SP.2013.42.
- Gilles Barthe, Gustavo Betarte, Juan Campo, Carlos Luna, and David Pichardie. 2014. "System-Level Non-interference for Constant-time Cryptography." CCS. ACM, 1267–1279. https://doi.org/10.1145/2660267.2660283.
- Yuval Yarom and Katrina Falkner. 2014. "FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack." USENIX, 719–732.
- Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. 2015. "Last-Level Cache Side-Channel Attacks Are Practical." S&P, 605–622. https://doi.org/10.1109/SP.2015.43.
- David Costanzo, Zhong Shao, and Ronghui Gu. 2016. "End-to-End Verification of Information-Flow Security for C and Assembly Programs." *PLDI*. ACM, 648–664. https://doi.org/10.1145/2908080.2908100.
- Gilles Barthe, Benjamin Grégoire, and Vincent Laporte. 2018. "Secure Compilation of Side-Channel Countermeasures: The Case of Cryptographic "Constant-Time"." CSF, 328–343. https://doi.org/10.1109/CSF.2018.00031.
- Jann Horn. 2018. Speculative Execution, Variant 4: Speculative Store Bypass. Retrieved July 11, 2024 from https://bugs.chromi.um.org/p/project-zero/issues/detail?id=1528.
- Esmaeil Mohammadian Koruyeh, Khaled N. Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. 2018. "Spectre Returns! Speculation Attacks Using the Return Stack Buffer." *USENIX*.
- Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. 2018. "Meltdown: Reading Kernel Memory from User Space." *USENIX*, 973–990.
- Laurent Simon, David Chisnall, and Ross Anderson. 2018. "What You Get Is What You C: Controlling Side Effects in Mainstream C Compilers." *EuroS&P*, 1–15. https://doi.org/10.1109/EuroSP.2018.00009.
- Paul Turner. 2018. Retpoline: A Software Construct for Preventing Branch-Target-Injection. Retrieved Oct. 24, 2024 from https://support.google.com/faqs/answer/7625886.
- Gilles Barthe, Sandrine Blazy, Benjamin Grégoire, Rémi Hutin, Vincent Laporte, David Pichardie, and Alix Trieu. 2019. "Formal Verification of a Constant-Time Preserving C Compiler." *POPL*, 7:1–7:30. https://doi.org/10.1145/3371075.
- Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and Daniel Gruss. 2019. "A Systematic Evaluation of Transient Execution Attacks and Defenses." *USENIX*, 249–266.
- Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. 2019. "Spectre Attacks: Exploiting Speculative Execution." S&P, 1–19. https://doi.org/10.1109/SP.2019.00002.
- Sunjay Cauligi, Craig Disselkoen, Klaus v. Gleissenthall, Dean Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. 2020. "Constant-Time Foundations for the New Spectre Era." PLDI. ACM, 913–926. https://doi.org/10.1145/3385412.3385970.
- Roberto Guanciale, Musard Balliu, and Mads Dam. 2020. "InSpectre: Breaking and Fixing Microarchitectural Vulnerabilities by Formal Analysis." CCS. ACM, 1853–1869. https://doi.org/10.1145/3372297.3417246.

- Marco Guarnieri, Boris Köpf, José F. Morales, Jan Reineke, and Andrés Sánchez. 2020. "Spectector: Principled Detection of Speculative Information Flows." S&P, 1–19. https://doi.org/10.1109/SP40000.2020.00011.
- Manuel Barbosa, Gilles Barthe, Karthik Bhargavan, Bruno Blanchet, Cas Cremers, Kevin Liao, and Bryan Parno. 2021. "SoK: Computer-Aided Cryptography." *S&P*, 777–795. https://doi.org/10.1109/SP40001.2021.00008.
- Gilles Barthe, Sandrine Blazy, Rémi Hutin, and David Pichardie. 2021. "Secure Compilation of Constant-Resource Programs." CSF, 1–12. https://doi.org/10.1109/CSF51468.2021.00020.
- Gilles Barthe, Sunjay Cauligi, Benjamin Grégoire, Adrien Koutsos, Kevin Liao, Tiago Oliveira, Swarn Priya, Tamara Rezk, and Peter Schwabe. 2021. "High-Assurance Cryptography in the Spectre Era." S&P, 1884–1901. https://doi.org/10.1109/SP40001.2021.00046.
- Gilles Barthe, Benjamin Grégoire, Vincent Laporte, and Swarn Priya. 2021. "Structured Leakage and Applications to Cryptographic Constant-Time and Cost." CCS. ACM, 462–476. https://doi.org/10.1145/3460120.3484761.
- Lesly-Ann Daniel, Sébastien Bardin, and Tamara Rezk. 2021. "Hunting the Haunter Efficient Relational Symbolic Execution for Spectre with Haunted RelSE." NDSS. Internet Society. https://doi.org/10.14722/ndss.2021.24286.
- Marco Guarnieri, Boris Köpf, Jan Reineke, and Pepe Vila. 2021. "Hardware-Software Contracts for Secure Speculation." S&P, 1868–1883. https://doi.org/10.1109/SP40001.2021.00036.
- Marco Patrignani and Marco Guarnieri. 2021. "Exorcising Spectres with Secure Compilers." CCS. ACM, 445–461. https://doi.org/10.1145/3460120.3484534.
- Marco Vassena, Craig Disselkoen, Klaus von Gleissenthall, Sunjay Cauligi, Rami Gökhan Kıcı, Ranjit Jhala, Dean Tullsen, and Deian Stefan. 2021. "Automatically Eliminating Speculative Leaks from Cryptographic Code with Blade." POPL, 49:1–49:30. https://doi.org/10.1145/3434330.
- Son Tuan Vu, Albert Cohen, Arnaud De Grandmaison, Christophe Guillon, and Karine Heydemann. 2021. "Reconciling Optimization with Secure Compilation." *OOPSLA*, 142:1–142:30. https://doi.org/10.1145/3485519.
- Sunjay Cauligi, Craig Disselkoen, Daniel Moghimi, Gilles Barthe, and Deian Stefan. 2022. "SoK: Practical Foundations for Software Spectre Defenses." S&P, 666–680. https://doi.org/10.1109/SP46214.2022.9833707.
- Florent Bouchez Tichadou and Fabrice Rastello. 2022. "Register Allocation." SSA-based Compiler Design. Springer International Publishing, 303–328. https://doi.org/10.1007/978-3-030-80515-9\_22.
- Minki Cho, Youngju Song, Dongjae Lee, Lennard Gäher, and Derek Dreyer. 2023. "Stuttering for Free." OOPSLA, 281:1677–281:1704. https://doi.org/10.1145/3622857.
- Basavesh Ammanaghatta Shivakumar, Gilles Barthe, Benjamin Gregoire, Vincent Laporte, Tiago Oliveira, Swarn Priya, Peter Schwabe, and Lucas Tabary-Maujean. 2023. "Typing High-Speed Cryptography against Spectre V1." S&P, 1094–1111. https://doi.org/10.1109/SP46215.2023.10179418.
- Zhiyuan Zhang, Gilles Barthe, Chitchanok Chuengsatiansup, Peter Schwabe, and Yuval Yarom. 2023. "Ultimate SLH: Taking Speculative Load Hardening to the Next Level." *USENIX*, 7125–7142.
- Santiago Arranz Olmos, Gilles Barthe, Lionel Blatter, Benjamin Grégoire, and Vincent Laporte. 2025. "Preservation of Speculative Constant-time by Compilation." *POPL*, To appear. https://doi.org/10.1145/3704880.
- $\label{lem:chandler Carruth. 2024. Speculative Load Hardening-LLVM. Retrieved July 12, 2024 from \ https://llvm.org/docs/Speculative LoadHardening.html.$

## A Missing proofs from Section 2

Proof of Lemma 1. Let  $S_1 = S_1'.(\text{pc}, \rho_1, \mu_1)$ ,  $S_2 = S_2'.(\text{pc}, \rho_2, \mu_2)$ , We do the proof by induction on the first transition  $S_1 \xrightarrow{\delta:\lambda} T_1$  and  $S_2 \xrightarrow{\delta:\lambda} T_1$ . We do case distinction.

- ▶  $\delta = \text{RB} = \lambda$ , then  $|S_1| = |S_2| > 1$ . Further,  $T_1 = S_1'$  and  $T_2 = S_2'$ . Thus,  $T_1 \equiv T_2$  follows immediately from  $S_1 \equiv S_2$
- ▶  $\delta = \text{sp and } \lambda = \text{Br b}$  Then,  $S_1 \xrightarrow{\text{SP:BR b}} S_1.(\text{sc}_{\neg b}, \rho_1, \mu_1)$  and  $S_2 \xrightarrow{\text{SP:BR b}} S_2.(\text{sc}_{\neg b}, \rho_2, \mu_2)$ . Clearly,  $T_2 = S_2.(\text{sc}_{\neg b}, \rho_2, \mu_2) \equiv S_1.(\text{sc}_{\neg b}, \rho_1, \mu_1)$ . The case of  $\delta = \text{Br and } \lambda = \text{Br b is similar}$ .
- ▶ All other cases Because P pc has only a single successor sc, and only modifies the executing state, it is clear from  $S'_1 \equiv S'_2$  and both executing states being in sc, that  $T_1 \equiv T_2$ .

# **B** Missing Proofs from Section 4

## Lemma 3

PROOF. The  $\supseteq$  direction for terminating behavior is immediate:  $\cdots_t \subseteq \cdots \nearrow_t$ . For diverging behavior, consider any (d:l) with  $T \xrightarrow{d:l}_t \infty$ . By definition, there is  $T \xrightarrow{d:l}_t V \xrightarrow{e:k}_t \infty$ . Notably,  $\xrightarrow{d:l}_t$  cannot do stuttering transitions, thus there are transitions  $T \xrightarrow{\delta_0:\lambda_0} \cdots \xrightarrow{\delta_n:\lambda_n} V$  and  $\lambda_0 \cdots \lambda_n = l, \delta_0 \cdots \delta_n = d$ . Further, if there is  $T \xrightarrow{d:l}_t V$  then also  $(S,T) \xrightarrow{e:k}_{d:l}_t (U,V)$ . Thus,  $S \xrightarrow{e:k}_s U$  and  $V \prec U$ . By coinduction,  $V \xrightarrow{e:k}_s \infty$ , which completes this direction.

For  $\subseteq$ , be given the proof that justifies  $T \prec S$ . Show that when  $T \xrightarrow{d:l} \infty (T \xrightarrow{d:l} X, X \text{ final})$ , then (either T = X are final, or) there is  $T \xrightarrow{f:m}_{t} V \prec U$ , with  $V \xrightarrow{e:k}_{t} \infty (V \xrightarrow{e:k}_{t} X)$  so that l = m.k and d = f.e (l = m.k and d = f.e).

Then, we utilize this to first prove the slightly different statement: For every  $T \prec S$  with  $T \xrightarrow{d:l} \infty$   $(T \xrightarrow{d:l} X, X \text{ final})$ , (either T = X is final, or) there is  $T \xrightarrow{f:m}_{\mathsf{t}} V \prec U$ , with  $V \xrightarrow{e:k} \infty (V \xrightarrow{e:k} X)$  so that l = m.k and d = f.e (l = m.k and d = f.e). Towards contradiction, assume it is not the case. Then, the previous fact yields us an infinite chain of nodes in the proof tree, a contradiction to the well-foundedness of the proof: First, due to  $T \xrightarrow{e:e}_{\mathsf{t}} T, ( \xrightarrow{\mathsf{t}}_{\mathsf{t}} \mathsf{d}_{T \prec S} \vdash T \prec_t S : \varepsilon)$  is part of the proof. And when by induction the transitions  $T \xrightarrow{f:m}_{\delta n} V \xrightarrow{e:k}_{\mathsf{t}} V \xrightarrow{e:k}_{\mathsf{t}} \infty, f = \delta_0, \dots, \delta_n$  imply that the proof contains  $( \xrightarrow{\mathsf{t}}_{\mathsf{t}} \mathsf{d}_{T \prec S} \vdash T \prec_t S : \varepsilon) \Leftarrow \dots \Leftarrow ( \xrightarrow{\mathsf{t}}_{\mathsf{t}} \mathsf{d}_{T \prec S} \vdash V' \prec_t S : f)$ , then it cannot be proven by rule DIRECT-TF as that would create  $C \xrightarrow{\mathsf{t}}_{\mathsf{t}} \mathsf{d}_{\mathsf{t}} \mathsf{d}_{\mathsf{t}}$ 

Finally, we derive the result by coinduction, which yields  $V \xrightarrow{e:k}_{t} \infty$ . With  $T \xrightarrow{f:m}_{t} V$ , we compose  $T \xrightarrow{f.e:m.k}_{t} \infty$ .

## C Missing proofs from section 5 and Transfer for Liveness Analysis

We define the transfer functions for Liveness analysis on instructions. Write  $F_i$  for  $F_{pc}$  with P pc = i.

$$F_{a := \mathbf{a}[x], sc} \ l = \begin{cases} l & a \notin l \\ (l \setminus \{a\}) \cup \{(\mathbf{a}, \mathbf{n})\} & x = \mathbf{n} \\ (l \setminus \{a\}) \cup Mem) & x = b \end{cases}$$

$$F_{a[a] := x, sc} \ l = \begin{cases} l & x = \mathbf{n} \in |\mathbf{a}|, (\mathbf{a}, \mathbf{n}) \notin l \\ (l \setminus \{(\mathbf{a}, \mathbf{n})\}) & x = \mathbf{n} \in |\mathbf{a}|, (\mathbf{a}, \mathbf{n}) \in lm \end{cases}$$

$$F_{a[a] := x, sc} \ l = \begin{cases} (l \setminus \{a\}) \cup \{b, c\} & a \in l \\ l & a \notin lr \end{cases}$$

$$F_{bra, sct, scf} \ l = l \cup \{a\}$$

$$F_{ret} \ l = X_0 = Mem$$

## Remaining cases for Theorem 7

- ▶  $\langle a := b \oplus c_{,sc}, \mathsf{nop}_{,sc} \rangle$  For the first part, we have  $t_1 \xrightarrow{\bullet : \bullet} v_1$  and need to show  $s_1 \xrightarrow{\bullet : \bullet} u_1$  and  $v_1 \prec u_1$ . The former is immediate from semantics. For the latter, we know that  $a \notin X$  pc because the instruction was replaced by nop. The transfer of X pc along i is then  $F_{pc}(X \text{ pc}) = X$  pc because an assignment to a non-live register does not make any registers live. Together with (bwd), we get  $F_{pc}(X \text{ pc}) = X \text{ pc} \supseteq F_{sc}(X \text{ sc})$ . Because registers and memory of  $t_1$  and  $v_1$  are equal and  $s_1$  and  $u_1$  only differ on a,  $t_1 \prec s_1$  implies  $v_1 \prec u_1$  because  $a \notin X$  pc  $\supseteq F_{sc}(X \text{ sc})$ . For the second part, further assume  $s_2 \xrightarrow{\bullet : \bullet} u_2$ . We need to show that  $t_2 \xrightarrow{\bullet : \bullet} v_2$  and  $v_2 \prec u_2$ . The former is again immediate from semantics. For the latter, we have the same arguments as for the first part: Memory and register contents of  $t_2$  and  $v_2$  are equal and  $s_2$  and  $u_2$  only differ in a.
- ▶  $\mathbf{a}[b] \coloneqq c_{\mathsf{ssc}}, \mathbf{a}[b] \coloneqq c_{\mathsf{ssc}}$  There are two subcases either safe or unsafe store. For safe store we get  $t_1 \xrightarrow{\bullet : \mathtt{ST} \, \mathsf{n}} v_1$  and since  $b \in F_{\mathsf{pc}}(X \, \mathsf{pc}), \, \rho_{t_1} \, b = \rho_{s_1} \, b$ , so  $s_1 \xrightarrow{\bullet : \mathtt{ST} \, \mathsf{n}} u_1$ . Also,  $F_{\mathsf{pc}}(X \, \mathsf{pc}) \cup \{(\mathbf{a}, \mathsf{n})\} \ge X \, \mathsf{pc} \ge F_{\mathsf{sc}}(X \, \mathsf{sc})$  and  $\mu_{t_1} = \mu_{v_1}$  and  $\mu_{s_1} = \mu_{u_1}$  on all slots except  $(\mathbf{a}, \mathsf{n})$ . But due to  $c \in F_{\mathsf{pc}}(X \, \mathsf{pc})$ , we also have  $\mu_{t_1} \, \mathbf{a} \, \mathsf{n} = \mu_{v_1} \, \mathbf{a} \, \mathsf{n}$ , so  $v_1 \prec u_1$ . Further  $t_2 \xrightarrow{\bullet : \mathtt{ST} \, \mathsf{n}} v_2$  and  $\bullet \triangleleft_{t_2 \prec s_2} \, \bullet$  are a given.

For unsafe store we have an analogue proof, except both directives are su **b** m.

- ▶  $\mathbf{a}[b] := c_{\mathsf{ssc}}, \mathsf{nop}_{\mathsf{ssc}}$  Analogue to the previous case, except  $\kappa = \bullet$  and the argument for  $\mu_{t_1} \mathbf{a} \mathbf{n} = \mu_{v_1} \mathbf{a} \mathbf{n}$  does not hold anymore. However, since  $(\mathbf{a}, \mathbf{n}) \notin \mathcal{X} \mathsf{pc} \geq F_{\mathsf{sc}}(\mathcal{X} \mathsf{sc})$ , we don't need it for  $v_1 \prec u_1$ .
- ▶  $a := \mathbf{a}[b]_{sc}$ ,  $a := \mathbf{a}[b]_{sc}$  and  $a := \mathbf{a}[b]_{sc}$ , nop,  $a := \mathbf{a}[b]_{sc}$ . The argumentation is analogue to the previous two cases.
- ▶ nop<sub>>sc</sub>, nop<sub>>sc</sub> We get  $t_1 \xrightarrow{\bullet:\bullet} v_1$  and  $s_1 \xrightarrow{\bullet:\bullet} u_1$ . Also,  $F_{pc}(X pc) = X pc \ge F_{sc}(X sc)$  and  $\rho_{t_1} = \rho_{v_1}$  and  $\rho_{s_1} = \rho_{u_1}$ , so  $v_1 \prec u_1$ . Further  $t_2 \xrightarrow{\bullet:\bullet} v_2$  and  $\bullet \triangleleft_{t_2 \prec s_2} \bullet$  are a given.
- ▶  $a := b \oplus c_{ssc}$ ,  $a := b \oplus c_{ssc}$  We get  $t_1 \xrightarrow{\bullet : \bullet} v_1$  and  $s_1 \xrightarrow{\bullet : \bullet} u_1$ . Also,  $F_{pc}(X pc) \cup \{a\} \ge X pc \ge F_{sc}(X sc)$  and  $\rho_{t_1} = \rho_{v_1}$  and  $\rho_{s_1} = \rho_{u_1}$  on all registers except a. But due to  $b, c \in F_{pc}(X pc)$ , we also have  $\rho_{t_1} a = \rho_{v_1} a$ , so  $v_1 \prec u_1$ . Further  $t_2 \xrightarrow{\bullet : \bullet} v_2$  and  $\bullet \prec_{t_2 \prec s_2} \bullet$  are a given.
- ▶ ret No transition can be made by  $t_1$ , the case does not exist.
- ▶ br  $a_{\mathsf{ssc_t},\mathsf{sc_f}}$ , br  $a_{\mathsf{ssc_t},\mathsf{sc_f}}$  We get  $T_1.t_1 \xrightarrow{\mathsf{sp} \neg \mathsf{b:BR} \neg \mathsf{b}} T_1.v_{1,\mathsf{b}}.v_{1,\neg \mathsf{b}} = V_1$  where  $\mathsf{b} = (\rho_{t_1} \ a = 0)$  and by  $a \in F_{\mathsf{pc}} \ (X \ \mathsf{pc}), \ \rho_{t_1} \ a = \rho_{s_1} \ a \ \mathsf{so} \ S_1.s_1 \xrightarrow{\mathsf{sp} \neg \mathsf{b:BR} \rightarrow \mathsf{b}} S_1.u_{1,\mathsf{b}}.u_{1,\neg \mathsf{b}} = U_1$ . Further,  $F_{\mathsf{pc}} \ (X \ \mathsf{pc}) \ge X \ \mathsf{pc} \ge F_{\mathsf{sc_b}} \ (X \ \mathsf{sc_b})$ . Thus,  $v_{1,\mathsf{b}} \prec u_{1,\mathsf{b}}$  and  $v_{1,\neg \mathsf{b}} \prec u_{1,\neg \mathsf{b}}$ . Further, if  $S_2 \xrightarrow{\mathsf{sp} \neg \mathsf{b:BR} \neg \mathsf{b}} U_2$ , by same arguments, we have  $T_2 \xrightarrow{\mathsf{sp} \neg \mathsf{b:BR} \neg \mathsf{b}} V_2$  and  $\mathsf{sp} \neg \mathsf{b} \blacktriangleleft_{t_2 \prec s_2} \ \mathsf{sp} \neg \mathsf{b}$ .

## D Additional Material and Missing proofs for Section 6

RULES D.1: SHUFFLE SEMANTICS

MOVE

$$P \text{ pc} = a := b_{>sc}$$

$$(pc, \rho, \mu) \xrightarrow{\bullet:\bullet} (sc, \rho[a \mapsto \rho b], \mu)$$

$$FILL$$

$$(pc, \rho, \mu) \xrightarrow{\bullet:\bullet} (sc, \rho[a \mapsto \mu b], \mu)$$

$$P \text{ pc} = a := stk[l]_{>sc}$$

$$(pc, \rho, \mu) \xrightarrow{\bullet:\text{LD}} (sc, \rho[a \mapsto \mu \text{ stk} l], \mu)$$

$$P \text{ pc} = stk[l] := a_{>sc}$$

$$(pc, \rho, \mu) \xrightarrow{\bullet:\text{ST}} (sc, \rho, \mu[(\text{stk}, l) \mapsto \rho a])$$

# Missing rules for $P := [P]^{ra}$

POISON-NOP
$$\frac{P s = \mathsf{nop}_{\mathsf{SC}} \quad s \xrightarrow{\bullet : \bullet} u \quad t \xrightarrow{\bullet : \bullet} v}{(s, t, \mathsf{ps}) \xrightarrow{\bullet : \bullet : : \bullet : \bullet} (u, v, \mathsf{ps})} \qquad \frac{P s = \mathsf{br} \, b_{\mathsf{SCt}, \mathsf{SCf}} \quad \mathsf{ps} \, b = \mathsf{h} \quad s \xrightarrow{\mathsf{BR} : \mathsf{BR} \, \mathsf{b}} u \quad t \xrightarrow{\mathsf{BR} : \mathsf{BR} \, \mathsf{b}} v}{(s, t, \mathsf{ps}) \xrightarrow{\mathsf{BR} : \mathsf{BR} \, \mathsf{b} : : \mathsf{BR} : \mathsf{BR} \, \mathsf{b}} (u, v, \mathsf{ps})}$$

$$\frac{P s = \mathsf{a} := b \oplus c_{\mathsf{SC}} \quad s \xrightarrow{\bullet : \bullet} u \quad t \xrightarrow{\bullet : \bullet} v \quad \mathsf{pv} = (\mathsf{ps} \, b = \mathsf{ps} \, c = \mathsf{h}) ? \, \mathsf{h} : \mathsf{p}}{(s, t, \mathsf{ps}) \xrightarrow{\bullet : \bullet : : \bullet : \bullet} (u, v, \mathsf{ps} [a \mapsto \mathsf{pv}])}$$

Rule Poison-Asgn propagates poison values for assignments. Notice that weakly poisoned values become poisoned because the operator's result value is most likely a non-zero value. Thus, only if both arguments are healthy, we propagate h to the assigned register and p otherwise. Healthybranch forbids weakly poisoned branching conditions even though they are safe to leak. This is to avoid that P and  $[P]^{\rm ra}$  arrive at different program points (up to  $\Phi$ ). Branching with weakly poisoned registers could be supported by letting one state speculate while the other does regular branching to keep them at the same program point. However, this would violate the condition |S| = |T| and leads to a less intuitive product definition.

#### Lemma 4

PROOF. We do a case distinction on the transition rule for (S, T, P)  $Y:K::\delta:\lambda$  (U, V, Q). To that end, let P = P'.ps and Q = Q'.ps', U = U'.u and V = V'.v, and S = S'.s and T = T'.t. Let further be i = Ps and  $i' = [P]^{ra}t$ . For all cases except POISON-ROLLBACK and HEALTHY-SPEC it suffices to show that  $u \Psi^{ps'}v$ . All those rules only modify the top states and poison values, so Q' = Q'. The definition of  $\Psi^Q$  is then satisfied from  $S' \Psi^{Q'} T'$ .

We first do the separate two cases who change the size of the speculating states.

- ▶ HEALTHY-SPEC Then, i = br b, ps b = h, and Q = P'.ps.ps. Further, u and s as well as t and v are fully equal except for the program counter. Because  $s \Psi^{ps} t$  and  $\Psi^{ps}$  is indifferent to the program counter,  $u \Psi^{ps} v$ .
- ▶ POISON-ROLLBACK Then,  $U = S' \Psi^{P'} T' = V$ .

#### Now to the other cases.

▶ POISON-FILL, POISON-SPILL, POISON-MOVE, POISON-SHUFFLE-SFENCE, POISON-SFENCE, POISON-SLH, POISON-NOP, and HEALTHY-BRANCH ps' = ps. Further, u and s as well as t and v are fully equal except for the program counter. Because s  $\Psi^{ps}$  t and  $\Psi^{ps}$  is indifferent to the program counter, u  $\Psi^{ps}$  v.

- ▶ POISON-LOAD-STKUNSAFE, POISON-LOAD-SAFE, and POISON-LOAD-UNSAFE Then,  $i = a := \mathbf{a}[b]$  and ps' = ps[ $a \mapsto ps$ ]. Further, u and s as well as t and v are equal except for the program counter and a as well as  $a' = \Psi v a$ . But a is poisoned in ps', so  $\Psi^{ps'}$  is indifferent to its value,  $u \Psi^{ps'} v$ .
- ► HEALTHY-LOAD-SAFE Then, i = a := a[b] and  $n = \rho_s b$  and  $ps' = ps[a \mapsto ps a n] \le ps[a \mapsto ps]$ . Further, u and s as well as t and v are equal except for the program counter and a as well as  $a' = \Psi v a$ . The value of a is  $\mu_s a n$  and of a' is  $\mu_t a n$ . But a has the poison value of ps for (a, n) in ps', so  $u \Psi^{ps'} v$  follows from  $s \Psi^{ps} t$ .
- ► HEALTHY-LOAD-UNSAFE Analogue to the previous case, but (a, n) swapped to (b, m).
- ▶ POISON-STORE-STKUNSAFE Then,  $i = \mathbf{a}[b] := c$  and  $\gamma = \sup \mathbf{a}[b]$  and  $\delta = \sup \mathbf{s}[b]$  m and  $\Psi v d = (\mathbf{s}\mathbf{t}\mathbf{k}, \mathbf{m})$ . We get  $\mathbf{p}s' = \mathbf{p}s[d, (\mathbf{s}\mathbf{t}\mathbf{k}, \mathbf{m}) \mapsto \mathbf{p}]$ . Further, values of u and s are equal except for d and values of t and v are equal except  $\Psi v d = (\mathbf{s}\mathbf{t}\mathbf{k}, \mathbf{m})$ . Again, those are poisoned in  $\mathbf{p}s'$ , so  $u \Psi^{\mathbf{p}s'} v$  follows from  $s \Psi^{\mathbf{p}s} t$ .
- ▶ POISON-STORE-SAFE Then,  $i = \mathbf{a}[b] := c$  and  $\rho_s b = k$  and  $\rho_t (\Psi t b = 0)$  due to weak poisonedness. We get  $\mathsf{ps'} = \mathsf{ps}[(\mathbf{a}, k), (\mathbf{a}, 0) \mapsto \mathsf{p}]$ . Further, values of u and s are equal except for  $(\mathbf{a}, k)$  and values of t and v are equal except  $(\mathbf{a}, 0)$ . Again, those are poisoned in  $\mathsf{ps'}$ , so u  $\Psi^{\mathsf{ps'}}$  v follows from s  $\Psi^{\mathsf{ps}}$  t.
- ▶ POISON-STORE-UNSAFE Analogue to the previous case, but k swapped to l.
- ► HEALTHY-STORE-SAFE Then,  $i = \mathbf{a}[b] := c$  and  $\rho_s b = \rho_t (\Psi t b) = \mathbf{n} \in |\mathbf{a}|$  due to healthiness. Let further  $\Psi t c = c'$ . We get  $ps' = ps[(\mathbf{a}, \mathbf{n}) \mapsto ps c]$ . Further, values of u and s are equal except for  $(\mathbf{a}, \mathbf{k})$  and values of t and t are equal except  $(\mathbf{a}, \mathbf{0})$ . The value of  $(\mathbf{a}, \mathbf{n})$  in t is t and t is t and t is t and t in t is t and t are equal except  $(\mathbf{a}, \mathbf{0})$ . The value of  $(\mathbf{a}, \mathbf{n})$  in t is t and t is t and t is t and t are equal except  $(\mathbf{a}, \mathbf{0})$ . The value of  $(\mathbf{a}, \mathbf{n})$  has the poison value of t and t are equal except  $(\mathbf{a}, \mathbf{0})$ . The value of  $(\mathbf{a}, \mathbf{n})$  has the poison value of t and t are t and t are equal except t and t are equal exce
- ► HEALTHY-STORE-UNSAFE Analogue, but (**a**, n) swapped for (**b**, m).
- ▶ POISON-SHUFFLE-SLH Then,  $i' = \sinh a'$ , where  $\Psi s a = \Psi t a = a'$ . We get  $ps' = ps[a \mapsto wp]$ . t is equal to v and if S is speculating, then ps' is set to wp and  $\rho_v a' = 0$ . Otherwise, t is equal to v in values. In both cases,  $u \Psi^{ps'} v$  follows from  $s \Psi^{ps} t$ .

## Remaining transfer functions for $P := [P]^{ra}$

$$\begin{split} F_{\mathsf{nop},\mathsf{sc}} & \mathsf{ps} = G_{a' := b',\mathsf{sc}} \, \mathsf{ps} = G_{a' := \mathsf{stk}[1],\mathsf{sc}} \, \mathsf{ps} = G_{\mathsf{stk}[1] := b',\mathsf{sc}} \, \mathsf{ps} = \mathsf{ps} \\ F_{\mathsf{sfence},\mathsf{sc}} & \mathsf{ps} = \mathsf{h} \\ F_{\mathsf{br} \, b,\mathsf{sc}_\mathsf{t},\mathsf{sc}_\mathsf{f}} & \mathsf{ps} = \begin{cases} \mathsf{ps} & \mathsf{pr} \, b = \mathsf{h} \\ \mathsf{p} & \mathsf{pr} \, b \geq \mathsf{wp} \end{cases} \end{split}$$

#### Lemma 6

We first do the separate two cases who change the size of the speculating states.

- ▶ HEALTHY-SPEC Then, i = br b, ps b = h, and Q = P.ps. We have ps =  $F_i$  ps  $\leq X(s, t)$ . And because U = S.u and V = T.v, P  $\leq$  P, we have P.ps  $\leq$  P. $X(s, t) = X_{U,V}$ .
- ▶ POISON-ROLLBACK Then, U = S' and V = T', and since  $X_{S,T} = X_{S',T'}$ .ps,  $Q' = X_{U,V}$ .

Now to the other cases.

- $\blacktriangleright$  Poison-fill, Poison-spill, Poison-move, and Poison-shuffle-spence ps' = ps =  $G_{i'}$  ps.
- ▶ POISON-SFENCE, POISON-SLH, POISON-NOP, and HEALTHY-BRANCH  $ps' = ps = F_i ps$ .
- ▶ POISON-LOAD-STKUNSAFE, POISON-LOAD-SAFE, and POISON-LOAD-UNSAFE Then,  $i = a := \mathbf{a}[b]$  and  $ps' = ps[a \mapsto ps] = F_i$  ps.
- ► HEALTHY-LOAD-SAFE Then,  $i = a := \mathbf{a}[b]$  and  $n = \rho_s b$  and  $ps' = ps[a \mapsto ps \mathbf{a} n] \le ps[a \mapsto ps] = F_i ps$ .
- ► HEALTHY-LOAD-UNSAFE Then,  $i = a := \mathbf{a}[b]$  and  $\gamma = \text{Lu}\,\mathbf{b}\,\mathbf{m}$  and  $ps' = ps[a \mapsto ps\,\mathbf{b}\,\mathbf{m}] \le ps[a \mapsto ps] = F_i$  ps.
- ▶ POISON-STORE-STKUNSAFE Then,  $i = \mathbf{a}[b] \coloneqq c$  and  $\gamma = \sup \mathbf{a}[a]$  and  $\delta = \sup \mathbf{s}[b]$  m and  $\Psi t d = (\mathbf{s}t\mathbf{k}, \mathbf{m})$ . We get  $ps' = ps[d, (\mathbf{s}t\mathbf{k}, \mathbf{m}) \mapsto p] \le ps[Mem \mapsto psc][Reg, \mathbf{a} \mapsto p] = F_i$  ps.
- ▶ POISON-STORE-SAFE Then,  $i = \mathbf{a}[b] \coloneqq c$  and  $\rho_s b = k$  and  $\rho_t (\Psi t b = 0$  due to weak poisonedness. We get  $\mathsf{ps}' = \mathsf{ps}[(\mathbf{a}, \mathsf{k}), (\mathbf{a}, 0) \mapsto \mathsf{p}] \le \mathsf{ps}[Mem \mapsto \mathsf{ps} \, c][Reg, \mathbf{a} \mapsto \mathsf{p}] = F_i \, \mathsf{ps}.$
- ▶ POISON-STORE-UNSAFE Then,  $i = \mathbf{a}[b] := c$  and  $\gamma = \sup \mathbf{a}[b]$  and  $\rho_t (\Psi t b) = 0$  due to weak poisonedness. We get  $\mathsf{ps'} = \mathsf{ps}[(\mathbf{a}, \mathsf{l}), (\mathbf{a}, 0) \mapsto \mathsf{p}] \leq \mathsf{ps}[Mem \mapsto \mathsf{ps} c][Reg, \mathbf{a} \mapsto \mathsf{p}] = F_i \mathsf{ps}.$
- ▶ HEALTHY-STORE-SAFE Then,  $i = \mathbf{a}[b] := c$  and  $\rho_s b = \rho_t (\Psi t b = n \in |\mathbf{a}|)$  due to healthiness. We get  $ps' = ps[(\mathbf{a}, n) \mapsto ps c] \leq ps[Mem \mapsto ps c][Reg, \mathbf{a} \mapsto p] = F_i ps$ .
- ▶ HEALTHY-STORE-UNSAFE Then,  $i = \mathbf{a}[b] := c$  and  $\delta = \gamma = \sup \mathbf{b} \text{ m}$ . We get  $\operatorname{ps}' = \operatorname{ps}[(\mathbf{b}, \mathbf{m}) \mapsto \operatorname{ps} c] \leq \operatorname{ps}[Mem \mapsto \operatorname{ps} c][Reg, \mathbf{a} \mapsto \operatorname{p}] = F_i \operatorname{ps}$ .
- ▶ POISON-SHUFFLE-SLH Then,  $i' = \sinh a'$ , where  $\Psi s a = \Psi t a = a'$ . We get  $ps' = ps[a \mapsto wp] = G_{i'}$  ps.

## Full proof for Lemma 7

Like for dead code elimination in Section 5, we need to prove that  $(\prec, \lhd)$  (i) is a simulation (Definition 4), (ii) respects sec (Definition 2), and (iii) is snippy (Definition 6).

The intermediary lemmas are proved subsequently in their own sections.

PROOF THAT  $(\prec, \lhd)$  is sec-respecting. We required  $\sec \mathbf{stk} = \mathsf{L}$  so when  $t_1 =_{\sec} t_2$ , and  $s_1 \Psi^\mathsf{h} t_1$  as well as  $s_2 \Psi^\mathsf{h} t_2$ , then  $s_1$  and  $t_1$  equal on all values in memory other than  $\mathbf{stk}$  (remember that ps is not defined on  $\mathbf{stk}$ ). Similarly,  $s_2$  and  $t_2$  equal on memory. So,  $s_1 =_{\sec} s_2$  if and only if  $t_1 =_{\sec} t_2$ .  $\square$ 

PROOF THAT  $(\prec, \lhd)$  IS A SIMULATION. In order to prove that  $\prec$  is a simulation (Definition 4), we first need to show that for all initial t of  $[P]^{\mathsf{ra}}$  there is an initial s for P with  $t \prec s$ . Secondly, for all pairs of states  $T \prec S$ , we need to derive  $(\boxed{\prec}, \lnot_{T \prec S} \vdash T \prec_t S : \varepsilon)$  in Rules 4.1. For the initial states, let  $t = (\mathsf{init}', \rho', \mu')$ . We construct  $s = (\mathsf{init}, \rho, \mu)$ . First, we know from instruction matching, that  $\Phi$  init  $= \mathsf{init}'$ , thus  $s \Phi$  t. We can then choose  $\mu = \mu'$  and  $\rho$   $a = [\Psi \mathsf{init}' a]_t$ .

For the second part, consider  $T \prec S$ . We need to provide a proof for  $\langle \boxed{\prec}, \lnot_{T \prec S} \vdash T \prec_t S : \varepsilon \rangle$ . The case where T is final is trivial, so consider non-final T and S. We need two small helping-lemmas that state that  $\Phi$ -injected instructions and shuffle instructions of  $[P]^{\text{ra}}$  are preserved to the product. The proof of these we skip, but they rely on the fact that  $(\Phi, \Psi)$  is poison-typable with X.

LEMMA 8. Let  $T \prec S$ ,  $T \xrightarrow{\delta:\lambda} X$ , and P.ps =  $X_{S,T}$ . Then,  $(S, T, ps) \xrightarrow{Y:K::\delta:\lambda} (U, X, ps')$  with  $U \Psi_X^{ps'} X$ .

Lemma 9. Let  $U \Psi_X^{ps} X$  and  $X \stackrel{d:l}{\longleftrightarrow} V$  be shuffle only in  $[P]^{ra}$ . Then,  $P :: [P]^{ra}$  has a unique execution  $(U, X, ps) \stackrel{\varepsilon: \varepsilon :: d:l}{\longleftrightarrow} (U, V, ps')$  with  $U \Psi_X^{ps'} V$ .

 where V is again Φ-injected but no intermediary state is. The explored executions have the shape of the teal and red paths of  $[P]^{\text{ra}}$  in Figure 4. We thus need to prove  $\{ \begin{bmatrix} \overline{-} \\ \overline{-} \end{bmatrix} \land T_{\prec S} \vdash V \prec_t S : \delta.d \}$ . The next step is to perform directive transformation with  $\land_{T \prec S}$ . But we defined the transformation on Φ-intervals. So we find the appropriate transitions for  $T \xrightarrow{\delta.\lambda} X \xrightarrow{d:l} V$  in  $P :: [P]^{\text{ra}}$ . For the first transition, Lemma 8 yields  $(S, T, ps) \xrightarrow{Y:K::\delta:\lambda} (U, X, ps')$  so that  $U \Psi_X^{ps'} X$ . For the remaining transitions, there are two cases:

- ► Otherwise, let RB occur in d. Then, the explored execution from X is X cdot d': l' cdot \* Y . RB:RB V with d = d'.RB. Indeed, d must end with RB, because V is already Φ-injected: When X = X'x and U = U'.u, then V = X'. Further, X' and U' are already contained in T and S, respectively. Because  $S \Phi T$  we also have  $U' \Phi X' = V$ . So V is Φ-injected and exploration stopped upon discovering it. Again, by Lemma 9 there is  $(U, X, ps') \xrightarrow{\varepsilon \in \varepsilon : d:l} (U, Y, ps'')$  with  $U \Psi_X^{ps''} Y$ . By definition of  $P :: [P]^{ra}$ ,  $(U, Y, ps') \xrightarrow{RB:RB:RB:RB:RB:RB:RB} (U', X', X_{U',X'})$ . This is one of the teal Φ-intervals, so  $\gamma$ .RB  $\triangleleft_{T \prec S} \delta.d'$ .RB. The remaining arguments for SRC and COIND are similar to the previous case.

COROLLARY 1. For  $T \prec S$ ,  $(S, T, X_{S,T}) \xrightarrow{e:k:d:l*} (U, V, Q)$  is a  $\Phi$ -interval iff  $(S, T) \xrightarrow{e:k \triangleleft d:l} (U, V)$ .

PROOF THAT  $(\prec, \lhd)$  Is snippy. In order to prove the simulation snippy, we are given the black parts of Figure 2. So, consider  $T_1 \prec S_1$ ,  $T_2 \prec S_2$ ,  $T_1 \equiv T_2$ , and  $S_1 \equiv S_2$ . Further, consider a simulation interval  $(S_1, T_1) \xrightarrow{e:k \lhd d:l} (U_1, V_1)$  and  $S_2 \xrightarrow{e:k} U_2$ . We need to prove  $(S_2, T_2) \xrightarrow{e:k \lhd d:l} (U_2, V_2)$ . As simulation intervals are the same as  $\Phi$ -intervals, we have the  $\Phi$ -interval  $(S_1, T_1, X_{S_1, T_1}) \xrightarrow{e:k \lhd d:l} (U_1, V_1, ps')$ , and want to prove that  $(S_2, T_2, X_{S_2, T_2}) \xrightarrow{e:k \lhd d:l} (U_2, V_2, ps')$ . We again split the interval into the instruction-matched transition and a shuffle sequence:  $(S_1, T_1, X_{S_1, S_1}) \xrightarrow{Y:K::\delta:\lambda} (W_1, X_1, ps'') \xrightarrow{e':k'::d':l'} (U_1, V_1, ps')$ . The difficult part is to reproduce the instruction-matched transition from  $(S_2, T_2, X_{S_2, T_2})$ . Because  $X_{S_1, T_1}$  is only dependent on program counters,  $X_{S_1, T_1} = X_{S_2, T_2}$ .

LEMMA 10. There is a transition  $(S_2, T_2, X_{S_2,T_2}) \xrightarrow{Y:K::\delta:\lambda} (U_2, V_2, ps')$  in  $P::[P]^{ra}$ .

We reproduce the second part of the Φ-interval,  $(W_1, X_1, ps''_1) \xrightarrow{e':k'::d':l'} (U_1, V_1, ps'_1)$ . Once more, we perform case distinction on whether d' and e' follow the red or teal paths in Figure 4.

▶ No RB occurs in d'. In this case, d' is shuffle-only and  $e' = \varepsilon$ . Because shuffle semantics are deterministic and all have the same directive •, the following lemma is straightforward to show. Together with Lemma 9, the shuffle sequence is reproduced from  $(W_2, X_2, ps')$ .

LEMMA 11. If  $V_1 \equiv V_2$  and  $PV_1 = sh_{sc'}$  with shuffle-only  $V_1 \xrightarrow{d:l} X_1$ , then  $V_2 \xrightarrow{d:l} X_2$ .

Indeed, we complete reproduction in this case: With  $(S_2, T_2, ps)$   $\xrightarrow{Y:K:::\delta:\lambda}$   $(W_2, X_2, ps'')$  from Lemma 10 and  $(W_2, X_2, ps''_2)$   $\xrightarrow{\varepsilon:\varepsilon::d':l'}*$   $(U_2, V_2, ps')$  from Lemmas 9 and 11, we constructed  $(S_2, T_2, X_{S_2, T_2})$   $\xrightarrow{e:k::d:l}$   $(U_2, V_2, ps')$ .

► Otherwise, let RB occur in d. Then,  $(W_1, X_1, ps'') \xrightarrow{e':k' :: d':l'} (U_1, V_1, ps')$  is actually the teal path  $(W_1, X_1, ps'') \xrightarrow{\epsilon:\epsilon: d'':l''} (W_1', X_1', ps''') \xrightarrow{RB:RB:RB:RB:RB:RB:RB} (U_1, V_1, ps')$ , where d'' is a shuffle

#### Lemma 8

PROOF. By construction of the product. Case distinction on  $T_1 \xrightarrow{\delta:\lambda} X_1$ . Throughout the proof, we assume  $\Psi T a = a'$ ,  $\Psi T b = b'$ , and  $\Psi T c = c'$ . Further, we invariantly let T = T'.t, S = S'.s, X = X'.x, U = U'.u, Also P T = i' and P S = i.  $S \Psi_X^{ps'} X$  follows from Lemma 4.

- ▶ NOP and ASGN We skip NOP. For ASGN, let  $i' = a' := b' \oplus c'_{,sc'}$ . Then  $i = a := b \oplus c_{,sc}$ . Clearly there is U with  $S \xrightarrow{\bullet : \bullet} U$ . Rule Poison-ASGN yields  $(S, T, ps) \xrightarrow{\bullet : \bullet : \bullet : \bullet} (U, X, ps')$ .  $U \Psi_X^{ps'} X$  follows from construction of Rule Poison-ASGN.
- ▶ LOAD For LOAD, let  $i' = a' := \mathbf{a}[b']_{>sc'}$  and  $\lambda = \text{LD n}$ . We have ps  $b \ge \text{wp}$ , thus  $\rho_T b' = 0$  or  $\rho_T b' = \rho_S b$ .
  - $ho_S b = m \in |\mathbf{a}|$  Clearly there is U with  $S \xrightarrow{\bullet : LD m} U$ .

    POISON-LOAD-SAFE yields  $(S, T, ps) \xrightarrow{\bullet : LD m :: \bullet : LD n} (U, X, ps')$ .
  - ▷  $\rho_S b = m \notin |\mathbf{a}|$  Clearly there is U with  $S \xrightarrow{\text{LU} \mathbf{a} 0 : \text{LD} m} U$ .

    POISON-LOAD-UNSAFE yields  $(S, T, ps) \xrightarrow{\text{LU} \mathbf{a} 0 : \text{LD} m :: \bullet : \text{LD} n} (U, X, ps')$ .
- ▶ STORE For STORE, let  $i' = \mathbf{a}[b'] \coloneqq c'_{,sc'}$  and  $\lambda = st$  n. We have ps  $b \ge wp$ , thus  $\rho_T b' = 0$  or  $\rho_T b' = \rho_S b$ .
  - ▷  $\rho_S b = m \in |\mathbf{a}|$  Clearly there is U with  $S \xrightarrow{\bullet : \text{ST } m} U$ .

    POISON-STORE-SAFE yields  $(S, T, \text{ps}) \xrightarrow{\bullet : \text{ST } m :: \bullet : \text{ST } n} (U, X, \text{ps}')$ .
- ▶ LOAD-UNSAFE For LOAD-UNSAFE, let  $i' = a' := \mathbf{a}[b']_{,sc'}$  and  $\lambda = \text{LD n}$  and  $\delta = \text{LU b}$  m. We have ps  $b \geq \text{wp}$ , thus  $\rho_T b' = 0$  or  $\rho_T b' = \rho_S b$ . Due to  $0 \in |\mathbf{a}|$ , only  $\rho_T b' = \rho_S b$  needs to be considered. We consider two cases:
  - ▷ **b** ≠ **stk** There is U with  $S \xrightarrow{\text{LU} \mathbf{b} \text{m}: \text{LD} \text{m}} U$ .

    HEALTHY-LOAD-UNSAFE yields  $(S, T, ps) \xrightarrow{\bullet: \text{LD} \text{m}:: \bullet: \text{LD} \text{n}} (U, X, ps')$ .
  - ▶ **b** = **stk** There is U with  $S \xrightarrow{\text{LUa0:LDm}} U$ .

    POISON-LOAD-STKUNSAFE yields  $(S, T, ps) \xrightarrow{\text{LUa0:LDm::•:LDn}} (U, X, ps')$ .
- ▶ STORE-UNSAFE For STORE-UNSAFE, let  $i' = \mathbf{a}[b'] := c'_{\rightarrow sc'}$  and  $\lambda = st$  n and  $\delta = su$  **b** m. We have ps  $b \ge wp$ , thus  $\rho_T b' = 0$  or  $\rho_T b' = \rho_S b$ . Due to  $0 \in |\mathbf{a}|$ , only  $\rho_T b' = \rho_S b$  needs to be considered. We consider two cases:
  - ▷ **b** ≠ **stk** There is U with  $S \xrightarrow{\text{subm:stm}} U$ .

    HEALTHY-STORE-UNSAFE yields  $(S, T, ps) \xrightarrow{\bullet : \text{stm} :: \bullet : \text{stn}} (U, X, ps')$ .
  - ▶ **b** = **stk** There is U with  $S \xrightarrow{\text{SUa0:STm}} U$ .

    POISON-STORE-STKUNSAFE yields  $(S, T, ps) \xrightarrow{\text{SUa0:STm}::\bullet:STn} (U, X, ps')$ .
- ▶ BRANCH For BRANCH, let  $i' = \text{br } b'_{>sc'_t,sc'_t}$  and  $\lambda = \text{br b }$  and  $\delta = \text{br.}$  We have ps b = h, thus  $\rho_T b' = \rho_S b$ . Then there is U with  $S \xrightarrow{\text{BR:BR} b} U$ . HEALTHY-BRANCH yields  $(S, T, ps) \xrightarrow{\bullet : \text{BR} b :: \bullet : \text{BR} b} (U, X, ps')$ .
- ► SPEC For SPEC, let  $i' = \text{br } b'_{,\text{sc}'_{1},\text{sc}'_{1}}$  and  $\lambda = \text{BR b}$  and  $\delta = \text{sp}$ . We have ps b = h, thus  $\rho_{T} b' = \rho_{S} b$ . Then there is U with  $S \xrightarrow{\text{SP:BR}b} U$ . HEALTHY-SPEC yields  $(S, T, \text{ps}) \xrightarrow{\text{SP:BR}b :: SP:BR} (U, X, \text{ps}')$ .
- ▶ SFENCE For SFENCE, let  $i' = \text{sfence}_{,\text{sc}'}$  and  $\lambda = \bullet$  and  $\delta = \bullet$ . Then |T| = 1 = |S|. Thus there is U with  $S \xrightarrow{\bullet : \bullet} U$ . Poison-sfence yields  $(S, T, ps) \xrightarrow{\bullet : \bullet : : \bullet : \bullet} (U, X, ps')$ .
- ▶ SLH For SLH, let  $i' = \sinh a'_{,sc'}$  and  $\lambda = \bullet$  and  $\delta = \bullet$ . Thus there is U with  $S \xrightarrow{\bullet : \bullet} U$ . POISON-SLH yields  $(S, T, ps) \xrightarrow{\bullet : \bullet : \bullet : \bullet} (U, X, ps')$ .

## Lemma 9

PROOF. Prove this for a single step, the rest is induction. Let further be U = U'.u and X = X'.x. Let further be ps = (pr, pm). Let  $u = (pc, \rho, \mu)$ ,  $x = (pc', \rho', \mu')$ ,  $S \Psi_X^{ps} X$  and  $[P]^{ra} pc' = sir'$ . We do case distinction on si.

- ▶  $a' := b'_{,r'}$  Then,  $x = (pc', \rho', \mu') \xrightarrow{\bullet \cdot \bullet} (r', \rho'', \mu') = v$  with  $\rho'[a' \mapsto \rho' b'] = \rho''$ . By shuffle conformity, we know that there is b with  $\Psi$  pc' b = b'. Further, we know that  $\Psi$  r' coincides with  $\Psi$  pc' except for b. Also, ps = ps'. By induction,  $S \Psi_{\chi}^{ps} X$ , we have  $[\![\Psi pc' b]\!]_v = \rho'' a' = \rho' b' = [\![\Psi pc' b]\!]_x = \rho b$  as required for  $S \Psi_{\chi}^{ps} V$ . Poison-move yields the transition.
- ▶  $a' := \mathbf{stk}[1]_{,r'}$  Then,  $x = (pc', \rho', \mu') \xrightarrow{\bullet : \mathrm{LDI}} (r', \rho'', \mu') = v$  with  $\rho'[a' \mapsto \mu \, \mathbf{stk}\, l] = \rho''$ . By shuffle conformity, we know that there is b with  $\Psi \, pc' \, b = l$ . Further, we know that  $\Psi \, r' \, coincides$  with  $\Psi \, pc' \, except$  for b, where  $\Psi \, r' \, b = a'$ . Also, ps = ps'. By induction,  $S \, \Psi_{\chi}^{ps} \, X$ , we have  $\llbracket \Psi \, pc' \, b \rrbracket_v = \rho'' \, a' = \mu' \, \mathbf{stk} \, l = \llbracket \Psi \, pc' \, b \rrbracket_x = \rho \, b$  as required for  $S \, \Psi_{\chi}^{ps} \, V$ . Poison-fill yields the transition.
- ▶  $\mathsf{stk}[\mathsf{I}] := b'_{,\mathsf{r}'}$  Then,  $x = (\mathsf{pc'}, \rho', \mu') \xrightarrow{\bullet : \mathsf{ST}^{\mathsf{I}}} (\mathsf{r'}, \rho', \mu'') = v$  with  $\mu'[(\mathsf{stk}, \mathsf{I}) \mapsto \rho' \, b'] = \mu''$ . By shuffle conformity, we know that there is b with  $\Psi \, \mathsf{pc'} \, b = b'$ . Further, we know that  $\Psi \, \mathsf{r'}$  coincides with  $\Psi \, \mathsf{pc'} \, except$  for b, where  $\Psi \, \mathsf{r'} \, b = \mathsf{I}$ . Also,  $\mathsf{ps} = \mathsf{ps'}$ . By induction,  $S \, \Psi_{\chi}^{\mathsf{ps}} \, X$ , we have  $[\![\Psi \, \mathsf{pc'} \, b]\!]_v = \mu'' \, \mathsf{stk} \, \mathsf{I} = \rho' \, b' = [\![\Psi \, \mathsf{pc'} \, b]\!]_x = \rho \, b$  as required for  $S \, \Psi_{\chi}^{\mathsf{ps}} \, V$ . Poison-spill yields the transition.
- ► sfence, Then |X'| = 0 and  $x = (pc', \rho', \mu') \xrightarrow{\bullet:\bullet} (r', \rho', \mu') = v$ . Also, ps = ps'.  $s \Psi_X^{ps} x$  implies  $s \Psi_X^{ps} v = V$ . Poison-shuffle-sfence yields the transition.
- ▶  $\sinh a'_{,r'}$  If |X'| = 0, this case is fully analogue to the previous one. Thus, let  $|X'| \ge 1$ . Then,  $x = (\text{pc'}, \rho', \mu') \xrightarrow{\bullet : \bullet} (r', \rho'', \mu') = v$  with  $\rho'[a' \mapsto 0] = \rho''$ . By shuffle conformity, we know that  $\Psi \text{pc'} = \Psi \text{r'}$ . Also  $\text{ps'} = (\text{pr}[a \mapsto \text{wp}], \text{pm})$ . Indeed,  $S \Psi_\chi^{\text{ps}} X$  and  $[\![\Psi \text{pc'} a]\!]_v = 0$  yields  $S \Psi_\chi^{\text{ps}} V$ . Poison-shuffle-slh yields the transition.

# Lemma 10

PROOF. By case distinction on  $(S_1, T_1, ps) \xrightarrow{Y:K:S:\lambda} (U_1, V_1, ps')$ . For all cases, and  $i \in \{1, 2\}$ : Let  $T_i = T_i'.t_i$ ,  $S_i = S_i'.s_i$ ,  $V_i = V_i'.v_i$ , and  $U_i = U_i'.u_i$ , and  $t_1 \equiv pc' \equiv t_2$  as well as  $s_1 \equiv pc \equiv s_2$ . Further, let  $[P]^{ra}$  pc' = i' and P pc = i, and assume  $\Psi$  pc' a = a',  $\Psi$  pc' b = b', and  $\Psi$  pc' c = c' if they occur in i and i'. Finally, let  $X_{S_1,T_1} = X_{S_2,T_2} = P$ .ps. Please note, that ps' is not dependent on values: Whenever the same rule in  $P :: [P]^{ra}$  is executed, then the same ps' is obtained. The arguments for equality of ps' are thus skipped.

- ► HEALTHY-STORE-SAFE In this case,  $i = \mathbf{a}[b] := c_{,\text{sc}}$  and  $i' = \mathbf{a}[b'] := c'_{,\text{r'}}$ . The presumptions for transition  $(S_1, T_1, \text{ps}) \xrightarrow{Y:K::S:\lambda} (U_1, V_1, \text{ps'})$  yield  $t_1 \xrightarrow{\bullet:ST \, \mathsf{n}} v_1$ , and  $s_1 \xrightarrow{\bullet:ST \, \mathsf{n}} u_1$ , and ps  $b = \mathsf{h}$ . Our assumption is that  $s_2 \xrightarrow{\bullet:ST \, \mathsf{n}} u_2$  can be executed. The leaked address is  $\mathsf{n} = \rho_{s_2} \, b \in |\mathbf{a}|$ . Due to ps  $b = \mathsf{h}$  and  $S_2 \, \Psi_\chi^{\mathsf{ps}} \, T_2$ ,  $\rho_{t_2} \, b' = \rho_{s_2} \, b = \mathsf{n}$ . This suffices for  $T_2 \xrightarrow{\bullet:ST \, \mathsf{n}} V_2$ , and in turn for  $(S_2, T_2, \mathsf{ps}) \xrightarrow{\bullet:ST \, \mathsf{n} ::\bullet:ST \, \mathsf{n}} (U_2, V_2, \mathsf{ps'})$ .
- ▶ POISON-LOAD-STKUNSAFE Then,  $i = a := \mathbf{a}[b]_{>sc}$  and  $i = a := \mathbf{a}[b]_{>sc}$ . The presumptions yield  $t_1 \xrightarrow{\text{LU}\mathbf{b}|\text{LLD}\mathbf{n}} v_1$ , and  $s_1 \xrightarrow{\text{LU}\mathbf{b}|\text{LLD}\mathbf{n}} u_1$ , and ps b = h. Our assumption is  $s_2 \xrightarrow{\text{LU}\mathbf{b}|\text{LLD}\mathbf{n}} u_2$ . Due to ps b = h,  $\rho_{t_1}b' = \rho_{s_1}b = n = \rho_{s_2}b = \rho_{t_2}b' \notin |\mathbf{a}|$ . That justifies  $T_2 \xrightarrow{\text{LU}\mathbf{s}\mathbf{t}\mathbf{k}\mathbf{m}:\text{LD}\mathbf{n}} V_2$ . That suffices for  $(S_2, T_2, ps) \xrightarrow{\text{LU}\mathbf{b}|\text{LLD}\mathbf{n}:\text{LU}\mathbf{s}\mathbf{t}\mathbf{k}\mathbf{m}:\text{LD}\mathbf{n}} (U_2, V_2, ps')$ .
- ▶ POISON-NOP and POISON-ASGN Trivial.
- ▶ POISON-STORE-STKUNSAFE Then,  $i = \mathbf{a}[b] := c_{>sc}$ ,  $t_1 \xrightarrow{su\,\mathbf{stk}\,\mathbf{m}:sn} v_1$ , and  $s_1 \xrightarrow{su\,\mathbf{bl}:sn} u_1$ , and ps  $b = \mathbf{h}$ . Our assumption is  $s_2 \xrightarrow{su\,\mathbf{bl}:sn} u_2$ . Due to ps  $b = \mathbf{h}$ ,  $\rho_{t_1}b' = \rho_{s_1}b = \mathbf{n} = \rho_{s_2}b = \rho_{t_2}b' \notin |\mathbf{a}|$ . That justifies  $T_2 \xrightarrow{\text{lu}\,\mathbf{stk}\,\mathbf{m}:lDn} V_2$ . That suffices for  $(S_2, T_2, ps) \xrightarrow{\text{su}\,\mathbf{bl}:sn\,n:lu\,\mathbf{stk}\,\mathbf{m}:lDn} (U_2, V_2, ps')$ .

- ▶ POISON-LOAD-SAFE Then,  $i = a := a[b]_{>sc}$ ,  $t_1 \xrightarrow{\bullet : LD \, 0} v_1$ , and  $s_1 \xrightarrow{\bullet : LD \, k} u_1$ , and ps b = wp. Further  $s_2 \xrightarrow{\bullet : LD \, k} u_2$ . Due to ps b = wp,  $\rho_{T_2} \, b' = 0 \in |a|$  justifies  $T_2 \xrightarrow{\bullet : LD \, 0} V_2$ . That suffices for  $(S_2, T_2, ps) \xrightarrow{\bullet : LD \, k :: \bullet : LD \, 0} (U_2, V_2, ps')$ .
- ▶ POISON-LOAD-UNSAFE Then,  $i = a := a[b]_{>sc}$ ,  $t_1 \xrightarrow{\bullet : \text{LD} \, 0} v_1$ , and  $s_1 \xrightarrow{\text{LU} \, \mathbf{b} \, | : \text{LD} \, \mathbf{k}} u_1$ , and ps b = wp. Further  $s_2 \xrightarrow{\text{LU} \, \mathbf{b} \, | : \text{LD} \, \mathbf{k}} u_2$ . Due to ps b = wp,  $\rho_{T_2} \, b' = 0 \in |\mathbf{a}|$  justifies  $T_2 \xrightarrow{\bullet : \text{LD} \, 0} V_2$ . That suffices for  $(S_2, T_2, \text{ps}) \xrightarrow{\text{LU} \, \mathbf{b} \, | : \text{LD} \, \mathbf{k}} : \bullet : \bullet : \bullet : \bullet} (U_2, V_2, \text{ps}')$ .
- ► HEALTHY-LOAD-SAFE Then,  $i = a := \mathbf{a}[b]_{>sc}$ ,  $t_1 \xrightarrow{\bullet : \mathrm{LD} \, \mathsf{n}} v_1$ , and  $s_1 \xrightarrow{\bullet : \mathrm{LD} \, \mathsf{n}} u_1$ , and ps  $b = \mathsf{h}$ . Further  $s_2 \xrightarrow{\bullet : \mathrm{LD} \, \mathsf{n}} u_2$ . Due to ps  $b = \mathsf{h}$ ,  $\rho_{T_1} \, b' = \rho_{S_1} \, b = \mathsf{n} = \rho_{S_2} \, b = \rho_{T_2} \, b' \in |\mathbf{a}|$  justifies  $T_2 \xrightarrow{\bullet : \mathrm{LD} \, \mathsf{n}} V_2$ . That suffices for  $(S_2, T_2, \mathsf{ps}) \xrightarrow{\bullet : \mathrm{LD} \, \mathsf{n} : \bullet : \mathrm{LD} \, \mathsf{n}} (U_2, V_2, \mathsf{ps'})$ .
- ► HEALTHY-LOAD-UNSAFE Then,  $i = a := \mathbf{a}[b]_{sc}$ ,  $t_1 \xrightarrow{\text{LU}\,\mathbf{b}\,\mathbf{m}:\text{LD}\,\mathbf{n}} v_1$ , and  $s_1 \xrightarrow{\text{LU}\,\mathbf{b}\,\mathbf{m}:\text{LD}\,\mathbf{n}} u_1$ , and ps  $b = \mathbf{h}$ . Further  $s_2 \xrightarrow{\text{LU}\,\mathbf{b}\,\mathbf{m}:\text{LD}\,\mathbf{n}} u_2$ . Due to ps  $b = \mathbf{h}$ ,  $\rho_{T_1}\,b' = \rho_{S_1}\,b = \mathbf{n} = \rho_{S_2}\,b = \rho_{T_2}\,b' \notin |\mathbf{a}|$  justifies  $T_2 \xrightarrow{\text{LU}\,\mathbf{b}\,\mathbf{m}:\text{LD}\,\mathbf{n}} V_2$ . That suffices for  $(S_2, T_2, \mathsf{ps}) \xrightarrow{\text{LU}\,\mathbf{b}\,\mathbf{m}:\text{LD}\,\mathbf{n}:\text{LU}\,\mathbf{b}\,\mathbf{m}:\text{LD}\,\mathbf{n}} (U_2, V_2, \mathsf{ps}')$ .
- ▶ POISON-STORE-SAFE Then,  $i = \mathbf{a}[b] := c_{>sc}$ ,  $t_1 \xrightarrow{\bullet:st\ 0} v_1$ , and  $s_1 \xrightarrow{\bullet:st\ k} u_1$ , and ps b = wp. Further  $s_2 \xrightarrow{\bullet:st\ k} u_2$ . Due to ps b = wp,  $\rho_{T_2}b' = 0 \in |\mathbf{a}|$  justifies  $T_2 \xrightarrow{\bullet:st\ 0} V_2$ . That suffices for  $(S_2, T_2, ps) \xrightarrow{\bullet:st\ k::\bullet:st\ 0} (U_2, V_2, ps')$ .
- ▶ POISON-STORE-UNSAFE Then,  $i = \mathbf{a}[b] := c_{,sc}, t_1 \xrightarrow{\bullet:st\ 0} v_1$ , and  $s_1 \xrightarrow{\text{su}\,\mathbf{b}\,\mathbf{l}:st\ k} u_1$ , and ps b = wp. Further  $s_2 \xrightarrow{\text{su}\,\mathbf{b}\,\mathbf{l}:st\ k} u_2$ . Due to ps  $b = \text{wp}, \rho_{T_2}\ b' = 0 \in |\mathbf{a}|$  justifies  $T_2 \xrightarrow{\bullet:st\ 0} V_2$ . That suffices for  $(S_2, T_2, \mathsf{ps}) \xrightarrow{\text{su}\,\mathbf{b}\,\mathbf{l}:st\ k::\bullet:st\ 0} (U_2, V_2, \mathsf{ps}')$ .
- ► HEALTHY-STORE-UNSAFE Then,  $i = \mathbf{a}[b] := c_{,sc}$ ,  $t_1 \xrightarrow{\text{subm:stn}} v_1$ , and  $s_1 \xrightarrow{\text{subm:stn}} u_1$ , and ps b = h. Further  $s_2 \xrightarrow{\text{subm:stn}} u_2$ . Due to ps b = h,  $\rho_{t_1} b' = \rho_{s_1} b = n = \rho_{s_2} b = \rho_{t_2} b' \notin |\mathbf{a}|$  justifies  $T_2 \xrightarrow{\text{subm:stn}} V_2$ . That suffices for  $(S_2, T_2, ps) \xrightarrow{\text{subm:stn}} (U_2, V_2, ps')$ .
- ► HEALTHY-BRANCH Then,  $i = \text{br } b_{,\text{sct},\text{scf}}$ ,  $t_1 \xrightarrow{\text{BR:BR}\,\text{b}} v_1$ , and  $s_1 \xrightarrow{\text{BR:BR}\,\text{b}} u_1$ , and ps b = h. Further  $s_2 \xrightarrow{\text{BR:BR}\,\text{b}} u_2$ . Due to ps b = h,  $(\rho_{t_1}\,b' = 0) = (\rho_{s_1}\,b = 0) = \text{b} = (\rho_{s_2}\,b = 0) = (\rho_{t_2}\,b' = 0)$  justifies  $T_2 \xrightarrow{\text{BR:BR}\,\text{b}} V_2$ . That suffices for  $(S_2, T_2, \text{ps}) \xrightarrow{\text{BR:BR}\,\text{b}::\text{BR:BR}\,\text{b}::\text{BR:BR}\,\text{b}} (U_2, V_2, \text{ps}')$ .
- ► HEALTHY-SPEC Then,  $i = br \ b_{>sc_t, sc_f}$ ,  $T_1 \xrightarrow{sp:BR} \xrightarrow{b} T_1 x_1$ , and  $S_1 \xrightarrow{sp:BR} \xrightarrow{b} S_1 x_1$ , and ps b = h. Further  $S_2 \xrightarrow{sp:BR} \xrightarrow{b} S_2 x_2$ . Due to ps b = h,  $(\rho_{T_1} \ b' = 0) = (\rho_{S_1} \ b = 0) = b = (\rho_{S_2} \ b = 0) = (\rho_{T_2} \ b' = 0)$  justifies  $T_2 \xrightarrow{sp:BR} \xrightarrow{b} T_2 x_2 = V_2$ . That suffices for  $(S_2, T_2, ps) \xrightarrow{sp:BR} \xrightarrow{b:sp:BR} \xrightarrow{b} (U_2, V_2, ps')$ .
- ▶ POISON-SFENCE Then,  $i = \text{sfence}_{>sc}$ ,  $t_1 \stackrel{\bullet}{\dots} v_1$ , and  $s_1 \stackrel{\bullet}{\dots} u_1$ , and  $|T_1| = |S_1| = |S_2| = |T_2| = 1$ . Further  $s_2 \stackrel{\bullet}{\dots} u_2$ .  $|T_2| = 1$  justifies  $t_2 \stackrel{\bullet}{\dots} v_2$ . That suffices for  $(S_2, T_2, ps) \stackrel{\bullet}{\dots} (U_2, V_2, ps')$ .
- ▶ POISON-SLH Then,  $i = \sinh a_{sc}$ ,  $T_1 \stackrel{\bullet : \bullet}{\longrightarrow} V_1$ , and  $S_1 \stackrel{\bullet : \bullet}{\longrightarrow} U_1$ . Further  $S_2 \stackrel{\bullet : \bullet}{\longrightarrow} U_2$ . Semantics yield an appropriate  $V_2$  with  $T_2 \stackrel{\bullet : \bullet}{\longrightarrow} V_2$ . That suffices for  $(S_2, T_2, ps) \stackrel{\bullet : \bullet : : \bullet : \bullet}{\longrightarrow} (U_2, V_2, ps')$ .
- ▶ POISON-ROLLBACK Then,  $T_1 \xrightarrow{\text{RB:RB}} T_1' = V_1$ , and  $S_1 \xrightarrow{\text{RB:RB}} S_1' = U_1$ . Further  $S_2 \xrightarrow{\text{RB:RB}} S_2' = U_2$ . Semantics yield  $T_2 \xrightarrow{\text{RB:RB}} T_2' = V_2$ . That suffices for  $(S_2, T_2, P.ps) \xrightarrow{\text{RB:RB:RB:RB:RB:RB:RB}} (U_2, V_2, P)$ .
- ▶ POISON-FILL, POISON-SPILL, POISON-MOVE, POISON-SHUFFLE-SFENCE, and POISON-SHUFFLE-SLH Not applicable due to definition of  $\prec$ ,  $T_1 \prec S_1$  makes it impossible for  $T_1$  to be at a shuffle program counter.

#### Lemma 11

PROOF. We do the induction step by case distinction on  $V_1 \xrightarrow{\bullet:\lambda} X_1$ . Let  $i = P V_1 = P V_2$ .

- ▶ SLH and MOVE Then,  $\lambda = \bullet$  and there is  $V_2 \xrightarrow{\bullet : \bullet} X_2$  by definition of semantics.
- ► FILL Then,  $\lambda = LD$  | and there is  $V_2 \xrightarrow{\bullet:LD} X_2$  by definition of semantics.
- ▶ SPILL Then,  $\lambda = \text{st } I$  and there is  $V_2 \xrightarrow{\bullet : \text{st } I} X_2$  by definition of semantics.
- ▶ SFENCE Then,  $|V_1| = 1 = |V_2|$ . By definition,  $V_2 \xrightarrow{\bullet:\bullet} X_2$ .

Received 2024-07-11; accepted 2024-11-07