# A New Secure Memory System for Efficient Data Protection and Access Pattern Obfuscation

Haoran Geng, Yuezhi Che, Aaron Dingler, Michael Niemier, Xiaobo Sharon Hu

Department of Computer Science and Engineering

University of Notre Dame, Notre Dame, IN, USA, 46556

E-mail: hgeng@nd.edu, yche@nd.edu, adingler@nd.edu, mniemier@nd.edu, shu@nd.edu

Abstract—As the reliance on secure memory environments permeates across applications, memory encryption is used to ensure memory security. However, most effective encryption schemes, such as the widely used AES-CTR, inherently introduce extra overheads, including those associated with counter storage and version number integrity checks. Moreover, encryption only protects data content, and it does not fully address the memory access pattern leakage. While Oblivious RAM (ORAM) aims to obscure these patterns, its high performance costs hinder practical applications. We introduce Secure Scattered Memory (SSM), an efficient scheme provides a comprehensive security solution that preserves the confidentiality of data content without traditional encryption, protects access patterns, and enables efficient integrity verification. Moving away from traditional encryption-centric methods, SSM offers a fresh approach to protecting data content while eliminating counter-induced overheads. Moreover, SSM is designed to inherently obscure memory access patterns, thereby significantly enhancing the confidentiality of memory data. In addition, SSM incorporates lightweight, thus integrated mechanisms for integrity assurance, protecting against data tampering. We also introduce SSM+, an extension that adapts Path ORAM to offer even greater security guarantees for both data content and memory access patterns, demonstrating its flexibility and efficiency. Experimental results show that SSM incurs only a 10% performance overhead compared to nonprotected memory and offers a 15% improvement over AES-CTR mode memory protection. Notably, SSM+ provides an 20% improvement against Path ORAM integrated with Intel SGX under the highest security guarantees.

## I. INTRODUCTION

Securing sensitive and private data is crucial for many applications, especially when a vast amount of personal data needs to be outsourced to the cloud. Machine Learning as a Service (MLaaS) serves as a pertinent example, as it necessitates the remote collection, storage, and processing of considerable personal data remotely for model training purposes. Therefore, a significant concern is the potential exposure or misuse of such data by compromised or malicious servers.

To effectively safeguard sensitive data, it is imperative to focus on both data content protection and access pattern protection. For instance, when healthcare organizations employ MLaaS for predictive analytics on patient data, ensuring the confidentiality of data content is critical. A security breach could lead to the exposure of sensitive health information, violating privacy and potentially causing harm. Additionally, safeguarding against access pattern leakage is also important, as sensitive information can be revealed even when encrypted [41], [42], [57]. For example, an attacker might observe access

patterns between healthcare organizations and the server, and could potentially infer the type and severity of a disease by examining the accessed addresses and frequencies [11].

In response to the pressing need for robust memory security, leading companies such as Intel and AMD have adopted the Advanced Encryption Standard Counter Mode (AES-CTR) as their preferred memory encryption scheme [18] to safeguard data content. In the AES-CTR approach, a unique counter, derived from the physical address of data combined with an incrementally updated version number (VN), is encrypted using AES. This encrypted counter, serving as a one-time pad (OTP), is then XORed with the plaintext data to produce the ciphertext for memory storage. Therefore, AES-CTR enables the pre-computation of encryption keystreams, facilitating parallel processing and reducing the effective encryption and decryption latency. However, a pivotal challenge arises from the necessity of storing VNs in memory and maintaining their integrity. Without intact VNs, data cannot be decrypted accurately. Moreover, to ensure the integrity of VNs, additional mechanisms such as Message Authentication Codes (MACs) or hash functions are typically required, often coupled with Merkle tree traversal [55]. This not only increases storage requirements but also incurs considerable latency due to the complex verification process involved in each memory access. The traversal of a Merkle tree, necessary for each VN check, adds multiple memory accesses to the critical path, significantly impacting overall system performance.

While AES-CTR addresses the encryption of data content, protecting against access pattern leakage requires a different set of strategies. Oblivious RAM (ORAM) [16], [17] is one of the most effective primitives for eliminating access pattern leakage. ORAM works by obfuscating the access patterns to memory, such that each access appears random and computationally indistinguishable from any other. This is achieved through a combination of redundant data accesses and the periodic reshuffling of memory blocks, ensuring that the true nature of any data access is concealed. However, the practicality of ORAM is hindered by its significant performance overhead. For example, Path ORAM [53], one of the most prevalent ORAM constructions, requires half of the storage capacity for dummy blocks. Moreover, each memory access in Path ORAM is translated into a full path of reads and writes across the memory, with the vast majority being redundant, resulting in a performance degradation of several orders of magnitude compared to non-ORAM memory access. While various optimizations and alternative ORAM constructions have been proposed [5], [8], they generally suffer from the same fundamental trade-off between security and performance.

In this paper, we propose Secure Scattered Memory (SSM), a novel memory protection scheme that diverges from traditional encryption, while also inherently hides access patterns. The key insight of SSM involves decomposing original data into multiple secret shares which are individually stored across the memory. Reading a data then requires assembling multiple shares to reconstruct the original plaintext. Consequently, each memory block holds multiple shares that are cryptographically secure in isolation, rendering it useless for potential attackers without the complete shares for reconstruction.

The SSM scheme not only protects data content but also serves as an inherent mechanism for data integrity checks. Since any single share is insufficient to reconstruct the original data, the integrity of the data can be implicitly checked by verifying the integrity of the individual shares. Moreover, to ensure the robustness of SSM, it is vital to obscure the access patterns to the multiple shares, preventing any direct reassembly attempts by attackers. As such, SSM incorporates access pattern protection, not only to obscure the association between shares and their original data, but also to shield against memory access pattern attacks.

In designing SSM, we are confronted by two primary challenges: First, the intrinsic nature of SSM, which segments data into multiple shares, essentially increases the read/write operations, thereby adding to the memory access overhead. To mitigate this issue, we implement techniques such as partial cache to reduce the overheads. Second, efficient access pattern protection is necessary to obscure the linkage between data shares and their original form, thereby providing protection against memory access pattern attacks. While techniques like ORAM offer comprehensive protection, they come with significant performance overhead, making direct adoption a potential source of substantial performance loss. The primary aim of SSM's access protection scheme is to prevent attackers from reconstructing data shares. By not striving to meet all of ORAM's extensive security criteria, SSM employs three key methods to obscure each access's identity: read dummies, random shuffling of data blocks and partial caching. These approaches ensure effective access pattern protection, maintaining share security without the complexity and overhead of traditional ORAM schemes. For scenarios requiring the utmost security level, SSM can integrate ORAM protocols, transitioning to SSM+, to balance security and performance according to specific needs.

The main contributions of SSM are highlighted below:

- We introduce SSM, a novel memory protection scheme that ensures data protection, integrity verification and memory access pattern protection.
- We devise data segmentation and reconstruction schemes in SSM, which reduces the extra overhead caused by traditional memory protection methods.



Fig. 1: AES-CTR memory protection scheme

- We analyze the SSM's capabilities for memory protection, with a particular focus on data protection, integrity verification, and memory access pattern protection, employing probabilistic methods to assess the robustness of SSM.
- We implemented SSM and related comparative studies in Gem5 and conducted experiments. The results show that SSM improves performance by 15% compared to Intel SGX.
- To achieve the highest level of security guarantees on both data content and memory access pattern obfuscation, adopt the Path ORAM scheme within SSM (referred to as SSM+) and achieve an average of 20% improvement in performance compared to the combination of Path ORAM and Intel SGX.

# II. BACKGROUND

This section first introduces memory security (§II-A), highlighting the importance of encrypting data content and obfuscating memory access patterns. We then discuss secret sharing (§II-B), which plays a critical role in shaping SSM.

## A. Memory Security

When it comes to addressing memory security challenges, two aspects demand the most attention: (1) encryption of the actual data residing in the memory, ensuring its confidentiality, and (2) the meticulous management of memory access patterns, preventing potential information leakage. In the following, we review the predominant techniques for encrypting memory data and introduce memory access patterns protection.

1) **Memory Encryption**: Memory encryption techniques use symmetric key encryption methods, notably AES in counter mode (AES-CTR) and XOR–encrypt–XOR-based tweaked-codebook mode with ciphertext stealing (AES-XTS), to ensure the confidentiality of off-chip data [18], [23]. These methods are also combined with the Message Authentication Codes (MAC) and Merkle Tree for data integrity verification [19].

*a)* AES-CTR: As illustrated in Fig. 1, AES-CTR involves a counter which combines the data block's physical address

(PA) with a progressively incremented version number (VN) [24], [56]. This relationship is captured by the equation:

# $Ciphertext = Plaintext \oplus AES\_Enc(PA||VN)$

Where  $\|, \oplus$  and AES\_Enc denote bit-wise concatenation, XOR operations and AES encryption, respectively. AES\_Enc( $PA \| VN$ ) generates a one-time pad (OTP). Leading semiconductor companies like Intel and AMD have incorporated specialized AES-CTR engines within their products, enabling operating systems (OS) to selectively encrypt memory regions in a manner that is transparent to applications. As emphasized in [1], [26] and [36], this strategy is popular among cloud providers to protect virtual machines (VMs) and in-memory content.

One primary challenge with AES-CTR memory protection is guaranteeing the distinctness of the VN for every cache block. To avoid repeating counter values, it is imperative to modify the AES key when the VN approaches its threshold. Consequently, a sizable VN storage space is vital to minimize re-encryption intervals. Intense read and write activities necessitate many VNs, leading to gigabyte-scale VN storage requirements [18]. In a secure processor setup, VNs are predominantly stored within DRAM.

Storing VNs in off-chip DRAM introduces another challenge: safeguarding their integrity and ensuring VN freshness. MAC are used to verify VN authenticity. The process verifies that the value pulled from DRAM aligns with the most recent entry in the processor. When a data block is written, a MAC—comprising the data, its PA, and VN—is created and linked to each data block. For encrypted blocks, the MAC is formulated as:

#### MAC = Hash(Ciphertext || CTR)

MACs are then cross-referenced during DRAM reads. However, a MAC check alone is not foolproof. For example, replay attacks, can subtly insert outdated data, along with its VN and MAC, bypassing detection mechanisms. A Merkle tree [55] is the solution to this vulnerability, offering a stratified MAC verification approach, with its root anchored securely on-chip.

b) AES-XTS: AES-XTS presents a notable advantage over AES-CTR for memory protection by eliminating the use of VNs, thereby effectively sidestepping the overhead challenges seen in AES-CTR [38]. Specifically, AES-XTS operates using a two-key system: one key facilitates block cipher encryption while the other acts as a tweak, generated from the PAs of the data block. This tweak undergoes specific transformations and collaborates with the block cipher encryption to secure the data. However, a significant drawback of AES-XTS is its susceptibility to ciphertext sidechannel attacks, inherently rendering it less secure than AES-CTR [33], [58]. Moreover, the latest iteration of Intel SGX-Scalable leverages XTS-mode encryption but omits any form of integrity verification, further compromising its security credentials [62]. Given these concerns and the less maturity of AES-XTS, we prioritize discussions around the more mature and robust AES-CTR approach.

2) Access Pattern Protection: While memory encryption is pivotal in protecting data content, it alone is not enough to ensure complete memory confidentiality. For example, even when memory is encrypted, it is still vulnerable to attacks through access pattern leakage. In contemporary memory architectures, the address bus, instruction bus, and data bus are separated. This structural design introduces a problem where even if all data content is encrypted on the data bus, the address and command buses remain exposed. Consequently, an adversary can exploit these buses as a side-channel, directly snooping and monitoring the access trace within the memory, which allows them to discern access pattern parameters such as address, frequency, and the type of operations (either read or write). Prior research has demonstrated that such access pattern leakages can leak sensitive information, including sensitive control flow variables [63], private database queried keywords and search terms [27], [35], RSA keys [29], and neural networks architecture [9], [25].

To mitigate or eliminate access pattern leakage, an effective solution is Oblivious RAM (ORAM) [16], [17]. Specifically, ORAM is designed with five primary protection objectives to ensure a robust security model for memory access operations [53]. These objectives aim to prevent leakage of: (1) Access addresses: the identity of accessed data; (2) Access frequency: the age or frequency of data accesses; (3) Access linkablity or dependency: data linkage (the possibility of linking multiple accesses to the same data); (4) Access pattern: the specific pattern of accesses, whether sequential, random, etc.; and (5) Operation type: the operation of the access, i.e., read or write. We categorizes these critical aspects as the five foundational protection features of ORAM.

The core principle of ORAM is to make any two memory accesses appear computationally indistinguishable. This obfuscation is achieved by introducing a significant amount of redundant data accesses and randomized reshuffling. We take the widely adopted Path ORAM [53] as an example, Path ORAM employs a binary tree structure for untrusted memory, with each node on the tree being a data 'bucket' that can hold both real and dummy encrypted data blocks. The height of the tree is L, and there are Z data blocks per bucket. The trusted client side has a position map that confidentially correlates data blocks to a specific tree path, and has a stash for temporarily storing blocks during process. To retrieve or update a block, Path ORAM consults the position map first, then reads the entire relevant path with  $L \times Z$  data blocks into the stash. Once the required data block is processed, Path ORAM re-encrypts and remaps the block to a new path, while the remaining blocks are written back along their original paths in a random manner. Therefore, by remapping accessed data blocks to new paths and interspersing them with dummy blocks, Path ORAM effectively conceals the original access patterns.

# B. Secret Sharing

Secret sharing is a widely studied protocol in the realm of cryptography and data security, with numerous applications in diverse fields [2]. One of the key features of secret sharing that can be leveraged for memory protection is the concept of dividing data into multiple shares. This feature allows for the protection of data in memory without the need for countermode encryption, thereby offering a novel approach to data security. In the following, we introduce the data separation and reconstruction of the most widely used secret sharing scheme, Shamir's Secret Sharing (SSS), and discuss its security.

1) Shamir's Secret Sharing: In the SSS scheme, a secret s is transformed into a polynomial of degree k-1. This polynomial is represented as  $f(x) = s+a_1x+a_2x^2+\ldots+a_{k-1}x^{k-1}$ , where coefficients  $a_i$  are randomly chosen. Shares of the secret s are then generated from various points on this polynomial curve, specifically by evaluating the polynomial at distinct values of x. It's noteworthy that while the polynomial contains the secret at its y-intercept, the shares are derived from the other points on the curve [12], [21]. To reconstruct the secret s, a minimum k shares are required, utilizing polynomial interpolation such as Langrange interpolation or Barycentric interpolation [3]. What sets Shamir's Secret Sharing apart is its ability to recover the secret from a subset of the total shares, offering efficiency in large systems. We adapt the concept of SSS in designing SSM.

2) Security of Secret Sharing: The security of secret sharing protocols is well-studied, making them a reliable choice for preserving privacy in memory systems. The fundamental principle of security in secret sharing protocols is that an attacker would need to obtain all the shares (or at least a significant portion of them) to retrieve the original data. In SSS, if less than k shares are known, nothing can be known about the secret [21]. This property makes secret sharing a secure method for preserving privacy in distributed systems. Recent studies have further verified the security of secret sharing protocols and demonstrated their effectiveness in various contexts, including quantum communications and public blockchains [37], [40], [59], [60].

#### III. MOTIVATION

Below, we first examine the limitations of current memory encryption and access pattern protection methods. We further discuss state-of-the-art solutions, their merits and detriments.

#### A. Challenges with AES-CTR's Security

AES-CTR enables precomputation of encryption keys through the amalgamation of VNs with physical addresses for parallel processing and reduces latency via XOR operations. However, it incurs significant overheads due to the need for integrity checks for VNs [24], [56]. Memory accesses can increase by at least 23% [24]. For memory-intensive applications, such as DNN inference, the increase is over 36%. For DNN training sees a surge of 40% [56]. A large portion of this overhead can be attributed to the Merkle tree, which is used for VN integrity.

MGX [24] and SoftVN [56] represent state-of-the-art solutions for addressing the overheads associated with extensive VNs integrity checks during memory access. MGX provides memory protection tailored for hardware accelerators by leveraging their predictable memory access patterns. Rather than storing VNs off-chip, MGX dynamically generates VNs using on-chip accelerator states, refining integrity checks to align with the accelerator's operational characteristics.

SoftVN introduces a distinct approach that allows software to provide VNs for memory accesses. It is particularly suited for memory-intensive applications with straightforward access patterns, allowing it to bypass cache-level VN tracking and the overheads from off-chip VN fetches.

While both MGX and SoftVN negate the need to fetch VNs from memory by either generating them dynamically or obtaining them from software, they do face several challenges:

**1. Application Specificity:** Both MGX and SoftVN are application-specific by design. MGX requires specialized application accelerators to internally derive the VNs during operations. SoftVN depends on certain software parameters from the given application to serve as VNs, which limits their utility in more general scenarios.

2. Hardware Limitations: Although SoftVN is intended for general-purpose CPUs, it demands significant architectural changes. The integration of a VN table, along with an additional cache to counter unpredictable cache evictions during VN generation, results in increased latency. While MGX operates with reduced overhead, its dependence on dedicated accelerators limits its versatility and adaptability in diverse computing environments

**3.** Access Pattern Challenges: Both MGX and SoftVN rely on deterministic access patterns, which stands in contrast to contemporary access pattern protections which favor random memory accesses. When considering the combination of straightforward ORAM solutions, which require random access patterns, building a protection mechanism based on MGX or SoftVN would undoubtedly introduce much more overhead.

## B. Challenges of ORAM

As mentioned in the previous section (§II-A2), ORAM can be a promising primitive for protecting access patterns. However, implementing ORAM can lead to significant performance degradation. Take the widely-adopted Path ORAM as an example, the mechanism that Path ORAM employs to obfuscate access patterns exacerbates the performance overhead. Each original memory access is transformed into a sequence of reads and writes, with only one access being actually required. Thus, the majority of Path ORAM accesses are redundant, with the redundancy factor often reaching even two orders of magnitude. While various state-of-the-art ORAM designs and optimizations have been proposed, they predominantly revolve around the same core principle as Path ORAM: leveraging extensive random accesses to mask the genuine access pattern. Moreover, current ORAM works employ the traditional memory encryption schemes, but none of them take the extra overhead of encryption such as AES-CTR storage and integrity checking into account, which may push the ORAM implementation further from being practical.

TABLE I: Security Guarantees Across Different Designs

|                                 | Intel SGX | MGX | SoftVN | Intel SGX +<br>Path Oram | SSM | SSM+ |  |
|---------------------------------|-----------|-----|--------|--------------------------|-----|------|--|
| Data Content Protection         |           |     |        |                          |     |      |  |
| Confidentiality Breach          |           |     |        |                          |     |      |  |
| Authentication Attack           |           |     |        |                          |     |      |  |
| Replay Attack                   |           |     |        | •                        |     |      |  |
| Access Pattern Protection on    |           |     |        |                          |     |      |  |
| Access Addresses                | •         | •   | •      |                          |     |      |  |
| Access Frequency                | •         | •   |        |                          |     |      |  |
| Access Linkablity or dependency | •         | •   |        | •                        | •   |      |  |
| Access Pattern Type             | •         | •   | •      | •                        |     |      |  |
| Operation Type                  | •         | •   |        | •                        | •   |      |  |
| full protection,                |           |     |        |                          |     |      |  |

# C. Security Guarantees Across Different Designs

The objective of SSM is to introduce a new secure memory system that encompasses data encryption while inherently offering access pattern protection. Consequently, the security considerations for SSM diverge from those of Intel SGX, MGX, SoftVN and Path ORAM. Table I summarize qualitatively the unique security guarantees provided by each design, emphasizing their designs to countering attacks on data content and access patterns. Intel SGX, MGX, and SoftVN focus on data content protection through AES-CTR encryption and data authentication mechanisms, yet they do not address memory access pattern protection.

ORAM is typically implemented in secure memory systems to ensure the highest level of security guarantees. In our analysis, we consider combining Path ORAM and Intel SGX. This integration offers comprehensive security coverage, achieving full protection on data content and access pattern.

Attempting to combine MGX or SoftVN with Path ORAM for enhanced security encounters a significant hurdle. The deterministic access patterns integral to MGX and SoftVN's counter generation conflict with Path ORAM's reliance on random access patterns for safeguarding security. This discrepancy necessitates a fundamental redesign of their counter generation processes, presenting a notable challenge in achieving a unified security framework.

SSM's security model aims to safeguard data content, including robust authentication and replay attack defenses. It also provides protection against identity leakage of accessed data, as detailed in §IV-E. While SSM partially addresses the leakage of data age, data linkage, and access patterns, for the highest level of access pattern protection, SSM+ amplifies the SSM framework to equal the protective strength of Intel SGX and Path ORAM combined.

#### D. Our Motivation

Our work aims to tackle the challenges facing current memory protection solutions. Specifically, our motivation is to design a low-overhead memory protection scheme for general purpose architecture that encompasses three key facets: protecting the content stored in memory, safeguarding memory access patterns, and ensuring data integrity checks.

# IV. SECURE SCATTERED MEMORY ARCHITECTURE

With SSM, instead of storing complete data, data is divided into secret shares, fragments that are stored in memory. This

TABLE II: List of parameters used in SSM.

| Symbol | Definition                                |
|--------|-------------------------------------------|
| N      | Polynomial degree                         |
| K      | Total numbers of shares of one data block |
|        | generated from data segmentation process  |
| S      | The numbers of shares in each data block  |
| W      | he number of polynomial coefficients      |
|        | into which the original data is divided   |
| t      | Minimal number of shares                  |
|        | required to reconstruct the original data |
| d      | The number of dummy values read in each   |
|        | read to blur the access pattern           |
| C(K,t) | The number of t combinations of           |
|        | a set with K elements                     |

strategy ensures that each memory line holds only a randomized piece, offering minimal value to attackers without access to the complete set of shares. SSM comprehensively addresses memory content safety, access pattern obscurity, and data integrity. Here, we describe the SSM framework, with Table II listing key parameters that will be frequently referenced in subsequent sections. We begin by outlining SSM deisgn challenges associated then discuss how the SSM architecture overcomes such challenges.

# A. SSM Challenges

**1.Providing shares access protection:** Achieving memory access obfuscation is a pivotal aspect of ensuring that the shares can not be reconstructed by analysing the access patterns. While directly adopting the well-studied ORAM architecture is a straightforward solution for this purpose, it introduces significant performance overheads. As highlighted in previous discussions (§II-A2), ORAM is distinguished by its five protection goals aimed at comprehensive memory access security. However, the primary objective of SSM's access protection scheme is to ensure that data shares cannot be reconstructed by attackers. This goal allows SSM to circumvent the need to meet all of ORAM's extensive security criteria.

In navigating the trade-off between performance and security, SSM deploys specialized access pattern protection techniques. These techniques are based on the principles of secret sharing, offering a tailored approach to security that is both efficient and effective, as detailed in §IV-E. For scenarios demanding the highest level of security, where fulfilling all five ORAM goals is essential, SSM can readily adopt ORAM protocols, evolving into SSM+. This enhanced mode, SSM+, provides protection for both data content and access patterns but incurs additional performance overheads. The tradeoffs between the original SSM framework and the ORAMintegrated SSM+ mode, including a detailed comparison and analysis of these methodologies, are thoroughly examined in §VII.

2. Mitigating read/write overheads: A defining characteristic of SSM is its division of a single data block into multiple shares, which inevitably increases the number of read and write operations. This naturally adds overheads. Our response to this challenge is twofold. First, the increased read and write overhead inherent in SSM can be mitigated through systemlevel optimizations such as Out-of-order (OoO) execution.



Fig. 2: The high-level overview of (a) SSM design, (b) data segmentation process, (c) data reconstruction process, (d) access pattern protection.

Additionally, the access pattern hidden techniques such as partial cache are used to reduce the overheads. We explore the specifics of these schemes in §IV-E, to further decrease the frequency of read/write operations, thus reducing the overhead of SSM.

Efficiently integrating data segmentation and reconstruction within the SSM architecture is also important. We've developed dedicated mechanisms for these tasks (§IV-C, §IV-D), ensuring they operate seamlessly within the memory system. These tailored solutions are essential for maintaining the efficiency and integrity of SSM.

## B. SSM Overview

Fig. 2(a) depicts a high-level overview of SSM. The SSM interface consists of a SSM stash, a SSM map and components for data segmentation and data reconstruction. SSM is integrated within the memory controller:

SSM map: The SSM map serves the crucial function of recording the locations of data shares scattered across memory, operating through a large mapping table that facilitates a 1-to-K mapping. This mapping effectively translates each original address request into a corresponding set of share addresses. Compare with the 1-to-1 address mapping found in traditional TLB, SSM's 1-to-K mapping increases the mapping table's size. To mitigate this, we employ two techniques. Firstly, we adopt a hierarchical storage approach similar to the position map in Path ORAM [47], enabling recursive storage across memory for a more compact mapping table. Secondly, we utilize a computation-based address mapping technique that leverages hashing to map source address to destination addresses [34], significantly reducing the need to store full destination addresses directly in the mapping table. These strategies ensure the mapping table's size is efficiently managed, setting it at a scale of megabytes.

**SSM stash:** In SSM, 'dummy values' are actually shares from other data, employed to obscure memory access patterns and not fitting the conventional definition of 'dummy.' Unlike ORAM, where true dummy values, generated by a PRNG, occupy half the memory—thereby increasing storage overhead—SSM avoids such costs (§IV-E). By repurposing other shares as 'dummy values,' SSM significantly reduces storage overhead compared to Path ORAM. The SSM stash serves as a share cache, storing these values to facilitate pre-fetching. Upon address mapping, SSM first assesses the presence of shares in the stash; if absent, it then requests memory from the system. The impact of stash size on system performance, detailed in §VII, underscores its efficiency and time-saving role.

**Data segmentation/reconstruction Units:** These units are responsible for breaking down data blocks into secure shares upon write operations and reconstructing the original data from the shares during read operations.

For each read or write operation, the SSM retrieves t data blocks, which represent the minimum number of shares needed to reconstruct the original data block. In an effort to obfuscate access patterns, an additional d dummy data blocks are read alongside the t blocks, resulting in a total of t + d blocks per access. These blocks are temporarily stored in the SSM stash. Each share within these accessed data blocks undergoes a shuffling process, simultaneously updating the SSM map to reflect the new share locations. The operation concludes with these data blocks being written back to memory, ensuring the continued protection of data integrity and privacy.

# C. Data Segmentation in SSM

The data segmentation process, illustrated in Fig. 2 (b), is initiated when writing to memory. Its primary objective is to partition the data into multiple shares and store each of these shares in separate memory locations. The data segmentation procedure consists of two phases: (1) the polynomial generation phase and (2) the shares generation phase.

In the phase (1), a polynomial of degree N is formulated. Initially, the original data block is divided into segments  $p_1$  to  $p_W$ . The corresponding polynomial is represented as :

$$p_1 + p_2 \times x + \dots + p_W \times x^W + a_1 \times x^{W+1} \dots + a_{N-W} \times x^N$$

where the segments  $p_1$  to  $p_W$  act as the coefficients for the polynomial from degree zero to degree W. A sequence of coefficients  $a_1, a_2, \ldots, a_{N-W}$  is generated from a combination of two sources: a pseudo-random number generator (PRNG) and securely stored coefficient seeds. Specifically, the coefficients derived from the coefficient seeds, is positioned at certain degrees to facilitate integrity verification.

During the phase (2), random values ranging from  $x_1$  to  $x_K$  are produced using the PRNG. For each value, the polynomial is evaluated to determine its corresponding f(x). Each resulting pair, e.g.  $(x_1, f(x_1))$ , is referred to as a share. Using this method, a total of K shares are generated for the polynomial of each block, and they are subsequently stored in memory. With this, the data segmentation procedure for memory write operations is concluded.

Each share contains a fragment of the plaintext, specifically corresponding to the initial W degrees of the polynomial. Thus, during data reconstruction, each retrieval operation must be able to recover W segments of the original data.

#### D. Data Reconstruction

Per Fig. 2 (c), the data reconstruction process begins during memory reading. The original data is reconstructed from a subset of its shares and an integrity check is performed. The data reconstruction process encompasses two phases: the polynomial reconstruction phase and the integrity verification phase.

During each read operation in SSM, t + d shares are retrieved, employing t shares for polynomial reconstruction and d shares to obscure the access pattern. A total of K shares are stored for each original data block. To reconstruct the original polynomial, a minimum of t shares is necessary. An efficient and precise polynomial interpolation method is crucial for accurate data recovery. We use Barycentric interpolation [3], recognized for its precision. For shares  $\{(x_1, f(x_1)), (x_2, f(x_2)), \ldots, (x_t, f(x_t))\}$ , the polynomial is reconstructed as:

$$f(x) = \frac{\sum_{i=1}^{t} \left(\frac{w_i}{x - x_i}\right) f(x_i)}{\sum_{i=1}^{t} \frac{w_i}{x - x_i}},$$
(1)

where  $w_i$  are the Barycentric weights.

In the integrity verification phase, coefficient seeds are utilized. The process entails checking each polynomial coefficient, from  $a_1$  to  $a_{N-W}$ , against pre-established seeds at specific degrees. For example, as illustrated in Fig. 2(b), the coefficient  $a_1$  at degree W + 1 is derived from the coefficient seeds. Consequently, in Fig. 2(c), we verify the coefficient at degree W + 1 to determine if it corresponds to the respective seed for integrity verification. Matching coefficients and seeds shows a successful integrity check. In contrast, any deviation of a coefficient from its seed implies the data may have been compromised, suggesting potential tampering.

# E. Access Pattern Protection

Access pattern protection in SSM is a critical component for ensuring data confidentiality of memory. Central to the security guarantees of SSM is the obfuscation of access patterns to multiple shares, which is essential to thwart unauthorized attempts for data reconstruction. We leverage the inherent properties of SSM design and introduce three key schemes: (1) combination selection, (2) shares shuffling, and (3) partial cache. With these schemes combined, SSM can provide sufficient obfuscation while minimizing performance overhead.

1) Combination Selection: As shown in Fig. 2 (d), the first step of access pattern protection is combination selection. In SSM, each data block contains multiple shares from different sources. As outlined in (§IV-D), to retrieve and reconstruct any particular data block, SSM is designed to gather a predefined number of these shares, denoted by t out of a total of K shares associated with each original data block. Thus, we leverage the combinatorial principle which dictates that there are C(K, t) combinations of shares available for the reconstruction of a single data block. Consequently, a significant layer of obscurity is added to the access patterns, which makes the probability of correctly guessing the actual pattern to be  $\frac{1}{C(K,t)}$ . To enhance the security of combination selection, t is typically set to approximately half of K, as this value maximizes C(K,t). For example, when K = 32and t = 16, there are over 600 million combinations, and difficulty in tracking which shares are being accesses increases exponentially.

2) Shares Shuffling: While the combination selection step help to obfuscate the access patterns, it is still not sufficient. An attacker that repeatedly accesses the same data block, eliminating combinationial possibilities and threat the system. Our shares shuffling techniques can mitigate this possibility.

In SSM, each data block is composed of multiple shares segmented from various source data. Each share represents a pair (x, f(x)) derived from a corresponding polynomial. The SSM map translates a single request address to K memory locations, achieving a 1-to-K mapping. For each access, a minimum of t data blocks containing real shares, along with d dummy blocks, are fetched. This creates a shuffling pool of  $(t + d) \times S$  elements for share shuffling. The SSM map is updated after each shuffling. The frequency of shuffling is determined by the stash size. Data blocks stored as shuffle candidates in the stash are evicted to new locations before the stash becomes full, preventing stash overflow. For instance, if the stash size is 32KB, SSM performs share shuffling when the stash contains 24KB of data blocks.

3) Partial Cache: The partial cache strategy employs the SSM stash, an on-chip cache, that retains a subset of the data shares. The default size of SSM Stash is 32KB. This SSM stash serves two purposes. First, it reduces the latency for accessing shares that exhibit high temporal locality. Second, it enhances the pool of shares that can be utilized as dummy. By caching a proportion of accessed data blocks, SSM could fulfill data reconstruction requests with fewer than t real data blocks when there is a stash hit, which reduced reliance on real-time memory fetches allows for the insertion of dummy operations, further obfuscating the access pattern. This partial cache effectively improves the performance by prefetching

efficiency and enhances the security.

## V. THREAT MODEL

We consider a model where the main memory is the vulnerable victim, potentially exposed to a resourceful attacker. In this context, the memory bus acts as the main gateway for the attacker, allowing them to observe and, in certain scenarios, even directly interact with data transmitted through it. While the memory remains exposed, our model assumes that the CPU remains secured from external threats. Specifically, the attacker cannot retrieve data directly from the CPU's cache, ensuring a vital level of protection for instantaneously processing data. Furthermore, while our protection scheme's operational details are transparent and known to the attacker, the specifics of the data held within the CPU remain obfuscated.

In terms of the capabilities of the hypothetical attacker, several key actions can be performed. First, there is the confidentiality breach wherein the attacker can intercept and decipher the raw data traveling through the memory bus with the intent to extract valuable information [19], [62]. Second, the attacker can execute integrity and authentication attacks. Specifically, the attacker can not only they intercept, but also modify the data in transit, which affords the attacker the potential to inject malicious data or alter genuine data for nefarious purposes [39]. Third, the attacker has the capability of access pattern analysis. By observing the frequency and patterns of data accesses, the attacker may deduce specific system or user behaviors, which could then be exploited in more advanced attack strategies [9], [25]. Fourth, the attacker can initiate replay attacks, where the attacker can capture genuine data transactions and replay them at a later time, aiming to confuse the system or introduce outdated data to influence the system behavior [28], [61].

Note that our threat model does not take into consideration highly specialized, advanced attacks focusing on hardware vulnerabilities or sophisticated side-channel strategies. We believe that focusing on the aforementioned attack types provides a robust foundation to evaluate our memory protection scheme's effectiveness, striking a balance between realistic threats and theoretical extremes.

## VI. SSM SECURITY EVALUATION

In this section, we evaluate the security of SSM. We test the system against the treat model in V. First, we assess the system's resistance to confidentiality breaches to confirm that sensitive data remains undisclosed. Subsequently, we merge the analysis of integrity and authentication attacks, which tests the system's ability to prevent and detect data tampering. Next, we evaluate SSM's resilience to access pattern attacks, where patterns in memory usage could potentially be exploited to extract private information.

## A. Confidentiality Breach Analysis

The efficacy of SSM in preventing confidentiality breaches is based on its design to obfuscate and secure data within memory. In the event of a confidentiality breach attack where adversaries attempt to read sensitive data directly from memory, SSM increases the difficulty of such unauthorized access by dividing data into multiple shares.

Upon storage in an SSM system, each data block separate into K shares in memory, but only t shares are needed to reconstitute the original data. This means that any single memory block is of little value to an attacker, who must identify all shares to recover the original data. Moreover, SSM's memory access pattern protection ensures that attackers cannot determine which shares belong to a specific data block by monitoring access patterns, thereby it is hard for an attacker to gain meaningful information when attempts to read directly from memory.

Mathematically, the challenge for an attacker to discern the correct t shares within a vast memory system is significant. The probability  $P_1$  that an attacker guesses the required t shares out of the total shares is:

$$P_{1} = \frac{K \times C(\frac{\text{Total}}{K}, t)}{C(\text{Total}, t)}$$
(2)

where C(Total, t) denotes the binomial coefficient representing the number of ways to choose t shares from the total available shares in memory. C(Total/K, t) indicates the number of ways to pick t shares from the same data block, where Total/K is the number of shares per data block. The term  $C(\text{Total}/K, t) \times K$  denotes the number of ways to pick t shares from any one of the K different data blocks, each with a possibility of yielding the correct t shares. The difference between Total and t ensures that  $P_1$  remains small.

SSM demonstrates considerablt robustness against confidentiality breaches. In SSM, considering a memory size of 32GB with each share being 16B in size, results in a total of  $2 \times 10^9$ shares in memory. By setting t = 16 and K = 32, we obtain a probability of around  $2.65 \times 10^{-23}$ . This extremely low probability indicates that a brute force attack would require an impractically long time, potentially spanning years, to succeed. Furthermore, adjusting the values of K and t can further alter this probability, and the incorporation of a shuffling scheme (§IV-E) also enhances security.

# B. Integrity and Replay Attack Analysis

SSM provides a robust verification mechanism essential for ensuring the integrity of stored data and preventing both integrity and replay attacks. Integrity attacks involve unauthorized alterations of stored data, while replay attacks consist of the unauthorized resubmission of old data streams to create an unauthorized state or action.

SSM offers robustness against these attacks comparable to AES-CTR, which uses hash functions and MAC for integrity verification. In SSM, the integrity check can be considered a special hash, where the coefficient seeds function as the private key, and the reconstruction process is akin to MAC checking. Thus, SSM can provide the same level of robustness as AES-CTR, effectively safeguarding against both integrity and replay attacks through its unique data verification methods.

During reconstruction in SSM, the coefficients of the rebuilt polynomial are cross-checked against the stored coefficient seeds. The integrity check is straightforward: if the coefficients obtained during reconstruction match the stored seeds, the data is confirmed as authentic. Any discrepancy would immediately signal potential tampering with the data.

Regarding replay attack prevention, SSM's approach is equally robust. The seeding function serves as a cryptographic validator, requiring that any data to be reconstructed must precisely match its original encoded form. Replayed data would not pass this rigorous verification, as its coefficients would not align with the originally stored seeds, which are tied to the data's initial context and condition. To enhance security measures, SSM applies sequence counters to the seeds, further fortifying this defense and ensuring that replayed data is swiftly detected and rejected.

## C. Shares Access Protection Analysis

While data encryption methods like AES-CTR are designed to protect the content of data, they lack memory access pattern protection. SSM, on the other hand, ensures memory access pattern protection. To substantiate the robustness of the SSM's access pattern protection, we align with the standard security definitions for ORAM, which stipulate that a protection scheme must effectively conceal: 1) Data identity leakage-preventing the inference of accessed data; 2) Temporal locality leakage-obscuring when data was last accessed or its access frequency; 3) Spatial locality leakage-masking whether accesses are sequential or patterned. While the standard ORAM security model includes protection against disclosing the type of access operation (read or write), SSM's protection objective focuses on preventing the reconstruction of data from the locations of its shares. Our default SSM setting may relax the requirement to hide the type of access operation as it does not compromise the security of SSM. Moreover, safeguarding against access type disclosure in SSM is straightforward. In this process, each operation can be uniformly processed as both a read and a write, thereby preserving the uniformity of observed operations. However, enabling access type protection requires SSM to update all shares by re-generating K shares. This necessitates reading and writing K blocks, along with additional dummy blocks, leading to an increased number of read and write operations. As a result, the default SSM does not implement this type of protection. For enhanced security, we refer to the version of SSM with access type protection enabled as SSM Plus.

During each SSM access, the observable sequence is represented as  $y = (address_1, address_2, ..., address_{t+d})$ , comprising t real and d dummy addresses. The SSM Map offers a 1-to-K mapping, yields C(K, t) combinations for selecting t real shares, bolstering the security against location disclosure for the data identity leakage.

Each data block is meaningless before reconstruction, even without encryption. After access, SSM invokes the share shuffling scheme, randomly remapping each accessed share within the stash to a new location. The SSM stash accommodates a sequence set  $Y = \{y_1, y_2, \dots, y_n\}$ , with each y representing a sequence of addresses in each access that collectively contain  $(t+d) \times S \times n$  shares, where that n indicates the number of accesses before writing back to prevent stash overflow. The shuffling process instills a random remapping probability. If track independently with each share is equally likely to be in any position, the probability  $P_2$  of correctly guessing the original sequence before shuffling would be:

$$P_2 = \left(\frac{1}{(t+d) \times S \times n}\right)^n \tag{3}$$

Consequently, the sequence of accesses observed by an adversary is distinguishable at an extremely low probability from a random sequence, the overall probability P of reconstructing the original accessed data block would be equation  $\frac{1}{C(K,t)} \times P_2$  (3). Thus, SSM is adept at obfuscating both temporal and spatial locality, effectively safeguarding against attempts to decode the access pattern.

SSM's access protection scheme inherently offers protection against leakage of the age or frequency of data access, data linkage, and the specific patterns of access (see §II-A2). By employing a dynamic shuffling and remapping strategy for shares within the memory, SSM disrupts temporal and spatial patterns, making it challenging for adversaries to infer the timing, sequence, or nature of data accesses. This obfuscation minimizes the risk of temporal and spatial locality leakage, enhancing data operation privacy and security. A comparative analysis of SSM's effectiveness of access pattern protection against ORAM's comprehensive protection model is an area for future study.

# VII. EXPERIMENTAL RESULTS

In this section, we first show the performance overhead of SSM with different access overhead t + d. Then we examine how partial caching affects SSM's operational efficiency and we assess the impact of stash size variations on SSM's performance. Lastly, we present a comparative performance analysis of SSM against established secure memory systems such as Intel SGX [19], MGX [24], and SoftVN [56] utilizing different workloads to establish real-world relevance.

# A. Experiment Methodology

1) Implementation: For comprehensive evaluation, SSM, Path ORAM, and Intel SGX were implemented within the GEM5 cycle-level simulator [4]. Simulation setup is detailed in Table III, featuring a simulation environment of a singlecore X86 architecture with a 32GB DDR4 memory. The Intel SGX (implementing AES-CTR) was modeled with an 8-way Merkle tree, 56-bit VNs and MACs per 64-byte cache line, incorporating an AES encryption process with a latency of 40 cycles. In Path ORAM [47], memory utilization is fixed at 50%, with a 32KB stash size. The access overhead amounts to 108 ( $L \times Z$ ) data blocks per access.

SSM's data segmentation and reconstruction units were realized using Verilog RTL, evaluated for performance with a 45nm CMOS technology node [6] through Cadence Encounter.

| TABLE III: | Simulation | settings |
|------------|------------|----------|
|------------|------------|----------|

| Processor Parameters           |                             |  |  |  |
|--------------------------------|-----------------------------|--|--|--|
| Core                           | Single Core, X86, OoO, 3GHz |  |  |  |
| L1 Cache                       | 2 cycles, 32KB, 2-Way       |  |  |  |
| L2 Cache                       | 20 Cycles, 1MB, 8-Way       |  |  |  |
| LLC                            | 32 Cycles, 8MB, 16-Way      |  |  |  |
| Memory Parameters              |                             |  |  |  |
| Туре                           | DDR4_2400_16x4              |  |  |  |
| Size                           | 32 GB                       |  |  |  |
| Intel SGX Parameters [19]      |                             |  |  |  |
| AES Latency                    | 128-bit, 40 Cycles          |  |  |  |
| MAC/VN                         | 32 KB each, 4 ways          |  |  |  |
| Path ORAM Parameters [47]      |                             |  |  |  |
| Stash Size                     | 32 KB                       |  |  |  |
| Tree height                    | L = 27                      |  |  |  |
| Bucket size                    | Z = 4                       |  |  |  |
| Access Overhead $(L \times Z)$ | 108                         |  |  |  |
| SSM Parameters                 |                             |  |  |  |
| Data Segmentation latency      | 19 ns latency               |  |  |  |
| Data Reconstruction latency    | 60 ns latency               |  |  |  |
| SSM Stash Size                 | 32 KB                       |  |  |  |
| Access Overhead (t + d)        | 32 (default)                |  |  |  |
|                                |                             |  |  |  |

The latency values obtained from the RTL simulation were subsequently integrated into the GEM5 simulation framework. Notably, SSM, alongside Intel SGX and Path ORAM, were all implemented within the memory controller of the system in GEM5.

In our evaluations, we normalized execution time to the nonprotected (NP) system execution time represent the baseline at 1. This approach was chosen to highlight the additional overhead each design introduces compared to NP.

2) Parameter Setting in SSM: In our evaluation of SSM, we have chosen specific parameter settings to ensure both robust security and efficient data reconstruction. We set the threshold t to be half of the total number of shares K, i.e.,  $t = \frac{K}{2}$ . Consequently, the degree d of the polynomial used in secret sharing also becomes  $\frac{K}{2}$ .

The segmentation and reconstruction of shares within SSM depend on the original memory data block size and the chosen value of W. In our setup, W is set to 8. Given that each memory data block is 64B, this configuration leads to each share being 16B in size, comprising 8B for the x value and 8B for the corresponding f(x). As a result, within the SSM framework, each 64B memory block is divided into 4 such shares, where S = 4.

We also present SSM+, which replaces SSM shares access protection with Path ORAM to achieve all five ORAM access protection goals described in Section II-A2.Furthermore, SSM+ directly adopts the Path ORAM parameters. For each share access, it undergoes the full Path ORAM process, resulting in each share incurring an overhead of  $(L \times Z)$  data blocks per access.

## B. Performance Overhead of SSM

SSM inevitably increases read and write operations due to its security provisions. To evaluate the impact of these additional operations on system performance and explore how our optimizations might enhance efficiency, we (1) analysis the effects of partial caching on read/write efficiency, (2) evaluate



Fig. 3: Latency comparison for SSM and Non-Protected (NP) systems across different access overhead t+d values in random and sequential memory access patterns with and without SSM Stash (stash size = 32KB).

the influence of stash size, and (3) explore the trade-offs between t + d and performance metrics.

We deploy benchmarks specifically designed for memory operations, that reads and writes memory 100,000 times to consider worst-case scenarios, providing insights into the potential memory upper bounds on performance degradation for typical workloads. Benchmarks are executed under two distinct memory access patterns: sequential (Seq) and random (Rand). Sequential access patterns are commonly observed in convolutional neural networks (CNNs). In contrast, graphbased and recommendation algorithms frequently exhibit random memory access patterns.

1) SSM Accesses Overheads: The overhead associated with SSM primarily arises from two sources: the latency due to data segmentation and reconstruction, and the increased number of read and write operations required for each memory access. The impact of data segmentation and reconstruction on latency is relatively minor. Similar to the AES encryption steps in AES-CTR, the data segmentation and reconstruction steps in SSM can also benefit from pre-calculating the necessary computation steps. Additionally, these processes can be effectively managed through pipeline stages and OoO execution.

Conversely, the augmented read and write operations have a more substantial effect. This is illustrated in Fig. 3, which shows that SSM incurs additional latency compared to NP systems for both random and sequential memory access. As the t + d value increases, so does the number of read and write operations required by SSM for every CPU memory request. While OoO execution and parallel memory access can mitigate some of the latency introduced by SSM, this latency becomes more pronounced at higher t + d values. However, a greater t + d also correlates with enhanced security, making



Fig. 4: Execution time comparison between the Non-Protected (NP) system and SSM across different stash sizes for both sequential (Seq) and random (Rand) memory access patterns.

it more challenging for attackers to exploit memory through confidentiality breaches and memory access pattern attacks. Therefore, t + d presents a critical trade-off between latency and memory security.

2) Partial Caching: The introduction of partial caching within the SSM framework is a strategic approach to alleviating the inherent read and write overheads. As shown in Fig. 3, partial caching enhances performance, especially as the t + d value increases. This improvement is primarily due to the caching mechanism's ability to preload dummy data blocks, effectively obfuscating the true memory access patterns. Additionally, it reduces the number of data shuffles in SSM.

For lower t + d values, the latency reduction provided by partial caching is less noticeable. This is because the overhead from read and write operations is not as pronounced with smaller t+d, and the other SSM mechanisms are sufficient for maintaining performance. However, as the t + d value grows, the benefit of partial caching becomes increasingly significant. In the context of random memory access, the implementation of partial caching moderates the growth of the latency curve. It suggest that it effectively counters the exponential growth in read and write operations that larger t + d values would necessitate.

In the sequential memory access pattern, partial caching demonstrates a consistent latency advantage across all t + d values. This suggests that even when read and write operations are predictably ordered, the additional prefetched data can expedite the memory access process, thus diminishing the impact of the latency overhead.

3) Stash Size: Fig. 4 compares the latency of the SSM and a NP system under sequential and random memory access patterns for different stash sizes. With SSM's t + d being set to 32, the data suggests performance improvements as stash size increases, attributed to the SSM's prefetching mechanism.

For the random access pattern, SSM's advantage is more pronounced with a larger stash. The pre-loaded data's randomness aligns with the access pattern, allowing SSM to load more data per operation, thus improving performance. In contrast, for sequential access, the predictability of access patterns diminishes the benefit provided by SSM's random prefetch, leading to less performance improvements.



Fig. 5: Normalized execution time of Intel SGX, MGX, SoftVN, and SSM on different applications.

## C. Application Level Evaluation

In our evaluation of SSM's performance on applications, we have set t + d at 32 and configured a stash cache size of 32KB. This configuration is chosen to strike a balance between security and performance. To benchmark SSM, we selected four secure execution environments: Intel SGX [19], Path ORAM [47], MGX [24], and SoftVN [56]. We compare the performance on five benchmarks:

**Convolutional Neural Networks (CNNs):** CNNs are a cornerstone of image classification. For single ImageNet [13] inference and training, we utilize widely-used architectures like ResNet50 [22], VGG16 [52], and AlexNet [30]. Given their structured data flow, these benchmarks serve as representatives of sequential memory access patterns.

**Natural Language Processing (NLP):** NLPs are widely used to understanding and generating human language. In our benchmarks, we employ BERT [14], configured with a hidden size of 768, capable of handling sequences of up to 512 tokens, 12 attention heads, and four encoder blocks. The operational paradigm of BERT is in line with sequential memory access patterns, due to its ordered processing of token sequences.

**Recommendation Systems:** Recommendation systems are widely utilized in personalized content and product suggestions. Our simulations include DLRM [43] inference with 26 embedding tables, each of 128 dimensions, reflective of categorical features common in such systems. The inherent dependency of DLRM on embedding lookups and feature interactions leads to random memory access patterns.

Our evaluation of SSM, Path ORAM, and Intel SGX is based on GEM5. In contrast, performance data for MGX and SoftVN, as reported in their respective papers, are derived from SCALE-Sim [49] and ZSIM [50] simulations.

Fig. 5 displays the normalized execution time for various systems across different applications. To provide memory protection, all these designs, including SSM, incur overheads compared to NP systems, with SSM showing an average of 10% more execution time than NP. SSM exhibits enhanced performance compared to Intel SGX, primarily because its overhead originates from additional reads and writes, which are less resource-intensive than Intel SGX's reliance on VN and Merkle Trees (§II). On average, SSM achieves a 15% speedup against Intel SGX.

MGX and SoftVN, built on Intel SGX, offer nearly zero



Fig. 6: Normalized execution time of Intel SGX + Path PRAM and SSM+ on different applications.

overhead compared to the baseline NP. While SSM cannot surpass them in speed, it compensates with memory access pattern protection, a feature absent in the other systems. Besides, MGX requires a specialized accelerator, and SoftVN depends on specific software and hardware configurations. SSM, however, is more versatile, designed to integrate with a variety of system implementations, offering a broader application scope.

Fig. 6 compares SSM+ and Intel SGX + Path ORAM under full security guarantees of both access pattern and data content protection. Both designs inherently incur significant overhead due to their comprehensive security guarantees. Path ORAM's requirement for extensive read and write operations per access significantly increases overhead. Intel SGX + Path ORAM intensifies access demands through additional Merkle Tree node verifications. However, SSM+ optimizes performance by reducing memory operations, eliminating the MAC authentication and counter integrity checks present in Intel SGX, leading to an average of 20% performance improvement over Intel SGX + Path ORAM.

Our experiment also reveals a distinct difference in performance between inference and training tasks; the training phase exhibits a more substantial overhead on SSM due to the increased frequency of memory accesses required, underscoring the trade-offs between security measures and performance penalties in secure computing environments. For applications like DLRM, which involve random memory access, SSM incurs higher latency due to the added complexity of obfuscating access patterns to enhance security.

As a case study, we examine the linear layers of VGG-16 performing inference ImageNet. Fig 7 presents the normalized per-layer execution time for both Intel SGX and SSM. Notably, SSM demonstrates a marked improvement in memory-intensive layers. Particularly in layers with high memory transaction rates, such as the last three FC layers, SSM's streamlined operations significantly mitigate the performance overhead compared to Intel SGX. This advantage is less pronounced in layers with lower data transfer demands, like the convolutional layers, where the overhead for both systems is comparably minimal. The results underscore SSM's efficacy in reducing computational overhead in memory-bound operations, which is critical for secure execution environments in data-intensive tasks. The case study also reveals that the



Fig. 7: Normalized execution time for Intel SGX and SSM for VGG linear layers

performance overhead of our proposed SSM scheme does not grow as quickly as that of Intel SGX.

# VIII. RELATED WORK

There is a substantial body of research focused on memory encryption and integrity verification for general-purpose CPUs. This includes counter-mode encryption [19], [54] and various optimizations to reduce the size of VNs, as well as the development of counter-based integrity trees [15], [20], metadata caching [31], and strategies for predicting or speculatively using unverified VNs [32], [51]. These general-purpose protection schemes typically necessitate off-chip storage of VNs, which can be challenging for applications with large datasets and random access patterns. MGX [24] and SoftVN [56] represent significant advancements in this area. MGX introduces an approach that customize memory protection for specific applications and eliminating the need for off-chip VNs. SoftVN, on the other hand, utilizes software-generated VNs.

Additionally, ORAM is aimed at hiding access patterns. Since its first proposal by Goldreich [16] in 1987, ORAM has been evolving. Tree-based ORAM constructions such as Path ORAM [53] and Ring ORAM [46] are notable developments in this domain. Many studies have recently proposed optimized ORAM designs. For instance, recent references [5], [8], [44], [45] focus on optimizing the data locality of ORAMs, which could reduce redundant access and improve performance; [7], [10], [48] enable multiple users to concurrently access ORAM. However, despite these advancements, both approaches still struggle with significant bandwidth overhead, leading to an ongoing stream of research aimed at optimizing ORAM constructions for better efficiency and security.

## IX. CONCLUSION

We present SSM as an efficient memory security solution, effectively safeguarding data content, concealing access patterns, and facilitating integrity checks. Departing from traditional encryption-focused approaches, SSM uniquely protects data content while avoiding the overheads typical of AES-CTR modes. It inherently disguises memory access patterns, thus enhancing data confidentiality, and integrates mechanisms for data integrity. Experimental results indicate that SSM imposes only a 10% average overhead compared to baseline unprotected memory systems, and it surpasses AES-CTR mode and ORAM in performance, showing an average speedup of 15%.

#### REFERENCES

- AMD, "Amd memory encryption white paper," 2023. [Online]. Available: https://www.amd.com/system/files/TechDocs/memory-encryptionwhite-paper.pdf
- [2] A. Beimel, "Secret-sharing schemes: A survey," in *International conference on coding and cryptology*. Springer, 2011, pp. 11–46.
- [3] J.-P. Berrut and L. N. Trefethen, "Barycentric lagrange interpolation," SIAM review, vol. 46, no. 3, pp. 501–517, 2004.
- [4] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti *et al.*, "The gem5 simulator," *ACM SIGARCH computer architecture news*, vol. 39, no. 2, pp. 1–7, 2011.
- [5] D. Cao, M. Zhang, H. Lu, X. Ye, D. Fan, Y. Che, and R. Wang, "Streamline ring oram accesses through spatial and temporal optimization," in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2021, pp. 14–25.
- [6] Y. Cao, Predictive technology model for robust nanoelectronic design. Springer Science & Business Media, 2011.
- [7] A. Chakraborti and R. Sion, "Concuroram: High-throughput stateless parallel multi-client oram," arXiv preprint arXiv:1811.04366, 2018.
- [8] Y. Che and R. Wang, "Multi-range supported oblivious ram for efficient block data retrieval," in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2020, pp. 369– 382.
- [9] Y. Che and R. Wang, "Dnncloak: Secure dnn models against memory side-channel based reverse engineering attacks," in 2022 IEEE 40th International Conference on Computer Design (ICCD). IEEE, 2022, pp. 89–96.
- [10] W. Cheng, D. Sang, L. Zeng, Y. Wang, and A. Brinkmann, "Tianji: Securing a practical asynchronous multi-user oram," *IEEE Transactions* on Dependable and Secure Computing, 2023.
- [11] N. Crooks, M. Burke, E. Cecchetti, S. Harel, R. Agarwal, and L. Alvisi, "Obladi: Oblivious serializable transactions in the cloud," in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018, pp. 727–743.
- [12] E. Dawson and D. Donovan, "The breadth of shamir's secret-sharing scheme," *Computers & Security*, vol. 13, no. 1, pp. 69–78, 1994.
- [13] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in 2009 IEEE conference on computer vision and pattern recognition. Ieee, 2009, pp. 248–255.
- [14] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
- [15] R. Elbaz, D. Champagne, R. B. Lee, L. Torres, G. Sassatelli, and P. Guillemin, "Tec-tree: A low-cost, parallelizable tree for efficient defense against memory replay attacks," in *Cryptographic Hardware and Embedded Systems-CHES 2007: 9th International Workshop, Vienna, Austria, September 10-13, 2007. Proceedings 9.* Springer, 2007, pp. 289–302.
- [16] O. Goldreich, "Towards a theory of software protection and simulation by oblivious rams," in *Proceedings of the nineteenth annual ACM* symposium on Theory of computing, 1987, pp. 182–194.
- [17] O. Goldreich and R. Ostrovsky, "Software protection and simulation on oblivious rams," *Journal of the ACM (JACM)*, vol. 43, no. 3, pp. 431–473, 1996.
- [18] S. Gueron, "Memory encryption for general-purpose processors," *IEEE Security & Privacy*, vol. 14, no. 6, pp. 54–62, 2016.
- [19] S. Gueron, "Memory encryption for general-purpose processors," *IEEE Security & Privacy*, vol. 14, no. 6, pp. 54–62, 2016.
- [20] W. E. Hall and C. S. Jutla, "Parallelizable authentication trees," in Selected Areas in Cryptography: 12th International Workshop, SAC 2005, Kingston, ON, Canada, August 11-12, 2005, Revised Selected Papers 12. Springer, 2006, pp. 95–109.
- [21] J. Han, Y. Liu, L. Xiao, R. Xiao, and L. M. Ni, "A mutual anonymous peer-to-peer protocol design," in 19th IEEE International Parallel and Distributed Processing Symposium. IEEE, 2005, pp. 10–pp.
- [22] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in *Proceedings of the IEEE conference on computer vision* and pattern recognition, 2016, pp. 770–778.

- [23] M. Henson and S. Taylor, "Memory encryption: A survey of existing techniques," ACM Computing Surveys (CSUR), vol. 46, no. 4, pp. 1–26, 2014.
- [24] W. Hua, M. Umar, Z. Zhang, and G. E. Suh, "Mgx: Near-zero overhead memory protection for data-intensive accelerators," in *Proceedings of the* 49th Annual International Symposium on Computer Architecture, 2022, pp. 726–741.
- [25] W. Hua, Z. Zhang, and G. E. Suh, "Reverse engineering convolutional neural networks through side-channel information leaks," in *Proceedings* of the 55th Annual Design Automation Conference, 2018, pp. 1–6.
- [26] Intel, "Intel total memory encryption white paper," 2023. [Online]. Available: https://www.intel.com/content/www/us/en/architecture-andtechnology/total-memory-encryption-security-paper.html
- [27] M. S. Islam, M. Kuzu, and M. Kantarcioglu, "Access pattern disclosure on searchable encryption: ramification, attack and mitigation." in *Ndss*, vol. 20. Citeseer, 2012, p. 12.
- [28] Y. Ji, S. Lee, E. Downing, W. Wang, M. Fazzini, T. Kim, A. Orso, and W. Lee, "Rain: Refinable attack investigation with on-demand interprocess information flow tracking," in *Proceedings of the 2017 ACM SIGSAC conference on computer and communications security*, 2017, pp. 377–390.
- [29] T. M. John, "Privacy leakage via write-access patterns to the main memory," 2017.
- [30] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in neural information processing systems, vol. 25, 2012.
- [31] J. Lee, T. Kim, and J. Huh, "Reducing the memory bandwidth overheads of hardware security support for multi-core processors," *IEEE Transactions on Computers*, vol. 65, no. 11, pp. 3384–3397, 2016.
- [32] T. S. Lehman, A. D. Hilton, and B. C. Lee, "Poisonivy: Safe speculation for secure memory," in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016, pp. 1–13.
- [33] M. Li, Y. Zhang, H. Wang, K. Li, and Y. Cheng, "{CIPHERLEAKS}: Breaking constant-time cryptography on {AMD}{SEV} via the ciphertext side channel," in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 717–732.
- [34] W. Liang, K. Bu, K. Li, J. Li, and A. Tavakoli, "Memcloak: Practical access obfuscation for untrusted memory," in *Proceedings of the 34th Annual Computer Security Applications Conference*, 2018, pp. 187–197.
- [35] C. Liu, L. Zhu, M. Wang, and Y.-a. Tan, "Search pattern leakage in searchable encryption: Attacks and new construction," *Information Sciences*, vol. 265, pp. 176–188, 2014.
- [36] S. Liu, A. Kolli, J. Ren, and S. Khan, "Crash consistency in encrypted non-volatile main memory systems," in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2018, pp. 310–323.
- [37] Y. Mao, Y. Zhu, Y. Wang, and Y. Guo, "Continuous variable quantum secret sharing with security enhancement in practical quantum communications," *Mathematics*, vol. 10, no. 20, p. 3768, 2022.
- [38] L. Martin, "Xts: A mode of aes for encrypting hard disks," *IEEE Security & Privacy*, vol. 8, no. 3, pp. 68–69, 2010.
- [39] A. J. Mashtizadeh, A. Bittau, D. Boneh, and D. Mazières, "Ccfi: Cryptographically enforced control flow integrity," in *Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security*, 2015, pp. 941–951.
- [40] B. Mi, B. Wu, D. Huang, Y. Liu, L. Chen, S. Wan et al., "Privacyoriented transaction for public blockchain via secret sharing," Security and Communication Networks, vol. 2022, 2022.
- [41] A. Narayanan and V. Shmatikov, "Robust de-anonymization of large sparse datasets," in 2008 IEEE Symposium on Security and Privacy (sp 2008). IEEE, 2008, pp. 111–125.
- [42] A. Narayanan and V. Shmatikov, "Myths and fallacies of" personally identifiable information"," *Communications of the ACM*, vol. 53, no. 6, pp. 24–26, 2010.
- [43] M. Naumov, D. Mudigere, H.-J. M. Shi, J. Huang, N. Sundaraman, J. Park, X. Wang, U. Gupta, C.-J. Wu, A. G. Azzolini *et al.*, "Deep learning recommendation model for personalization and recommendation systems," *arXiv preprint arXiv:1906.00091*, 2019.
- [44] R. Rajat, Y. Wang, and M. Annavaram, "Pageoram: An efficient dram page aware oram strategy," in 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2022, pp. 91–107.
- [45] R. Rajat, Y. Wang, and M. Annavaram, "Laoram: A look ahead oram architecture for training large embedding tables," in *Proceedings of the*

50th Annual International Symposium on Computer Architecture, 2023, pp. 1–15.

- [46] L. Ren, C. W. Fletcher, A. Kwon, E. Stefanov, E. Shi, M. van Dijk, and S. Devadas, "Ring oram: Closing the gap between small and large client storage oblivious ram." *IACR Cryptol. ePrint Arch.*, vol. 2014, p. 997, 2014.
- [47] L. Ren, X. Yu, C. W. Fletcher, M. Van Dijk, and S. Devadas, "Design space exploration and optimization of path oblivious ram in secure processors," in *Proceedings of the 40th Annual International Symposium* on Computer Architecture, 2013, pp. 571–582.
- [48] C. Sahin, V. Zakhary, A. El Abbadi, H. Lin, and S. Tessaro, "Taostore: Overcoming asynchronicity in oblivious data storage," in 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016, pp. 198–217.
- [49] A. Samajdar, Y. Zhu, P. Whatmough, M. Mattina, and T. Krishna, "Scale-sim: Systolic cnn accelerator simulator," *arXiv preprint* arXiv:1811.02883, 2018.
- [50] D. Sanchez and C. Kozyrakis, "Zsim: Fast and accurate microarchitectural simulation of thousand-core systems," ACM SIGARCH Computer architecture news, vol. 41, no. 3, pp. 475–486, 2013.
- [51] W. Shi, H.-h. S. Lee, M. Ghosh, C. Lu, and A. Boldyreva, "High efficiency counter mode security architecture via prediction and precomputation," in *32nd International Symposium on Computer Architecture* (*ISCA'05*). IEEE, 2005, pp. 14–24.
- [52] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
- [53] E. Stefanov, M. v. Dijk, E. Shi, T.-H. H. Chan, C. Fletcher, L. Ren, X. Yu, and S. Devadas, "Path oram: an extremely simple oblivious ram protocol," *Journal of the ACM (JACM)*, vol. 65, no. 4, pp. 1–26, 2018.
- [54] G. E. Suh, D. Clarke, B. Gasend, M. Van Dijk, and S. Devadas, "Efficient memory integrity verification and encryption for secure processors," in *Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.* IEEE, 2003, pp. 339–350.
- [55] M. Szydlo, "Merkle tree traversal in log space and time," in Advances in Cryptology-EUROCRYPT 2004: International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, May 2-6, 2004. Proceedings 23. Springer, 2004, pp. 541–554.
- [56] M. Umar, W. Hua, Z. Zhang, and G. E. Suh, "Softvn: Efficient memory protection via software-provided version numbers," in *Proceedings of the 49th Annual International Symposium on Computer Architecture*, 2022, pp. 160–172.
- [57] F. Wang, C. Yun, S. Goldwasser, V. Vaikuntanathan, and M. Zaharia, "Splinter: Practical private queries on public data," in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), 2017, pp. 299–313.
- [58] J. Wichelmann, A. Pätschke, L. Wilke, and T. Eisenbarth, "Cipherfix: Mitigating ciphertext {Side-Channel} attacks in software," in 32nd USENIX Security Symposium (USENIX Security 23), 2023, pp. 6789– 6806.
- [59] J. Xu, X. Li, Y. Han, Y. Zhou, Z. Liu, Z. Zhang, and Y. Song, "Quantitative security analysis of three-level unitary operations in quantum secret sharing without entanglement," *Frontiers in Physics*, vol. 11, p. 1213153, 2023.
- [60] L. Yu, Y. Lu, X. Yan, Y. Jiang, J. Wang *et al.*, "Generative text secret sharing with topic-controlled shadows," *Security and Communication Networks*, vol. 2022, 2022.
- [61] Y. Zhao, X. Hu, S. Li, J. Ye, L. Deng, Y. Ji, J. Xu, D. Wu, and Y. Xie, "Memory trojan attack on neural network accelerators," in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2019, pp. 1415–1420.
- [62] W. Zheng, Y. Wu, X. Wu, C. Feng, Y. Sui, X. Luo, and Y. Zhou, "A survey of intel sgx and its applications," *Frontiers of Computer Science*, vol. 15, pp. 1–15, 2021.
- [63] X. Zhuang, T. Zhang, and S. Pande, "Hide: an infrastructure for efficiently protecting information leakage on the address bus," ACM SIGOPS Operating Systems Review, vol. 38, no. 5, pp. 72–84, 2004.