SafeSpark

A Secure Data Analytics Platform using Cryptographic Schemes and Trusted Hardware

Contributions

What do we propose?

  • A modular and extensible architecture for data analytics systems that supports combining cryptographic techniques and trusted hardware technologies to protect sensitive data.
  • A prototype extending the Apache Spark framework with new operations to be performed using different secure processing primitives.
  • An experimental evaluation using TPC-DS, an industry-standard benchmark for analytical processing scenarios.

SafeSpark Architecture

We provide a modular and extensible architecture based on Spark SQL.

SafeMapper

Infers which primitives users want to apply to protect their data, based on a configuration file that they define.
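
The mapping step can be illustrated with a small sketch. The configuration format, the primitive names ("plaintext", "deterministic", "enclave") and the functions load_mapping and primitive_for are all assumptions made for this example; the source does not specify how SafeSpark's configuration file is structured. The column names are only illustrative.

```python
import json

# Hypothetical configuration: each column is assigned the primitive that
# should protect it; unlisted columns fall back to "default".
EXAMPLE_CONFIG = """
{
  "default": "plaintext",
  "tables": {
    "customer": {
      "c_email_address": "deterministic",
      "c_birth_year": "enclave"
    }
  }
}
"""

# Assumed set of primitive names, used only to validate the example config.
SUPPORTED_PRIMITIVES = {"plaintext", "deterministic", "enclave"}


def load_mapping(config_text):
    """Parse the configuration and reject unknown primitive names."""
    config = json.loads(config_text)
    names = [config.get("default", "plaintext")]
    for columns in config.get("tables", {}).values():
        names.extend(columns.values())
    for name in names:
        if name not in SUPPORTED_PRIMITIVES:
            raise ValueError(f"unknown primitive: {name}")
    return config


def primitive_for(config, table, column):
    """Return the primitive configured for a column, or the default."""
    default = config.get("default", "plaintext")
    return config.get("tables", {}).get(table, {}).get(column, default)


config = load_mapping(EXAMPLE_CONFIG)
print(primitive_for(config, "customer", "c_email_address"))
print(primitive_for(config, "customer", "c_first_name"))
```

Keeping the choice per column lets one dataset mix primitives, which fits the idea of combining cryptographic schemes with trusted hardware.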

SafeSpark Worker

Abstracts the integration of secure processing primitives into Spark SQL.

Handler

Invokes CryptoBoxes to encode and decode data and to perform operations on protected data.

CryptoBoxes

Modular entities, each representing a secure processing primitive.
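
One way to picture the Handler and CryptoBox relationship is a common interface with interchangeable implementations. The sketch below is hypothetical: the method names (encode, decode, equals), the class names and the toy cipher are assumptions for illustration and are not taken from the SafeSpark prototype. ToyDeterministicBox is deliberately simplistic and not secure; it only mimics the property that equal plaintexts yield equal ciphertexts.

```python
import hashlib
import hmac
from abc import ABC, abstractmethod
from typing import Dict


class CryptoBox(ABC):
    """Hypothetical interface for one secure processing primitive."""

    @abstractmethod
    def encode(self, value: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, blob: bytes) -> bytes: ...

    @abstractmethod
    def equals(self, a: bytes, b: bytes) -> bool:
        """Compare two encoded values without decoding them."""


class PlaintextBox(CryptoBox):
    """No protection: values pass through unchanged."""

    def encode(self, value):
        return value

    def decode(self, blob):
        return blob

    def equals(self, a, b):
        return a == b


class ToyDeterministicBox(CryptoBox):
    """Toy stand-in for deterministic encryption. NOT secure."""

    def __init__(self, key: bytes):
        self._key = key

    def _keystream(self, length: int) -> bytes:
        # Fixed keystream derived from the key with HMAC-SHA256 in counter mode.
        out, counter = b"", 0
        while len(out) < length:
            block = hmac.new(self._key, counter.to_bytes(4, "big"), hashlib.sha256)
            out += block.digest()
            counter += 1
        return out[:length]

    def encode(self, value):
        return bytes(x ^ y for x, y in zip(value, self._keystream(len(value))))

    def decode(self, blob):
        return self.encode(blob)  # XOR with the same keystream is its own inverse

    def equals(self, a, b):
        return hmac.compare_digest(a, b)


class Handler:
    """Dispatches each request to the CryptoBox registered under a name."""

    def __init__(self, boxes: Dict[str, CryptoBox]):
        self._boxes = boxes

    def encode(self, primitive, value):
        return self._boxes[primitive].encode(value)

    def decode(self, primitive, blob):
        return self._boxes[primitive].decode(blob)

    def equals(self, primitive, a, b):
        return self._boxes[primitive].equals(a, b)


handler = Handler({"plaintext": PlaintextBox(),
                   "deterministic": ToyDeterministicBox(b"demo-key")})
token = handler.encode("deterministic", b"alice")
print(handler.decode("deterministic", token))
```

Because the Handler depends only on the interface, supporting another primitive in this sketch means registering one more CryptoBox implementation, without changing the calling code.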

Experimental Evaluation

We evaluated the prototype with the TPC-DS benchmark under three secure setups, each combining different cryptographic and trusted hardware primitives.