
Contributions
What do we propose?
- A modular and extensible architecture for data analytics systems that supports combining cryptographic techniques and trusted hardware technologies to protect sensitive data.
- A prototype extending the Apache Spark framework with new operations that are performed using different secure processing primitives.
- An experimental evaluation using an industry-standard benchmark for analytical processing scenarios: the TPC-DS benchmark.
SafeSpark Architecture
We provide a modular and extensible architecture based on Spark SQL.

SafeMapper
Infers which primitives users want to apply to protect their data, based on a configuration file they define.
SafeSpark Worker
Abstracts the integration of secure processing primitives into Spark SQL.
Handler
Invokes CryptoBoxes to encode and decode data and to perform operations.
CryptoBoxes
Modular entities, each representing a secure processing primitive.
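To make the division of responsibilities among these components concrete, the following is a minimal sketch in plain Python. It is not the SafeSpark implementation, and it does not touch Spark: every class, method, and column name here (`CryptoBox.encode`, `SafeMapper.box_for`, `Handler.filter_equals`, `customer_id`, and so on) is a hypothetical chosen for illustration, and the XOR "cipher" is an insecure toy standing in for a real deterministic scheme. The sketch assumes the configuration simply maps column names to primitive names.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CryptoBox(ABC):
    """Hypothetical interface: one secure processing primitive."""

    @abstractmethod
    def encode(self, value: int) -> int: ...

    @abstractmethod
    def decode(self, value: int) -> int: ...

    @abstractmethod
    def equals(self, a: int, b: int) -> bool:
        """Compare two *encoded* values for equality."""


class PlainBox(CryptoBox):
    """No protection: values pass through unchanged."""

    def encode(self, value: int) -> int:
        return value

    def decode(self, value: int) -> int:
        return value

    def equals(self, a: int, b: int) -> bool:
        return a == b


class ToyDeterministicBox(CryptoBox):
    """Toy stand-in for a deterministic scheme (XOR with a key). NOT secure."""

    def __init__(self, key: int) -> None:
        self._key = key

    def encode(self, value: int) -> int:
        return value ^ self._key

    def decode(self, value: int) -> int:
        return value ^ self._key

    def equals(self, a: int, b: int) -> bool:
        # Deterministic encoding preserves equality, so ciphertexts compare directly.
        return a == b


class SafeMapper:
    """Resolves which primitive protects each column, from a user-defined config."""

    def __init__(self, config: Dict[str, str], boxes: Dict[str, CryptoBox]) -> None:
        self._config = config
        self._boxes = boxes

    def box_for(self, column: str) -> CryptoBox:
        # Assumption: columns absent from the config are left unprotected.
        return self._boxes[self._config.get(column, "plain")]


class Handler:
    """Invokes the CryptoBoxes to encode/decode rows and to run operations."""

    def __init__(self, mapper: SafeMapper) -> None:
        self._mapper = mapper

    def encode_row(self, row: Dict[str, int]) -> Dict[str, int]:
        return {c: self._mapper.box_for(c).encode(v) for c, v in row.items()}

    def decode_row(self, row: Dict[str, int]) -> Dict[str, int]:
        return {c: self._mapper.box_for(c).decode(v) for c, v in row.items()}

    def filter_equals(self, rows: List[Dict[str, int]], column: str,
                      plaintext: int) -> List[Dict[str, int]]:
        # Encode the constant once, then compare on encoded values only.
        box = self._mapper.box_for(column)
        target = box.encode(plaintext)
        return [r for r in rows if box.equals(r[column], target)]


# Hypothetical configuration: protect customer_id, leave other columns plain.
config = {"customer_id": "det"}
boxes = {"plain": PlainBox(), "det": ToyDeterministicBox(key=0x5A)}
handler = Handler(SafeMapper(config, boxes))

rows = [{"customer_id": 7, "quantity": 3}, {"customer_id": 9, "quantity": 1}]
encoded = [handler.encode_row(r) for r in rows]
matches = handler.filter_equals(encoded, "customer_id", 7)
decoded = [handler.decode_row(r) for r in matches]
```

Under these assumptions, adding a new primitive means adding one `CryptoBox` subclass and one entry in `boxes`; the mapper and handler stay unchanged, which is the kind of modularity the architecture aims for.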
Experimental Evaluation
We used the TPC-DS benchmark to evaluate our platform. The prototype was evaluated under three distinct secure setups, each combining different cryptographic and secure hardware primitives.