Data Integrity Sentinel
Key tech stack:
- Java 21 (Streams, Concurrency, Security)
- Spring Boot
- Postgres, DynamoDB
- Docker, LocalStack
This project is a high-performance data integrity system. Its key requirements are:
- High throughput
- Memory efficiency
- Ability to work with large volumes of files and large file sizes
- Robustness
To improve memory efficiency, I designed the system around streaming. Instead of loading entire files into memory, it processes data in small, fixed-size chunks. As a result, a 10 GB file consumes the same small, bounded amount of memory as a 10 MB file, keeping usage constant and predictable.
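As a rough sketch of the chunked approach (the 64 KiB buffer size and SHA-256 as the integrity hash are illustrative assumptions, not the project's exact choices), memory stays bounded by a single reusable buffer no matter how large the file is:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public final class StreamingHasher {

    private static final int CHUNK_SIZE = 64 * 1024; // fixed 64 KiB buffer, reused for every read (assumed size)

    /** Computes a SHA-256 digest by streaming the file in fixed-size chunks. */
    public static String sha256Of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[CHUNK_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read); // only this one buffer is ever resident, regardless of file size
            }
        }
        return HexFormat.of().formatHex(digest.digest()); // HexFormat is part of Java since 17
    }
}
```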
For throughput, I leveraged a producer-consumer architecture: the system is built from several concurrent stages, each backed by multiple worker threads. Decoupling the stages lets I/O-bound and CPU-bound work proceed in parallel, which maximizes overall throughput.
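The sketch below illustrates that producer-consumer shape under assumptions of my own (a bounded queue of 128 chunks, one consumer per core, a poison-pill shutdown signal). The bounded `BlockingQueue` is what provides back-pressure, so a fast reader can never flood memory:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class PipelineSketch {

    /** Sentinel object signalling that the producer is done. */
    private static final byte[] POISON_PILL = new byte[0];

    public static void main(String[] args) throws InterruptedException {
        // Bounded queue: applies back-pressure so a fast producer cannot outrun the consumers.
        BlockingQueue<byte[]> chunks = new ArrayBlockingQueue<>(128);
        int consumers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);

        // Producer stage: reads (here, fabricates) fixed-size chunks and hands them off.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 1_000; i++) {
                    chunks.put(new byte[64 * 1024]); // blocks when the queue is full
                }
                for (int i = 0; i < consumers; i++) {
                    chunks.put(POISON_PILL); // one shutdown signal per consumer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // Consumer stage: each worker processes chunks in parallel.
        for (int i = 0; i < consumers; i++) {
            pool.submit(() -> {
                try {
                    for (byte[] chunk = chunks.take(); chunk != POISON_PILL; chunk = chunks.take()) {
                        // placeholder for per-chunk integrity work (hashing, validation, ...)
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        producer.join();
        pool.shutdown();
    }
}
```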
The system uses two datastores: DynamoDB for processing state and Postgres for historical logs. DynamoDB provides the low-latency key-value operations crucial for high-throughput processing, while Postgres supports complex analytical queries over the audit logs.
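As an illustration of the state-store side (the table name `integrity-state`, the key schema, and the attribute names are assumptions, not the project's actual schema), a state write with the AWS SDK v2 client, which LocalStack can stand in for during development, looks roughly like this:

```java
import java.time.Instant;
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public final class StateStore {

    private final DynamoDbClient dynamo;

    public StateStore(DynamoDbClient dynamo) {
        this.dynamo = dynamo;
    }

    /** Records the latest verified checksum for a file as a single low-latency key-value write. */
    public void recordState(String fileId, String checksum) {
        dynamo.putItem(PutItemRequest.builder()
                .tableName("integrity-state") // table name is an assumption for illustration
                .item(Map.of(
                        "fileId", AttributeValue.builder().s(fileId).build(),
                        "checksum", AttributeValue.builder().s(checksum).build(),
                        "verifiedAt", AttributeValue.builder().s(Instant.now().toString()).build()))
                .build());
    }
}
```

The Postgres side, by contrast, is conceptually an append-only audit table that the pipeline writes to and that analytical queries read later with ordinary SQL.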
I have leveraged Spring Boot for externalized configuration and dependency injection.
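A minimal sketch of how that looks in practice, assuming a hypothetical `sentinel` property prefix and property names chosen purely for illustration:

```java
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

// Typed, externalized configuration bound from application.yml or environment variables.
// The "sentinel" prefix and property names are assumptions; constructor binding for
// records works out of the box in Spring Boot 3.
@ConfigurationProperties(prefix = "sentinel")
record SentinelProperties(int chunkSizeBytes, int consumerThreads) {}

// Registers the properties record as a bean.
@Configuration
@EnableConfigurationProperties(SentinelProperties.class)
class SentinelConfig {}

// Constructor injection: Spring wires the bound properties in automatically.
@Service
class IntegrityService {

    private final SentinelProperties props;

    IntegrityService(SentinelProperties props) {
        this.props = props;
    }
}
```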
Ultimately, this design achieves near-constant memory usage regardless of file size and a sustained throughput of over 500 MB/sec.