PROJECT EVOLUTION

This document outlines the evolution of the Coffee Sales Data Pipeline project. It highlights key architectural, technological, and business logic changes across major versions. This helps track improvements over time and maintain clear context for future development.


v1.0 - 2025.04 (Initial Release)

🎯 Overview
A basic data pipeline to simulate and analyze coffee shop sales.

Pipeline Diagram

📦 Architecture

Data Source

  • MongoDB stores transactional data (coffee orders).

Real-time Processing

  • Kafka Connect captures changes from MongoDB.
  • Elasticsearch stores real-time events.
  • Kibana visualizes operational metrics and trends.
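The CDC step above is typically set up by registering a source connector with the Kafka Connect REST API. Below is a minimal sketch: the connector class and property names follow the official MongoDB Kafka source connector, while the database/collection names, topic prefix, and Connect URL are assumptions, not the project's actual values.

```python
import json
import urllib.request

# Hypothetical connector registration for the MongoDB Kafka source connector.
# The connector class and property names are real; the database, collection,
# topic prefix, and Connect URL below are illustrative assumptions.
connector_config = {
    "name": "mongo-coffee-orders-source",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "coffee_shop",   # assumed database name
        "collection": "orders",      # assumed collection name
        "topic.prefix": "cdc",       # events land on e.g. cdc.coffee_shop.orders
    },
}

def register_connector(connect_url="http://localhost:8083/connectors"):
    """POST the connector config to the Kafka Connect REST API."""
    req = urllib.request.Request(
        connect_url,
        data=json.dumps(connector_config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Once registered, every insert/update/delete on the watched collection is published to Kafka and flows downstream to Elasticsearch and Kibana.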

Batch Processing

  • Airbyte extracts data from MongoDB into PostgreSQL (raw layer).
  • PostgreSQL acts as the data warehouse for storing structured data.
  • dbt transforms raw data into analytics-ready models and ensures data quality through testing.
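In the actual pipeline, the data-quality checks are declared as dbt tests (YAML/SQL); the Python sketch below just illustrates the two most common checks, `not_null` and `unique`, against an assumed `orders` shape.

```python
def not_null(rows, column):
    """dbt-style not_null test: every row must have a value for `column`."""
    return all(row.get(column) is not None for row in rows)

def unique(rows, column):
    """dbt-style unique test: no duplicate values in `column`."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))

# Illustrative rows; the real column names live in the dbt schema files.
orders = [
    {"order_id": 1, "drink": "latte"},
    {"order_id": 2, "drink": "espresso"},
]
assert not_null(orders, "order_id")
assert unique(orders, "order_id")
```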

Load Strategy

  • Full load only; no incremental load implemented yet.

✅ Benefits

  • Easy setup.
  • Low-code tooling.
  • Schema flexibility and easy scaling with MongoDB.
  • Airbyte supports schema changes and multi-source ingestion.

⚠️ Limitations

  • Only supports full-load batch processing; no incremental logic.
  • Does not implement Slowly Changing Dimension (SCD) handling.
  • Lacks real-time business logic (e.g., alerting, rule-based processing).
  • Orchestration tools are not integrated.
  • Logging and monitoring are not yet implemented.

v1.1 - 2025.06

🎯 Overview
Upgraded to support real-time business rules, historical data tracking, and scalable lakehouse design.

Pipeline Diagram

📦 Architecture

Data Source

  • MySQL stores both transactional and attribute data.

Real-time Processing

  • Continues to use Kafka Connect for Change Data Capture (CDC).
  • Adds Kafka Consumers to handle real-time business logic.
  • Uses Redis for low-latency lookups and caching.
  • Uses Prometheus and Grafana to observe Kafka health and trigger alerts.
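A consumer-side business rule can be sketched as a pure function applied per Kafka message. In the snippet below, a plain dict stands in for the Redis cache, and the event shape, menu prices, and alert threshold are assumptions for illustration.

```python
# Sketch of a Kafka-consumer business rule. A plain dict stands in for Redis;
# in production the lookup would be a redis-py `get` call.
price_cache = {"latte": 4.5, "espresso": 3.0}  # assumed menu prices

ALERT_THRESHOLD = 50.0  # assumed threshold for a "large order" alert

def evaluate_order(event):
    """Apply a real-time rule to one order event of the assumed shape
    {"item": ..., "qty": ...}."""
    price = price_cache.get(event["item"], 0.0)  # low-latency cached lookup
    total = price * event["qty"]
    return {"total": total, "alert": total >= ALERT_THRESHOLD}

# Inside the consumer loop this runs once per message, e.g.:
#   for msg in consumer: handle(evaluate_order(json.loads(msg.value)))
print(evaluate_order({"item": "latte", "qty": 12}))  # → {'total': 54.0, 'alert': True}
```

Keeping the rule as a side-effect-free function makes it easy to unit-test independently of Kafka, and alerts raised here can be surfaced through the same Prometheus/Grafana stack used for Kafka health.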

Batch Processing

  • Adopts a Lakehouse and Medallion Architecture.
  • Spark handles data ingestion, transformation, and data quality checks.
  • Airflow is used for orchestration and job scheduling.
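The Medallion flow (bronze → silver → gold) can be sketched in a few lines; plain Python stands in for the Spark jobs here, and the record fields are assumptions rather than the project's actual schema.

```python
# Minimal Medallion sketch: bronze = raw as-landed, silver = cleansed,
# gold = aggregated. Plain Python stands in for Spark transformations.
bronze = [
    {"order_id": "1", "item": "latte", "qty": "2"},
    {"order_id": "1", "item": "latte", "qty": "2"},    # duplicate from source
    {"order_id": "2", "item": "mocha", "qty": "bad"},  # malformed quantity
]

def to_silver(rows):
    """Cleanse: cast types, drop malformed rows, deduplicate on order_id."""
    seen, silver = set(), []
    for row in rows:
        if not row["qty"].isdigit() or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({**row, "qty": int(row["qty"])})
    return silver

def to_gold(rows):
    """Aggregate: total quantity sold per item."""
    totals = {}
    for row in rows:
        totals[row["item"]] = totals.get(row["item"], 0) + row["qty"]
    return totals

print(to_gold(to_silver(bronze)))  # → {'latte': 2}
```

In the real pipeline each layer is a Spark job scheduled by Airflow, with quality checks gating promotion from one layer to the next.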

Load Strategy

  • Supports incremental load.
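A common way to implement incremental load is a watermark on a last-updated column: each run extracts only rows newer than the stored watermark, then advances it. The sketch below assumes an `updated_at` column; the actual column and table names are not from the source.

```python
from datetime import datetime

# Watermark-based incremental extraction sketch (column name assumed).
def incremental_extract(rows, last_watermark):
    """Return rows updated after `last_watermark`, plus the new watermark."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

source = [
    {"order_id": 1, "updated_at": datetime(2025, 6, 1)},
    {"order_id": 2, "updated_at": datetime(2025, 6, 10)},
]
rows, wm = incremental_extract(source, datetime(2025, 6, 5))
print(len(rows), wm)  # → 1 2025-06-10 00:00:00
```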

✅ Benefits

  • Incremental load implemented.
  • Slowly Changing Dimension (Type 2) supported.
  • Real-time business rules applied during stream processing.
  • Monitoring, alerting, and logging integrated.
  • Full orchestration with Airflow.
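The SCD Type 2 support mentioned above boils down to closing the current dimension row and inserting a new version whenever a tracked attribute changes. The sketch below uses an assumed customer dimension (key and attribute names are illustrative, not the project's schema); in the real pipeline this runs as a Spark merge.

```python
from datetime import date

# SCD Type 2 sketch: close the current row and insert a new version when a
# tracked attribute changes. Key/attribute names are illustrative assumptions.
def scd2_upsert(dim_rows, incoming, today):
    for row in dim_rows:
        if row["customer_id"] == incoming["customer_id"] and row["is_current"]:
            if row["city"] == incoming["city"]:
                return dim_rows  # no change, nothing to do
            row["is_current"] = False
            row["valid_to"] = today  # close out the old version
    dim_rows.append({**incoming, "valid_from": today,
                     "valid_to": None, "is_current": True})
    return dim_rows

dim = [{"customer_id": 7, "city": "Hanoi",
        "valid_from": date(2025, 1, 1), "valid_to": None, "is_current": True}]
dim = scd2_upsert(dim, {"customer_id": 7, "city": "Da Nang"}, date(2025, 6, 16))
print(len(dim))  # → 2 versions: the closed row and the new current one
```

This preserves full history: queries can reconstruct the dimension as of any date by filtering on `valid_from`/`valid_to`.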

⚠️ Limitations

  • No serving layer yet (e.g., SQL engine or BI tool) for DA/BA usage.

Last updated: 16 June 2025