PROJECT EVOLUTION
This document outlines the evolution of the Coffee Sales Data Pipeline project. It highlights key architectural, technological, and business logic changes across major versions. This helps track improvements over time and maintain clear context for future development.
v1.0 - 2025.04 (Initial Release)
🎯 Overview
A basic data pipeline to simulate and analyze coffee shop sales.
📦 Architecture
Data Source
- MongoDB stores transactional data (coffee orders).
Real-time Processing
- Kafka Connect captures changes from MongoDB.
- Elasticsearch stores real-time events.
- Kibana visualizes operational metrics and trends.
Batch Processing
- Airbyte extracts data from MongoDB into PostgreSQL (raw layer).
- PostgreSQL acts as the data warehouse for storing structured data.
- dbt transforms raw data into analytics-ready models and ensures data quality through testing.
Load Strategy
- Full load only; no incremental load implemented yet.
✅ Benefits
- Easy setup.
- Low code.
- Schema flexibility, easy to scale with MongoDB.
- Airbyte supports schema changes and multi-source ingestion.
⚠️ Limitations
- Only supports full-load batch processing; no incremental logic.
- Does not implement Slowly Changing Dimension (SCD) handling.
- Lacks real-time business logic (e.g., alerting, rule-based processing).
- Orchestration tools are not integrated.
- Logging and monitoring are not yet implemented.
v1.1 - 2025.06
🎯 Overview
Upgraded to support real-time business rules, historical data tracking, and scalable lakehouse design.
📦 Architecture
Data Source
- MySQL stores both transactional and attribute data.
Real-time Processing
- Continues to use Kafka Connect for Change Data Capture (CDC).
- Adds Kafka consumers to handle real-time business logic.
- Uses Redis for low-latency lookups and caching.
- Uses Prometheus and Grafana to observe Kafka health and trigger alerts.
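The real-time path above can be illustrated with a minimal sketch. It is not the project's consumer code: the event shape (`OrderEvent`), the "large order" rule, the threshold, and the `LookupCache` class are all hypothetical, and the cache is an in-memory stand-in for Redis so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical shape of one CDC event for an order line."""
    order_id: str
    product_id: str
    quantity: int


class LookupCache:
    """In-memory stand-in for a Redis-style cache with a fallback loader."""

    def __init__(self, loader: Callable[[str], Optional[float]]) -> None:
        self._loader = loader              # called only on a cache miss
        self._store: Dict[str, float] = {}
        self.misses = 0

    def get_price(self, product_id: str) -> Optional[float]:
        if product_id in self._store:
            return self._store[product_id]
        self.misses += 1
        price = self._loader(product_id)
        if price is not None:              # unknown products are not cached
            self._store[product_id] = price
        return price


def process_events(events: Iterable[OrderEvent], cache: LookupCache,
                   alert_threshold: float) -> List[str]:
    """Apply an assumed business rule: alert on unknown products and large orders."""
    alerts: List[str] = []
    for event in events:
        price = cache.get_price(event.product_id)
        if price is None:
            alerts.append(f"unknown product {event.product_id} in order {event.order_id}")
            continue
        total = price * event.quantity
        if total >= alert_threshold:
            alerts.append(f"large order {event.order_id}: {total:.2f}")
    return alerts


# Demo data (made up for illustration).
PRICES = {"latte": 4.5, "espresso": 3.0}
demo_cache = LookupCache(PRICES.get)
demo_events = [
    OrderEvent("o1", "latte", 2),
    OrderEvent("o2", "latte", 30),
    OrderEvent("o3", "mocha", 1),
]
demo_alerts = process_events(demo_events, demo_cache, alert_threshold=100.0)
```

In a real deployment the loop would read from a Kafka topic and the cache would be backed by Redis; the rule logic itself would look much the same.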
Batch Processing
- Adopts a Lakehouse and Medallion Architecture.
- Spark handles data ingestion, transformation, and data quality checks.
- Airflow is used for orchestration and job scheduling.
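To make the Medallion layering concrete, here is a small plain-Python sketch of the bronze, silver, and gold stages. The column names (`order_id`, `product`, `amount`) and the specific quality checks are assumptions for illustration; the project itself runs these steps in Spark.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def to_silver(bronze: List[Row]) -> List[Row]:
    """Clean raw rows: drop rows that fail quality checks and dedupe by order_id."""
    seen = set()
    silver: List[Row] = []
    for row in bronze:
        order_id = row.get("order_id")
        amount = row.get("amount")
        # Assumed quality checks: key present, amount numeric and non-negative.
        if order_id is None or not isinstance(amount, (int, float)) or amount < 0:
            continue
        if order_id in seen:
            continue
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "product": str(row.get("product", "")).strip().lower(),
            "amount": float(amount),
        })
    return silver


def to_gold(silver: List[Row]) -> Dict[str, float]:
    """Aggregate cleaned rows into revenue per product."""
    totals: Dict[str, float] = defaultdict(float)
    for row in silver:
        totals[row["product"]] += row["amount"]
    return dict(totals)


# Demo data (made up for illustration).
bronze_rows = [
    {"order_id": "o1", "product": " Latte ", "amount": 4.5},
    {"order_id": "o1", "product": "latte", "amount": 4.5},      # duplicate key
    {"order_id": "o2", "product": "Espresso", "amount": -3.0},  # negative amount
    {"order_id": None, "product": "latte", "amount": 4.5},      # missing key
    {"order_id": "o3", "product": "latte", "amount": 5.0},
]
silver_rows = to_silver(bronze_rows)
gold_totals = to_gold(silver_rows)
```

Bronze keeps the data as ingested, silver holds validated and deduplicated rows, and gold holds the aggregates that downstream users would query.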
Load Strategy
- Supports incremental load.
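One common way to implement an incremental load is a high-water mark on a modification timestamp. The sketch below assumes that approach; the document does not say which technique the project uses, and the `id` and `updated_at` column names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def incremental_load(source_rows: List[Row], target: Dict[Any, Row],
                     watermark: datetime) -> Tuple[datetime, int]:
    """Upsert rows changed after `watermark`; return the new watermark and row count."""
    new_watermark = watermark
    copied = 0
    for row in source_rows:
        updated_at = row["updated_at"]
        if updated_at > watermark:
            target[row["id"]] = dict(row)   # upsert keyed on the primary key
            copied += 1
            if updated_at > new_watermark:
                new_watermark = updated_at
    return new_watermark, copied


# Demo data (made up for illustration).
source = [
    {"id": 1, "updated_at": datetime(2025, 6, 1), "amount": 4.5},
    {"id": 2, "updated_at": datetime(2025, 6, 2), "amount": 3.0},
]
target: Dict[Any, Row] = {}

# First run behaves like a full load because the watermark starts at the minimum.
wm1, copied1 = incremental_load(source, target, datetime.min)

# Only row 1 changes before the second run, so only that row is copied again.
source[0] = {"id": 1, "updated_at": datetime(2025, 6, 3), "amount": 5.0}
wm2, copied2 = incremental_load(source, target, wm1)
```

Compared with the full load in v1.0, each run touches only rows changed since the previous run, at the cost of persisting the watermark between runs.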
✅ Benefits
- Incremental load implemented.
- Slowly Changing Dimension (Type 2) supported.
- Real-time business rules applied during stream processing.
- Monitoring, alerting and logging integrated.
- Full orchestration with Airflow.
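As an illustration of what Slowly Changing Dimension Type 2 handling means, the following sketch keeps one row per version of a product attribute. The `DimRow` fields and the tracked attribute (`price`) are assumptions; the project's actual dimension schema is not described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    """Hypothetical product dimension row with SCD Type 2 validity columns."""
    product_id: str
    price: float
    valid_from: date
    valid_to: Optional[date] = None   # None marks an open-ended (current) version
    is_current: bool = True


def apply_scd2(dim: List[DimRow], product_id: str, price: float, as_of: date) -> None:
    """Close the current version if the attribute changed, then append a new one."""
    current = next(
        (r for r in dim if r.product_id == product_id and r.is_current), None
    )
    if current is not None:
        if current.price == price:
            return                    # no change, keep the existing version
        current.valid_to = as_of
        current.is_current = False
    dim.append(DimRow(product_id, price, valid_from=as_of))


# Demo data (made up for illustration).
dim_product: List[DimRow] = []
apply_scd2(dim_product, "latte", 4.5, date(2025, 4, 1))   # first version
apply_scd2(dim_product, "latte", 4.5, date(2025, 5, 1))   # unchanged, no new row
apply_scd2(dim_product, "latte", 5.0, date(2025, 6, 1))   # price change, new version
```

History is preserved because old versions are closed rather than overwritten, so a sale can be joined to the price that was valid on its order date.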
⚠️ Limitations
- No serving layer yet (e.g., a SQL engine or BI tool) for data analysts and business analysts.
Last updated: 16 June 2025

