📌 CHANGELOG

| Date       | Version | Changes | Notes |
|------------|---------|---------|-------|
| 2024-08    | v1.0    | This marks the first version of the pipeline, in which the core logic has been successfully implemented. | |
| 2025-04-19 | v1.1    | Rewrote documentation; implemented logging & alerting. | |

Version 1.0

✨ Features

  • Provides an end-to-end real-time processing pipeline.
  • Stores all data and analysis results in PostgreSQL.
  • Detects abnormal trips using basic rules (e.g., unusual duration, unexpected fare).
  • Enables real-time and periodic (daily, weekly, etc.) trip trend analysis.
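
As an illustration of the rule-based detection listed above, here is a minimal sketch. The thresholds, the `Trip` fields, and the function name `find_anomalies` are assumptions made for this example; the pipeline's actual rules and limits are not documented here.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, chosen only for illustration.
MIN_DURATION_MIN = 1.0
MAX_DURATION_MIN = 180.0
MAX_FARE_PER_KM = 10.0


@dataclass
class Trip:
    trip_id: str
    duration_min: float
    distance_km: float
    fare: float


def find_anomalies(trip: Trip) -> List[str]:
    """Return the names of the rules this trip violates (empty if none)."""
    reasons = []
    # Rule 1: duration outside the expected range.
    if not (MIN_DURATION_MIN <= trip.duration_min <= MAX_DURATION_MIN):
        reasons.append("unusual_duration")
    # Rule 2: negative fare, or fare too high for the distance travelled.
    if trip.fare < 0:
        reasons.append("unexpected_fare")
    elif trip.distance_km > 0 and trip.fare / trip.distance_km > MAX_FARE_PER_KM:
        reasons.append("unexpected_fare")
    return reasons
```

A trip that passes every rule returns an empty list, so the result can be used directly as a flag or stored alongside the trip record.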

✅ Benefits

  • Easy setup with strong compatibility between Kafka, Spark, PostgreSQL, and Power BI.
  • Leverages parallelism through Kafka partitions and Spark's distributed processing.
  • Stable data pipeline with real-time ingestion and visualization.

⚠️ Limitations

  • Alerts are not triggered in real time: detection exists but lacks immediate notification.
  • Not optimized for high-throughput or large-scale workloads (PostgreSQL limitations).
  • Simultaneous read and write operations on PostgreSQL can lead to table-level locks, affecting both ingestion speed and query performance.
  • Kafka has only one producer for 10 partitions, so the available parallelism is underutilized.
  • No performance benchmark or monitoring for system health yet.
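
To make the producer/partition point concrete, the sketch below shows how keyed messages map onto 10 partitions. It is a simplified stand-in: it uses CRC32 rather than the hash Kafka's own partitioner applies, and the key format and function names are hypothetical. It suggests that a single producer can still reach every partition, so the cost of having one producer is likely write throughput rather than partition coverage.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 10  # matches the 10 partitions mentioned above


def assign_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (CRC32 stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many keys land on each partition."""
    return Counter(assign_partition(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Hypothetical trip keys sent by one producer.
    load = partition_load(f"trip-{i}" for i in range(1000))
    for partition in sorted(load):
        print(partition, load[partition])
```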

Version 1.1

  • A single producer for 10 partitions may reduce write throughput, but it is kept intentionally to simulate real-time data flow, in line with the project's goal of analyzing time-based behavior.

✨ Features

  • Rewrote documentation and deployment notes to include new changes.
  • Real-time alerts are now implemented.

✅ Benefits

  • Enables immediate notification of detected anomalies and faster response times.
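
A real-time alert of this kind could be sketched as follows. This is only an illustration built on Python's standard `logging` module; the logger name, the `send_alert` function, and the optional `notify` callback are assumptions, not the pipeline's actual alerting code.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("trip_alerts")  # hypothetical logger name


def send_alert(
    trip_id: str,
    reasons: List[str],
    notify: Optional[Callable[[str], None]] = None,
) -> bool:
    """Log an alert and forward it if any rule fired; return True when sent."""
    if not reasons:
        return False
    message = f"Abnormal trip {trip_id}: {', '.join(reasons)}"
    logger.warning(message)  # recorded by whatever handlers are configured
    if notify is not None:
        notify(message)  # e.g. an email or chat hook supplied by the caller
    return True
```

Passing the notification channel as a callback keeps the detection path independent of any particular delivery mechanism.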