Data Ingestion Platform

  • The Data Ingestion Platform (DiP) performs real-time data ingestion, harnessing Apache Apex, Apache Flink, Apache Spark, and Apache Storm to stream data into a lambda architecture. Apache Kafka plays a key role as the messaging bus between the source and the streaming component.
  • DiP includes a UI for users who want to upload data from their desktops; data can also be ingested from other sources, such as cloud storage or a local file system. The UI is especially useful in the initial phase of understanding the system, when users are learning about and choosing among the streaming components.
  • DiP Technology Stack:
    • Source System – Web Client
    • Messaging System – Apache Kafka
    • Target System – HDFS, Apache HBase, Apache Hive
    • Reporting System – Apache Phoenix (CLI), Apache Zeppelin
    • Streaming APIs – Apache Apex, Apache Flink, Apache Spark, and Apache Storm
    • Programming Language – Java
    • IDE – Eclipse
    • Build Tool – Apache Maven
    • Operating System – CentOS 7
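
The source → messaging bus → streaming component → target flow described above can be illustrated with a minimal, standard-library-only Java sketch. This is not DiP's code: the class name DipFlowSketch, the ingest method, the END marker, and the trim/upper-case transformation are all hypothetical, and an in-memory BlockingQueue merely stands in for an Apache Kafka topic so that the example runs without a broker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the DiP data flow; not part of the DiP code base.
public class DipFlowSketch {

    // Assumed end-of-stream marker. Real records must never equal this value.
    private static final String END = "__END__";

    // Pushes records from a "source" through an in-memory "bus" to a "target".
    public static List<String> ingest(List<String> sourceRecords)
            throws InterruptedException {
        // Stand-in for a Kafka topic: decouples the producer from the consumer.
        BlockingQueue<String> bus = new LinkedBlockingQueue<>();
        // Stand-in for the target system (HDFS, HBase, or Hive in the stack above).
        List<String> target = new ArrayList<>();

        // Stand-in for the streaming component (Apex, Flink, Spark, or Storm):
        // it consumes from the bus, applies a placeholder transformation,
        // and writes the result to the target.
        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    String record = bus.take();
                    if (END.equals(record)) {
                        break;
                    }
                    target.add(record.trim().toUpperCase());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();

        // Stand-in for the source system (the web client): publish each record.
        for (String record : sourceRecords) {
            bus.put(record);
        }
        bus.put(END);

        // join() guarantees the streamer's writes are visible to this thread.
        streamer.join();
        return target;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> stored = ingest(Arrays.asList(" sensor-1,21.5 ", "sensor-2,19.0"));
        stored.forEach(System.out::println);
    }
}
```

The queue is what lets the source and the streaming component run independently, which is the role the overview assigns to the messaging bus. In an actual deployment the queue would be replaced by a Kafka topic and the consumer thread by a job in one of the listed streaming engines.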