Key Responsibilities:
- Design, develop, and maintain data pipelines for handling high-volume data streams using Apache Kafka and Apache Flink (a minimal sketch follows this list).
- Implement real-time data processing solutions using Apache Flink or Apache Spark.
- Build and maintain RESTful APIs using Spring Boot to support data integration across systems.
- Design and implement workflow orchestration solutions using Temporal to ensure fault-tolerant, scalable, and reliable distributed systems.
- Optimize Flink jobs for performance, reliability, and scalability.
- Develop resilient, event-driven architectures integrating Flink and Temporal.
- Build and manage Temporal workflows that orchestrate complex data pipelines and processes.
- Work with NoSQL databases, ensuring optimal performance and scalability.
- Collaborate with cross-functional teams to deliver high-quality data solutions for business needs.
- Monitor and improve system performance, latency, and throughput.
- Ensure best practices for fault tolerance, high availability, and observability.
- Troubleshoot and optimize existing data pipelines and workflows to ensure efficiency and reliability.
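
For illustration, here is a minimal sketch of the kind of Kafka-to-Flink pipeline these responsibilities describe, assuming Flink's KafkaSource connector (Flink 1.14+); the broker address, topic, and transformation are hypothetical placeholders, not a prescribed implementation:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaToFlinkPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // periodic checkpoints for fault tolerance

        // Broker address, topic, and consumer group are hypothetical placeholders.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("events")
                .setGroupId("pipeline-consumer")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-events")
           .map(String::toUpperCase) // placeholder transformation
           .print();

        env.execute("kafka-to-flink-pipeline");
    }
}
```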
Required Skills:
- 5+ years of experience in data engineering or a related field.
- Strong proficiency in Apache Kafka for handling large-scale data streams.
- Strong experience in Apache Flink, including Flink SQL, the DataStream API, and state management.
- Expertise in Apache Flink or Apache Spark for real-time and batch data processing.
- Proven experience with Temporal or similar workflow orchestration frameworks (e.g., Cadence); see the workflow sketch after this list.
- Proficiency in Java or Scala (Python experience is a plus).
- Experience with other messaging/streaming platforms beyond Kafka.
- Strong understanding of distributed systems and event-driven architecture.
- Knowledge of database technologies (SQL/NoSQL) and their integration with Flink.
- Familiarity with monitoring tools like Prometheus, Grafana, or OpenTelemetry.
- Experience building RESTful APIs with Spring Boot (see the endpoint sketch after this list).
- Solid understanding of NoSQL databases (e.g., MongoDB, Cassandra).
- Familiarity with cloud environments and containerization (e.g., AWS, Docker) is a plus.
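
As a point of reference for the Temporal requirement above, here is a minimal Java SDK sketch of a workflow orchestrating a three-step pipeline; the workflow name, activity names, and timeout are illustrative assumptions, not a specific expected design:

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

// Hypothetical workflow orchestrating an ingest -> transform -> publish pipeline.
@WorkflowInterface
public interface PipelineWorkflow {
    @WorkflowMethod
    void run(String datasetId);
}

@ActivityInterface
interface PipelineActivities {
    void ingest(String datasetId);
    void transform(String datasetId);
    void publish(String datasetId);
}

class PipelineWorkflowImpl implements PipelineWorkflow {
    private final PipelineActivities activities = Workflow.newActivityStub(
            PipelineActivities.class,
            ActivityOptions.newBuilder()
                    .setStartToCloseTimeout(Duration.ofMinutes(10)) // illustrative timeout
                    .build());

    @Override
    public void run(String datasetId) {
        // Each step is durably recorded; failed activities are retried
        // according to Temporal's retry policy without losing progress.
        activities.ingest(datasetId);
        activities.transform(datasetId);
        activities.publish(datasetId);
    }
}
```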
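Similarly, a minimal Spring Boot sketch of the kind of REST endpoint referenced above; the `/pipelines/{id}/status` resource and its response shape are hypothetical:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import java.util.Map;

// Minimal Spring Boot application exposing a hypothetical pipeline-status endpoint.
@SpringBootApplication
@RestController
public class PipelineApiApplication {

    public static void main(String[] args) {
        SpringApplication.run(PipelineApiApplication.class, args);
    }

    @GetMapping("/pipelines/{id}/status")
    public Map<String, String> status(@PathVariable String id) {
        // A real service would query pipeline state from a data store.
        return Map.of("pipelineId", id, "status", "RUNNING");
    }
}
```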
Preferred Qualifications:
- Experience with data lake architectures and ETL pipelines.
- Knowledge of CI/CD pipelines and infrastructure as code (Terraform, Helm).
- Understanding of fault tolerance strategies in stream processing (see the checkpointing sketch after this list).
- Contributions to open-source projects related to Flink or Temporal.
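
As one concrete example of the fault-tolerance strategies mentioned above, here is a sketch of Flink checkpointing and restart configuration using the Flink 1.x API; the intervals and retry counts are illustrative assumptions:

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FaultToleranceConfig {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Exactly-once checkpoints every 30s; state is restored after failure.
        env.enableCheckpointing(30_000, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(10_000);

        // Bounded restarts: up to 3 attempts, 10s apart, before the job fails.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));
    }
}
```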