Data Engineer
03.2023 - 01.2025 | Innowise Group
SQL, Python, Java, Apache Airflow, PySpark, Delta Lake, PostgreSQL, AWS (EMR, Athena, S3, Glue, MWAA, Lambda, RDS, Lake Formation, ECS, ECR, EKS, CloudWatch, ADX, VPC, IAM), Terraform, TeamCity, GitLab.
◦ Designed, developed, and maintained a corporate Data Lakehouse using Delta Lake in the AWS cloud;
◦ Built complex ETL pipelines using PySpark jobs running on AWS EMR clusters (see the PySpark sketch after this list);
◦ Performed comprehensive data analytics in AWS Athena and PySpark notebooks;
◦ Deployed and managed EMR Serverless applications for data processing tasks;
◦ Designed, developed, and optimized Airflow DAGs to orchestrate and manage complex ETL pipelines and data workflows (see the Airflow sketch after this list);
◦ Collected statistical data into a PostgreSQL database and analyzed it;
◦ Generated complex data extractions and insightful analytical reports for external clients using AWS Athena and PySpark (see the Athena sketch after this list);
◦ Led the modernization of data storage by migrating extensive datasets from legacy internal formats to Delta Lake;
◦ Automated AWS infrastructure provisioning with robust Terraform scripts;
◦ Configured CloudWatch for service logging, metrics collection, and the creation of monitoring dashboards;
◦ Managed user roles and configured complex access rules for the data platform with AWS IAM and Lake Formation;
◦ Prepared and presented solution designs, PoCs, and technical demos to colleagues and stakeholders;
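
The PySpark sketch: a minimal example of the kind of EMR PySpark job referenced above, upserting a raw batch into a Delta Lake table on S3. The bucket names, paths, and event schema are hypothetical placeholders, not the actual project code.

    from pyspark.sql import SparkSession, functions as F
    from delta.tables import DeltaTable

    spark = (
        SparkSession.builder
        .appName("events-to-delta")
        # Standard Delta Lake settings for a Spark session on EMR.
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    TARGET = "s3://corp-lakehouse/silver/events"  # hypothetical table path

    # Read the day's raw batch and derive a partition column.
    batch = (
        spark.read.json("s3://corp-landing/events/2024-06-01/")  # hypothetical
        .withColumn("event_date", F.to_date("event_ts"))
    )

    if DeltaTable.isDeltaTable(spark, TARGET):
        # Idempotent upsert: update existing event ids, insert new ones.
        (DeltaTable.forPath(spark, TARGET).alias("t")
            .merge(batch.alias("s"), "t.event_id = s.event_id")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())
    else:
        batch.write.format("delta").partitionBy("event_date").save(TARGET)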
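The Airflow sketch: a minimal DAG illustrating the orchestration pattern above, assuming Airflow 2.4+ (the schedule parameter). The dag_id and task callables are hypothetical placeholders.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        ...  # pull raw files from the source systems

    def transform(**context):
        ...  # submit the PySpark job to EMR

    def load(**context):
        ...  # publish curated tables / refresh partitions

    with DAG(
        dag_id="daily_events_etl",  # hypothetical
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)
        t_extract >> t_transform >> t_load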
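The Athena sketch: a minimal example of an extraction driven from Python with boto3, as in the client reports above. The query, table, and result bucket are hypothetical.

    import time
    import boto3

    athena = boto3.client("athena")

    # Submit the query and remember its execution id.
    qid = athena.start_query_execution(
        QueryString="SELECT client_id, count(*) AS events "
                    "FROM analytics.events GROUP BY client_id",  # hypothetical
        ResultConfiguration={"OutputLocation": "s3://corp-athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]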
DWH Developer
07.2022 - 01.2023 | DataMola
SQL, Python, Oracle, IBM DB2, MS SSIS, IBM Cloudant, MongoDB, Node.js, Bitbucket
◦ Designed and maintained a centralized DWH in IBM DB2 to support analysis of diverse telecom data;
◦ Built and managed complex ETL pipelines with PL/SQL procedures to ingest raw data files from source systems into the DWH;
◦ Configured and scheduled the ETL pipelines in MS SSIS;
◦ Wrote and optimized complex SQL queries and PL/SQL procedures to retrieve, manipulate, and analyze data stored in the IBM DB2 database, following best practices for performance and efficiency (see the DB2 sketch after this list);
◦ Onboarded new data sources into the system and designed optimized data models to ensure high performance, scalability, and compliance with data governance standards;
◦ Migrated data between environments, ensuring minimal downtime and data integrity;
◦ Prepared ad-hoc data extractions and advanced reports for customers and stakeholders;
◦ Implemented full-text search and indexing optimizations in IBM Cloudant, significantly improving query performance and search relevance (see the Cloudant sketch after this list);
◦ Prepared datasets and data marts for BI consumption, enabling the creation of insightful dashboards and reports in Tableau;
◦ Collaborated closely with frontend and QA teams to integrate data services with internal REST APIs and web UIs;
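
The DB2 sketch: a minimal example of the kind of warehouse query work referenced above, run from Python with the ibm_db driver. The DSN, schema, and fact table are hypothetical.

    import ibm_db

    dsn = (
        "DATABASE=DWH;HOSTNAME=db2.example.com;PORT=50000;"  # hypothetical
        "PROTOCOL=TCPIP;UID=etl_user;PWD=***;"
    )
    conn = ibm_db.connect(dsn, "", "")

    sql = """
        SELECT call_date, SUM(duration_sec) AS total_duration
        FROM telecom.cdr_facts              -- hypothetical fact table
        GROUP BY call_date
        ORDER BY call_date
    """
    stmt = ibm_db.exec_immediate(conn, sql)

    # Fetch rows as dictionaries keyed by column name.
    row = ibm_db.fetch_assoc(stmt)
    while row:
        print(row["CALL_DATE"], row["TOTAL_DURATION"])
        row = ibm_db.fetch_assoc(stmt)

    ibm_db.close(conn)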
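The Cloudant sketch: a minimal example of defining and querying a Lucene-based full-text search index over Cloudant's HTTP API. The account, database, and document fields are hypothetical.

    import requests

    BASE = "https://ACCOUNT.cloudant.com/tickets"  # hypothetical database URL
    AUTH = ("apikey", "***")

    # Design document holding a search index over the fields queried most.
    design = {
        "indexes": {
            "by_text": {
                "index": "function (doc) {"
                         "  if (doc.subject) { index('subject', doc.subject); }"
                         "  if (doc.body)    { index('body', doc.body); }"
                         "}"
            }
        }
    }
    requests.put(f"{BASE}/_design/search", json=design, auth=AUTH)

    # Query the index with Lucene syntax.
    resp = requests.get(
        f"{BASE}/_design/search/_search/by_text",
        params={"q": "subject:outage AND body:billing", "limit": 10},
        auth=AUTH,
    )
    print(resp.json()["rows"])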
Data Engineer
11.2021 - 06.2022 | EPAM Systems
SQL, Python, Java, Databricks, PySpark, PostgreSQL, Apache Kafka, Elasticsearch, Apache Hadoop, Azure (Virtual Machines, Data Factory, Functions, Blob Storage, Active Directory, ACS, etc.), Jenkins, Docker, Kubernetes, GitHub.
◦ Consolidated diverse data sources into a unified Data Lake in Azure Blob Storage (see the Blob Storage sketch after this list);
◦ Established and managed data pipelines in Azure Data Factory for efficient data movement and transformation;
◦ Analyzed large historical datasets using PySpark on the Databricks platform to extract and visualize valuable insights (see the Databricks sketch after this list);
◦ Deployed Terraform scripts to set up cloud infrastructure in Azure;
◦ Configured Kafka monitoring with Grafana to visualize key performance metrics;
◦ Monitored and tuned Kafka and associated components for optimal performance;
◦ Built fully automated CI/CD pipelines using Jenkins to streamline deployment processes;
◦ Fixed and optimized legacy Java code within data processing pipelines;
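
The Blob Storage sketch: a minimal example of landing a source extract into the Blob Storage Data Lake with the azure-storage-blob SDK. The container, landing path, and connection string are hypothetical.

    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(
        container="datalake",
        blob="raw/crm/customers/2022-05-01.csv",  # hypothetical landing path
    )

    # Upload the extract, replacing any previous version of the file.
    with open("customers.csv", "rb") as f:
        blob.upload_blob(f, overwrite=True)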
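The Databricks sketch: a minimal example of the kind of historical analysis run in Databricks notebooks, aggregating a large dataset from Blob Storage with PySpark. The storage path and column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("history-analysis").getOrCreate()

    events = spark.read.parquet(
        "wasbs://datalake@corpstorage.blob.core.windows.net/history/events/"
    )

    # Monthly activity per event type over the whole history.
    monthly = (
        events
        .withColumn("month", F.date_trunc("month", "event_ts"))
        .groupBy("month", "event_type")
        .agg(F.count("*").alias("events"),
             F.countDistinct("user_id").alias("users"))
        .orderBy("month")
    )
    monthly.show()  # in a notebook, display(monthly) renders charts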
Data Engineer
02.2025 - Present | Innowise Group
SQL, Python, Java, Apache Airflow, PySpark, ClickHouse, Delta Lake, PostgreSQL, Apache Kafka, AWS (EMR, Athena, S3, Glue, MWAA, Lambda, RDS, MSK, SNS, SQS, VPC, IAM), Terraform, TeamCity, GitLab.
◦ Designed and maintained a highly available, scalable, and consistent data solution specifically for a multitenant environment;
◦ Migrated file-based ELT pipelines from an S3 and PySpark architecture to a high-throughput solution powered by Apache Kafka and ClickHouse;
◦ Collaborated directly with business stakeholders to define key business reporting metrics for strategic decision making;
◦ Prepared complex analytical queries and generated business reports in ClickHouse (see the ClickHouse sketch after this list);
◦ Developed and fine-tuned Python-based Kafka Producers within AWS Lambda functions;
◦ Designed and implemented fault-tolerant data ingestion pipelines using AWS SNS, SQS, and Lambda to process S3 events (see the Lambda sketch after this list);
◦ Developed and maintained Grafana dashboards to monitor key client activity metrics;
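
The ClickHouse sketch: a minimal example of a reporting query run from Python with the clickhouse-connect driver. The host, table, and metric names are hypothetical.

    import clickhouse_connect

    client = clickhouse_connect.get_client(
        host="clickhouse.internal", username="reporting", password="***"
    )

    # Daily active users and event volume per tenant over the last 30 days.
    result = client.query("""
        SELECT
            toStartOfDay(event_ts) AS day,
            tenant_id,
            uniqExact(user_id)     AS active_users,
            count()                AS events
        FROM events
        WHERE event_ts >= now() - INTERVAL 30 DAY
        GROUP BY day, tenant_id
        ORDER BY day, tenant_id
    """)
    for day, tenant, users, events in result.result_rows:
        print(day, tenant, users, events)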
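The Lambda sketch: a minimal example of the ingestion path above, where S3 events arrive via SNS and SQS and the handler republishes them to Kafka. The topic, broker environment variable, and event shape are hypothetical, and the kafka-python client is assumed.

    import json
    import os
    from kafka import KafkaProducer

    # Created once per Lambda container and reused across invocations.
    producer = KafkaProducer(
        bootstrap_servers=os.environ["KAFKA_BROKERS"].split(","),
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def handler(event, context):
        # Each SQS record wraps an SNS notification that wraps an S3 event.
        for record in event["Records"]:
            s3_event = json.loads(json.loads(record["body"])["Message"])
            for rec in s3_event.get("Records", []):
                producer.send("raw-file-events", {  # hypothetical topic
                    "bucket": rec["s3"]["bucket"]["name"],
                    "key": rec["s3"]["object"]["key"],
                    "size": rec["s3"]["object"].get("size"),
                })
        # Flush so delivery errors fail the batch and SQS retries it.
        producer.flush()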