
Skills

Python
SQL
Apache Spark
Airflow
Azure
PySpark
Apache Kafka
ClickHouse
AWS
Hadoop
DWH
Terraform
Tableau
Docker
Oracle
Git

Work experience

Data Engineer
02.2025 - Present | Innowise Group
SQL, Python, Java, Apache Airflow, PySpark, ClickHouse, Delta Lake, PostgreSQL, Apache Kafka, AWS (EMR, Athena, S3, Glue, MWAA, Lambda, RDS, MSK, SNS, SQS, VPC, IAM), Terraform, TeamCity, GitLab.
◦ Designed and maintained a highly available, scalable, and consistent data solution for a multitenant environment;
◦ Migrated file-based ELT pipelines from an S3 and PySpark architecture to a high-throughput solution powered by Apache Kafka and ClickHouse;
◦ Collaborated directly with business stakeholders to define key business reporting metrics for strategic decision-making;
◦ Wrote complex analytical queries and generated business reports in ClickHouse;
◦ Developed and fine-tuned Python-based Kafka producers running in AWS Lambda functions;
◦ Designed and implemented fault-tolerant data ingestion pipelines using AWS SNS, SQS, and Lambda to process S3 events;
◦ Developed and maintained Grafana dashboards to monitor key client activity metrics.
Data Engineer
03.2023 - 01.2025 | Innowise Group
SQL, Python, Java, Apache Airflow, PySpark, Delta Lake, PostgreSQL, AWS (EMR, Athena, S3, Glue, MWAA, Lambda, RDS, Lake Formation, ECS, ECR, EKS, CloudWatch, ADX, VPC, IAM), Terraform, TeamCity, GitLab.
◦ Designed, developed, and maintained a corporate Data Lakehouse using Delta Lake in the AWS cloud;
◦ Built complex ETL pipelines using PySpark jobs running on AWS EMR clusters;
◦ Performed comprehensive data analytics in AWS Athena and PySpark notebooks;
◦ Deployed and managed EMR Serverless applications for data processing tasks;
◦ Designed, developed, and optimized Airflow DAGs to orchestrate and manage complex ETL pipelines and data workflows;
◦ Collected and analyzed statistical data in a PostgreSQL database;
◦ Generated complex data extractions and insightful analytical reports for external clients using AWS Athena and PySpark;
◦ Led the modernization of data storage by migrating extensive datasets from legacy internal formats to Delta Lake;
◦ Automated AWS infrastructure provisioning with robust Terraform scripts;
◦ Configured CloudWatch for service logging, metrics collection, and monitoring dashboards;
◦ Managed user roles and configured complex access rules for the data platform with AWS IAM and Lake Formation;
◦ Prepared and presented solution designs, PoCs, and technical demos to colleagues and stakeholders.
DWH Developer
07.2022 - 01.2023 | DataMola
SQL, Python, Oracle, IBM DB2, MS SSIS, IBM Cloudant, MongoDB, Node.js, Bitbucket.
◦ Designed and maintained a centralized DWH in IBM DB2 to enable efficient analysis of various telecom data;
◦ Built and managed complex ETL pipelines with PL/SQL procedures to ingest raw data files from source systems into the DWH;
◦ Configured and scheduled the ETL pipelines in MS SSIS;
◦ Wrote and optimized complex SQL queries and PL/SQL procedures to retrieve, manipulate, and analyze data stored in the IBM DB2 database, following best practices for performance and efficiency;
◦ Onboarded new data sources into the system and designed optimized data models to ensure high performance, scalability, and compliance with data governance standards;
◦ Migrated data between environments, ensuring minimal downtime and data integrity;
◦ Prepared ad-hoc data extractions and advanced reports for customers and stakeholders;
◦ Implemented full-text search and indexing optimizations in IBM Cloudant, significantly improving query performance and search relevance;
◦ Prepared datasets and data marts for BI consumption, enabling insightful dashboards and reports in Tableau;
◦ Collaborated closely with frontend and QA teams to integrate data services with internal REST APIs and web UIs.
Data Engineer
11.2021 - 06.2022 | EPAM Systems
SQL, Python, Java, Databricks, PySpark, PostgreSQL, Apache Kafka, Elasticsearch, Apache Hadoop, Azure (Virtual Machines, Data Factory, Functions, Blob Storage, Active Directory, ACS, etc.), Jenkins, Docker, Kubernetes, GitHub.
◦ Consolidated diverse data sources into a unified Data Lake in Azure Blob Storage;
◦ Established and managed data pipelines in Azure Data Factory for efficient data movement and transformation;
◦ Analyzed large historical datasets using PySpark on the Databricks platform to extract and visualize valuable insights;
◦ Deployed Terraform scripts to set up cloud infrastructure in Azure;
◦ Configured Kafka monitoring with Grafana to visualize key performance metrics;
◦ Monitored and tuned Kafka and associated components for optimal performance;
◦ Built fully automated CI/CD pipelines using Jenkins to streamline deployment processes;
◦ Fixed and optimized legacy Java code within data processing pipelines.

Educational background

Computer Science (Bachelor’s Degree)
2019 - 2023
Belarusian State University of Informatics and Radioelectronics

Languages

English - Advanced
Belarusian - Native
Russian - Native