Data Engineer
03.2023 - 01.2025 | Innowise Group
SQL, Python, Java, Apache Airflow, PySpark, Delta Lake, PostgreSQL, AWS (EMR, Athena, S3, Glue, MWAA, Lambda, RDS, Lake Formation, ECS, ECR, EKS, CloudWatch, ADX, VPC, IAM), Terraform, TeamCity, GitLab.
◦ Designed, developed, and maintained a corporate Data Lakehouse using Delta Lake in the AWS cloud;
◦ Built complex ETL pipelines using PySpark jobs running on AWS EMR clusters (see the PySpark sketch after this list);
◦ Performed comprehensive data analytics in AWS Athena and PySpark notebooks;
◦ Deployed and managed EMR Serverless applications for data processing tasks;
◦ Designed, developed, and optimized Airflow DAGs to orchestrate and manage complex ETL pipelines and data workflows (see the Airflow sketch after this list);
◦ Collected statistical data into a PostgreSQL database and analyzed it;
◦ Generated complex data extractions and insightful analytical reports for external clients using AWS Athena and PySpark (see the Athena sketch after this list);
◦ Led the modernization of data storage by migrating extensive datasets from legacy internal formats to Delta Lake;
◦ Automated AWS infrastructure provisioning with robust Terraform scripts;
◦ Configured CloudWatch for service logging, metrics collection, and the creation of monitoring dashboards;
◦ Managed user roles and configured complex access rules for the data platform with AWS IAM and Lake Formation;
◦ Prepared and presented solution designs, PoCs, and technical demos to colleagues and stakeholders;
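
The PySpark sketch: a minimal example of the kind of EMR PySpark job referenced above, upserting a raw batch into a Delta Lake table on S3. The bucket names, paths, and event schema are hypothetical placeholders, not the actual project code.

    from pyspark.sql import SparkSession, functions as F
    from delta.tables import DeltaTable

    spark = (
        SparkSession.builder
        .appName("events-to-delta")
        # Standard Delta Lake settings for a Spark session on EMR.
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    TARGET = "s3://corp-lakehouse/silver/events"  # hypothetical table path

    # Read the day's raw batch and derive a partition column.
    batch = (
        spark.read.json("s3://corp-landing/events/2024-06-01/")  # hypothetical
        .withColumn("event_date", F.to_date("event_ts"))
    )

    if DeltaTable.isDeltaTable(spark, TARGET):
        # Idempotent upsert: update existing event ids, insert new ones.
        (DeltaTable.forPath(spark, TARGET).alias("t")
            .merge(batch.alias("s"), "t.event_id = s.event_id")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())
    else:
        batch.write.format("delta").partitionBy("event_date").save(TARGET)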
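The Airflow sketch: a minimal DAG illustrating the orchestration pattern above, assuming Airflow 2.4+ (the schedule parameter). The dag_id and task callables are hypothetical placeholders.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        ...  # pull raw files from the source systems

    def transform(**context):
        ...  # submit the PySpark job to EMR

    def load(**context):
        ...  # publish curated tables / refresh partitions

    with DAG(
        dag_id="daily_events_etl",  # hypothetical
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)
        t_extract >> t_transform >> t_load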
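The Athena sketch: a minimal example of an extraction driven from Python with boto3, as in the client reports above. The query, table, and result bucket are hypothetical.

    import time
    import boto3

    athena = boto3.client("athena")

    # Submit the query and remember its execution id.
    qid = athena.start_query_execution(
        QueryString="SELECT client_id, count(*) AS events "
                    "FROM analytics.events GROUP BY client_id",  # hypothetical
        ResultConfiguration={"OutputLocation": "s3://corp-athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]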
DWH Developer
07.2022 - 01.2023 | DataMola
SQL, Python, Oracle, IBM DB2, MS SSIS, IBM Cloudant, MongoDB, Node.js, Bitbucket
◦ Designed and maintained a centralized DWH in IBM DB2 to support analysis of diverse telecom data;
◦ Built and managed complex ETL pipelines with PL/SQL procedures to ingest raw data files from source systems into the DWH;
◦ Configured and scheduled the ETL pipelines in MS SSIS;
◦ Wrote and optimized complex SQL queries and PL/SQL procedures to retrieve, manipulate, and analyze data stored in the IBM DB2 database, following best practices for performance and efficiency (see the DB2 sketch after this list);
◦ Onboarded new data sources into the system and designed optimized data models to ensure high performance, scalability, and compliance with data governance standards;
◦ Migrated data between environments, ensuring minimal downtime and data integrity;
◦ Prepared ad-hoc data extractions and advanced reports for customers and stakeholders;
◦ Implemented full-text search and indexing optimizations in IBM Cloudant, significantly improving query performance and search relevance (see the Cloudant sketch after this list);
◦ Prepared datasets and data marts for BI consumption, enabling the creation of insightful dashboards and reports in Tableau;
◦ Collaborated closely with frontend and QA teams to integrate data services with internal REST APIs and web UIs;
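
The DB2 sketch: a minimal example of the kind of warehouse query work referenced above, run from Python with the ibm_db driver. The DSN, schema, and fact table are hypothetical.

    import ibm_db

    dsn = (
        "DATABASE=DWH;HOSTNAME=db2.example.com;PORT=50000;"  # hypothetical
        "PROTOCOL=TCPIP;UID=etl_user;PWD=***;"
    )
    conn = ibm_db.connect(dsn, "", "")

    sql = """
        SELECT call_date, SUM(duration_sec) AS total_duration
        FROM telecom.cdr_facts              -- hypothetical fact table
        GROUP BY call_date
        ORDER BY call_date
    """
    stmt = ibm_db.exec_immediate(conn, sql)

    # Fetch rows as dictionaries keyed by column name.
    row = ibm_db.fetch_assoc(stmt)
    while row:
        print(row["CALL_DATE"], row["TOTAL_DURATION"])
        row = ibm_db.fetch_assoc(stmt)

    ibm_db.close(conn)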
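The Cloudant sketch: a minimal example of defining and querying a Lucene-based full-text search index over Cloudant's HTTP API. The account, database, and document fields are hypothetical.

    import requests

    BASE = "https://ACCOUNT.cloudant.com/tickets"  # hypothetical database URL
    AUTH = ("apikey", "***")

    # Design document holding a search index over the fields queried most.
    design = {
        "indexes": {
            "by_text": {
                "index": "function (doc) {"
                         "  if (doc.subject) { index('subject', doc.subject); }"
                         "  if (doc.body)    { index('body', doc.body); }"
                         "}"
            }
        }
    }
    requests.put(f"{BASE}/_design/search", json=design, auth=AUTH)

    # Query the index with Lucene syntax.
    resp = requests.get(
        f"{BASE}/_design/search/_search/by_text",
        params={"q": "subject:outage AND body:billing", "limit": 10},
        auth=AUTH,
    )
    print(resp.json()["rows"])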
Data Engineer
11.2021 - 06.2022 | EPAM Systems
SQL, Python, Java, Databricks, PySpark, PostgreSQL, Apache Kafka, Elasticsearch, Apache Hadoop, Azure (Virtual Machines, Data Factory, Functions, Blob Storage, Active Directory, ACS, etc.), Jenkins, Docker, Kubernetes, GitHub.
◦ Consolidated diverse data sources into a unified Data Lake in Azure Blob Storage (see the Blob Storage sketch after this list);
◦ Established and managed data pipelines in Azure Data Factory for efficient data movement and transformation;
◦ Analyzed large historical datasets using PySpark on the Databricks platform to extract and visualize valuable insights (see the Databricks sketch after this list);
◦ Deployed Terraform scripts to set up cloud infrastructure in Azure;
◦ Configured Kafka monitoring with Grafana to visualize key performance metrics;
◦ Monitored and tuned Kafka and associated components for optimal performance;
◦ Built fully automated CI/CD pipelines using Jenkins to streamline deployment processes;
◦ Fixed and optimized legacy Java code within data processing pipelines;
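
The Blob Storage sketch: a minimal example of landing a source extract into the Blob Storage Data Lake with the azure-storage-blob SDK. The container, landing path, and connection string are hypothetical.

    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(
        container="datalake",
        blob="raw/crm/customers/2022-05-01.csv",  # hypothetical landing path
    )

    # Upload the extract, replacing any previous version of the file.
    with open("customers.csv", "rb") as f:
        blob.upload_blob(f, overwrite=True)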
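The Databricks sketch: a minimal example of the kind of historical analysis run in Databricks notebooks, aggregating a large dataset from Blob Storage with PySpark. The storage path and column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("history-analysis").getOrCreate()

    events = spark.read.parquet(
        "wasbs://datalake@corpstorage.blob.core.windows.net/history/events/"
    )

    # Monthly activity per event type over the whole history.
    monthly = (
        events
        .withColumn("month", F.date_trunc("month", "event_ts"))
        .groupBy("month", "event_type")
        .agg(F.count("*").alias("events"),
             F.countDistinct("user_id").alias("users"))
        .orderBy("month")
    )
    monthly.show()  # in a notebook, display(monthly) renders charts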
Data Engineer
02.2025 - Present | Innowise Group
SQL, Python, Java, Apache Airflow, PySpark, ClickHouse, Delta Lake, PostgreSQL, Apache Kafka, AWS (EMR, Athena, S3, Glue, MWAA, Lambda, RDS, MSK, SNS, SQS, VPC, IAM), Terraform, TeamCity, GitLab.
◦ Designed and maintained a highly available, scalable, and consistent data solution specifically for a multitenant environment;
◦ Migrated file-based ELT pipelines from an S3 and PySpark architecture to a high-throughput solution powered by Apache Kafka and ClickHouse;
◦ Collaborated directly with business stakeholders to define key business reporting metrics for strategic decision making;
◦ Prepared complex analytical queries and generated business reports in ClickHouse (see the ClickHouse sketch after this list);
◦ Developed and fine-tuned Python-based Kafka Producers within AWS Lambda functions;
◦ Designed and implemented fault-tolerant data ingestion pipelines using AWS SNS, SQS, and Lambda to process S3 events (see the Lambda sketch after this list);
◦ Developed and maintained Grafana dashboards to monitor key client activity metrics;
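
The ClickHouse sketch: a minimal example of a reporting query run from Python with the clickhouse-connect driver. The host, table, and metric names are hypothetical.

    import clickhouse_connect

    client = clickhouse_connect.get_client(
        host="clickhouse.internal", username="reporting", password="***"
    )

    # Daily active users and event volume per tenant over the last 30 days.
    result = client.query("""
        SELECT
            toStartOfDay(event_ts) AS day,
            tenant_id,
            uniqExact(user_id)     AS active_users,
            count()                AS events
        FROM events
        WHERE event_ts >= now() - INTERVAL 30 DAY
        GROUP BY day, tenant_id
        ORDER BY day, tenant_id
    """)
    for day, tenant, users, events in result.result_rows:
        print(day, tenant, users, events)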
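The Lambda sketch: a minimal example of the ingestion path above, where S3 events arrive via SNS and SQS and the handler republishes them to Kafka. The topic, broker environment variable, and event shape are hypothetical, and the kafka-python client is assumed.

    import json
    import os
    from kafka import KafkaProducer

    # Created once per Lambda container and reused across invocations.
    producer = KafkaProducer(
        bootstrap_servers=os.environ["KAFKA_BROKERS"].split(","),
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def handler(event, context):
        # Each SQS record wraps an SNS notification that wraps an S3 event.
        for record in event["Records"]:
            s3_event = json.loads(json.loads(record["body"])["Message"])
            for rec in s3_event.get("Records", []):
                producer.send("raw-file-events", {  # hypothetical topic
                    "bucket": rec["s3"]["bucket"]["name"],
                    "key": rec["s3"]["object"]["key"],
                    "size": rec["s3"]["object"].get("size"),
                })
        # Flush so delivery errors fail the batch and SQS retries it.
        producer.flush()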