Max Pershin
Portfolio
Implementation of DeepSpeech2 model
● Implementation of the DeepSpeech2 speech recognition model in PyTorch/PyTorch Lightning, following “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin”.
● The model is trained on the LibriSpeech and LJSpeech datasets; a 4-gram KenLM language model is fused into beam-search CTC decoding.
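The project uses beam search with KenLM fusion; as a minimal illustration of the CTC decoding step itself, here is the simpler greedy variant (collapse repeated labels, then drop blanks). The alphabet and blank index are illustrative assumptions, not taken from the project.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Greedy CTC decoding: merge consecutive repeats, remove blanks.

    frame_ids: per-frame argmax label indices from the acoustic model.
    A beam-search decoder with LM fusion explores many such paths and
    rescores them with the language model; this is the 1-best special case.
    """
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out


# Hypothetical toy alphabet: 0 = blank, 1 = 'a', 2 = 'b'
alphabet = {1: "a", 2: "b"}
ids = ctc_greedy_decode([1, 1, 0, 2, 2, 2, 0, 1])
text = "".join(alphabet[i] for i in ids)  # "aba"
```

The repeat-collapse rule is why CTC needs the blank symbol: without the blank between the two `1`s above, the genuine double letter could not be distinguished from a label held across frames.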
Implementation of SN-PatchGAN model
● Implementation of the SN-PatchGAN image inpainting model in PyTorch/PyTorch Lightning, following “Free-Form Image Inpainting with Gated Convolution”.
● The inpainting system completes images given free-form masks and optional user guidance.
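The core building block of this architecture is the gated convolution; a minimal PyTorch sketch of the gating idea (not the paper's exact layer configuration) might look like:

```python
import torch
import torch.nn as nn


class GatedConv2d(nn.Module):
    """Sketch of a gated convolution: a feature branch modulated by a
    learned per-pixel, per-channel soft gate, which lets the network
    distinguish valid pixels from masked holes in free-form inpainting."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.act = nn.ELU()

    def forward(self, x):
        # Sigmoid squashes the gate branch to [0, 1], acting as a learned
        # soft mask over the activated feature branch.
        return self.act(self.feature(x)) * torch.sigmoid(self.gate(x))


layer = GatedConv2d(4, 8)  # e.g. RGB + mask channel in, 8 feature maps out
y = layer(torch.randn(1, 4, 16, 16))
```

Unlike a vanilla convolution, which treats hole and non-hole pixels identically, the gate branch learns where features should be suppressed, which is what makes free-form (non-rectangular) masks tractable.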
CTF competitions platform
● The platform was initially built to run information security classes at an IT summer camp for students.
● It can host multiple Capture The Flag contests in parallel.
● Implemented in Python with Django REST Framework, Django Channels, Vue.js, MySQL, Redis, Nginx, and Docker.
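One detail any CTF backend has to get right is flag verification; a stdlib-only sketch of the usual approach (store a hash, compare in constant time) is below. The function name and hash choice are illustrative assumptions, not taken from the platform's code.

```python
import hashlib
import hmac


def check_flag(submitted: str, stored_hash: str) -> bool:
    """Verify a submitted flag against a stored SHA-256 hex digest.

    Storing only the hash keeps plaintext flags out of the database,
    and hmac.compare_digest avoids leaking a prefix match through
    response-timing differences.
    """
    digest = hashlib.sha256(submitted.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_hash)


stored = hashlib.sha256(b"flag{example}").hexdigest()
```

In a Django REST Framework view this check would sit behind the usual authentication and per-contest rate limiting, so that brute-forcing flags through the API is impractical.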
Implementation of FastSpeech model
● Implementation of the FastSpeech text-to-speech model in PyTorch/PyTorch Lightning, following “FastSpeech: Fast, Robust and Controllable Text to Speech”.
● The model speeds up mel-spectrogram generation by 270x and end-to-end speech synthesis by 38x compared to autoregressive models.
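The speedup comes from generating all frames in parallel, which requires FastSpeech's length regulator to expand phoneme-level states to frame level by predicted durations. A minimal NumPy sketch of that expansion:

```python
import numpy as np


def length_regulate(hidden, durations):
    """Expand phoneme-level hidden states to frame level.

    hidden:    (T_phoneme, d) array of encoder states.
    durations: (T_phoneme,) integer frame counts per phoneme.
    Returns a (sum(durations), d) frame-level sequence; a phoneme with
    duration 0 is dropped, matching np.repeat semantics.
    """
    return np.repeat(hidden, durations, axis=0)


h = np.arange(6, dtype=float).reshape(3, 2)   # 3 phonemes, 2-dim states
frames = length_regulate(h, np.array([2, 0, 3]))  # -> 5 frames
```

Scaling the predicted durations by a constant before regulation is also how FastSpeech controls speech rate, since the decoder then produces proportionally more or fewer frames.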
Implementation of FastGAN model
● Implementation of the FastGAN model in PyTorch/PyTorch Lightning, following “Towards Faster and Stabilized GAN Training for High Fidelity Few-shot Image Synthesis”.
● Notably, the model converges from scratch within a few hours of training on a single RTX 2080 GPU and performs consistently even with fewer than 100 training samples.
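A key ingredient behind FastGAN's cheap training is the skip-layer excitation module, which gates a high-resolution feature map channel-wise using a squeezed low-resolution one. A simplified PyTorch sketch (layer sizes are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn


class SkipLayerExcitation(nn.Module):
    """Sketch of skip-layer excitation: squeeze a low-res feature map to
    per-channel weights in [0, 1] and use them to gate a high-res map,
    creating a cheap long-range skip connection in the generator."""

    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),              # -> (N, low_ch, 4, 4)
            nn.Conv2d(low_ch, high_ch, 4),        # -> (N, high_ch, 1, 1)
            nn.LeakyReLU(0.1),
            nn.Conv2d(high_ch, high_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x_low, x_high):
        # Broadcast the (N, high_ch, 1, 1) gate over spatial dimensions.
        return x_high * self.squeeze(x_low)


sle = SkipLayerExcitation(low_ch=64, high_ch=128)
out = sle(torch.randn(1, 64, 8, 8), torch.randn(1, 128, 32, 32))
```

Because the skip path carries only channel-wise weights rather than full feature maps, it adds long-range gradient flow at almost no compute cost, which matters on a single-GPU, few-shot budget.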
Implementation of HiFi-GAN model
● Implementation of the HiFi-GAN neural vocoder in PyTorch/PyTorch Lightning, following “HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis”.
● HiFi-GAN generates samples 13.4 times faster than real time on CPU and 167.9 times faster than real time on a single V100 GPU, with quality comparable to an autoregressive counterpart (WaveNet).
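HiFi-GAN's multi-period discriminator inspects the waveform folded into 2D at several fixed periods so that periodic speech structure becomes visible to 2D convolutions. The folding step itself is simple; a NumPy sketch:

```python
import numpy as np


def fold_by_period(audio, period):
    """Fold a 1D waveform into a (frames, period) 2D array.

    Zero-pads the tail so the length divides evenly, mirroring the
    reshape each of HiFi-GAN's period discriminators applies (with a
    different prime period each) before its 2D convolutions.
    """
    pad = (-len(audio)) % period
    audio = np.pad(audio, (0, pad))
    return audio.reshape(-1, period)


x = np.arange(10, dtype=float)
folded = fold_by_period(x, 3)  # 10 samples -> padded to 12 -> (4, 3)
```

With the signal folded this way, samples that are one period apart become vertically adjacent, so a 2D kernel can directly compare them; using several coprime periods covers a range of pitch frequencies.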