senior
Registration: 26.02.2024

Portfolio

Aligned Automation

2. DELL - Address Validation & Geocoding (Python, NLP / DL)
This project corrects bad or incorrect geographic addresses from LATAM countries such as Mexico, Argentina, and Chile, using the Google Maps and OpenStreetMap APIs together with sentence embeddings from SBERT (Sentence-BERT).
● Solution architecture, design, and implementation on Azure Blob Storage, Azure ML, and an Azure DevOps CI/CD framework.
● An NLP model (SimCSE: Simple Contrastive Learning of Sentence Embeddings) was trained on a Tesla V100 GPU system on a domain-specific corpus (Spanish public-domain addresses) for the textual semantic search downstream task. It was used to store pre-validated Mexico address embeddings in an Elasticsearch 8.0 dense vector database and to search them by cosine similarity.
● The solution was exposed as a RESTful service through the Flask framework and connected to a Node.js / React UI for a better user experience.
● Project management; handled a team of data scientists, machine learning engineers, data engineers, and test engineers.

NEC India Corporation

1. NEC Japan - GTC Framework Development (Python, PyTorch DL, NLP)
This project developed a Generic Text Classifier (GTC) framework that offers: (a) training (using the architecture only) and additional training (using the architecture and weights) of various pre-trained embedding models on a custom dataset; (b) several validation methods (hold-out, k-fold); (c) building a classification model on top of the embedding model from (a), with custom network layers defined through configuration.
● The ask was to build a GTC framework in PyTorch that supports various customizations (custom data, custom embedding model, and custom classification model).
● Led the framework design and development, including defining the framework architecture, selecting the AWS EC2 (GPU) infrastructure, and creating the GTC framework user manual.
● Various Transformer-based embedding models (BERT, BART, LaBSE, GPT-2) and a pre-defined classification model (AutoModelForSequenceClassification) were trained and tested on a custom dataset.
● Received a SPOT award for developing the framework in a short span of time.
2. Generative AI based POC and Proposal (Python, Generative AI/LLM, LangChain, Azure OpenAI)
● Contributed to a project proposal based on a POC for a Q&A bot that answers banking customer queries using the Azure OpenAI API service.
● In-house POC: built a Q&A bot using the LangChain framework to answer HR policy queries.

Aligned Automation

1. SAP SD - Information Extraction (Python, PyTorch, Generative AI / LLM)
This project extracts information from SAP business requirements (supplied as a Q&A dataset for the SAP SD module) using Generative AI approaches: one-shot, few-shot, Tree of Thoughts, static and dynamic prompt engineering, Alpaca-style prompt engineering, auto prompt tuning, etc.
● The ask was to extract keywords (single or multiple) from SAP business requirements supplied in the form of questions and answers.
● The existing SAP implementation AI tool was used to map each Q&A to a specific SAP SD BDC screen's process element and its field name. The field value, however, had to be extracted from the question and answer through prompt engineering.
● Various LLMs (Falcon, Vicuna, and Llama-based models such as OpenLLaMA and Nous-Hermes-Llama) were used with prompt engineering techniques (one-shot, few-shot, Tree of Thoughts, auto prompt tuning).
● Contributed to research and led the team of NLP prompt engineers.

Skills

Python
PyTorch
TensorFlow
Keras
NLTK
SciKit-Learn
SciPy
Seaborn
Matplotlib
Pandas
NumPy
R (3.3.1)
R-Studio
LIME/SHAP Libraries
Azure ML
C3.AI Platform
Azure Data Factory
Databricks
DevOps
Hadoop 2.0 Cloudera CDH-5.4
HDP-2.3
Hive
HBase
Zookeeper
PySpark
Apache Zeppelin
Flume
Hue
SAP HANA Vora
GPT-3.5/4
OpenLLaMA
Nous-Hermes-Llama2-13b
Alpaca
Vicuna
LangChain
Azure OpenAI API
RAG
PEFT (LoRA)
BERT/SBERT
LaBSE
BART
SimCSE
GPT-2/3
AI/ML Ops (MLflow, Azure MLOps)
Chatbot with Rasa NLU
Lasso Regression
Random Forest
XGBoost
Naive Bayes
K-Means
Isolation Forest
ANN
RNN
LSTM
Neo4j
ADLS-Gen2
SAP S/4HANA
Teradata
MongoDB
DB2v10
Oracle10g
R-Studio (0.99)
HANA-Studio (2.3.8)
PyCharm (2018.3.5)
Sublime Text (3.1.1)
Spyder
Jupyter Notebook
Eclipse (Mars 2.0)
JIRA
Issue Tracking
Task Management
Scrum Dashboard
Docker
GitHub
Jenkins
Maven Build Management
Sphinx
Roxygen
Do
Unix
Linux (Debian, Ubuntu, CentOS, Red Hat, SUSE)
Windows

Work experience

AI/ML Technical Architect
06.2023 - 10.2023 | NEC India Corporation
Python, PyTorch DL, NLP, Generative AI/LLM, LangChain, Azure OpenAI
1. NEC Japan - GTC Framework Development (Python, PyTorch DL, NLP)
This project developed a Generic Text Classifier (GTC) framework that offers: (a) training (using the architecture only) and additional training (using the architecture and weights) of various pre-trained embedding models on a custom dataset; (b) several validation methods (hold-out, k-fold); (c) building a classification model on top of the embedding model from (a), with custom network layers defined through configuration.
● The ask was to build a GTC framework in PyTorch that supports various customizations (custom data, custom embedding model, and custom classification model).
● Led the framework design and development, including defining the framework architecture, selecting the AWS EC2 (GPU) infrastructure, and creating the GTC framework user manual.
● Various Transformer-based embedding models (BERT, BART, LaBSE, GPT-2) and a pre-defined classification model (AutoModelForSequenceClassification) were trained and tested on a custom dataset.
● Received a SPOT award for developing the framework in a short span of time.
2. Generative AI based POC and Proposal (Python, Generative AI/LLM, LangChain, Azure OpenAI)
● Contributed to a project proposal based on a POC for a Q&A bot that answers banking customer queries using the Azure OpenAI API service.
● In-house POC: built a Q&A bot using the LangChain framework to answer HR policy queries.
Manager / Principal Data Scientist
10.2021 - 02.2023 | Aligned Automation
Python, PyTorch, Generative AI/LLM, Spark ML, EDA/Graph Visualization
1. SAP SD - Information Extraction (Python, PyTorch, Generative AI/LLM)
This project extracts information from SAP business requirements (supplied as a Q&A dataset for the SAP SD module) using Generative AI approaches: one-shot, few-shot, Tree of Thoughts, static and dynamic prompt engineering, Alpaca-style prompt engineering, auto prompt tuning, etc.
● The ask was to extract keywords (single or multiple) from SAP business requirements supplied in the form of questions and answers.
● The existing SAP implementation AI tool was used to map each Q&A to a specific SAP SD BDC screen's process element and its field name. The field value, however, had to be extracted from the question and answer through prompt engineering.
● Various LLMs (Falcon, Vicuna, and Llama-based models such as OpenLLaMA and Nous-Hermes-Llama) were used with prompt engineering techniques (one-shot, few-shot, Tree of Thoughts, auto prompt tuning).
● Contributed to research and led the team of NLP prompt engineers.
2. DELL - Address Validation & Geocoding (Python, NLP/DL)
This project corrects bad or incorrect geographic addresses from LATAM countries such as Mexico, Argentina, and Chile, using the Google Maps and OpenStreetMap APIs together with sentence embeddings from SBERT (Sentence-BERT).
● Solution architecture, design, and implementation on Azure Blob Storage, Azure ML, and an Azure DevOps CI/CD framework.
● An NLP model (SimCSE: Simple Contrastive Learning of Sentence Embeddings) was trained on a Tesla V100 GPU system on a domain-specific corpus (Spanish public-domain addresses) for the textual semantic search downstream task. It was used to store pre-validated Mexico address embeddings in an Elasticsearch 8.0 dense vector database and to search them by cosine similarity.
● The solution was exposed as a RESTful service through the Flask framework and connected to a Node.js / React UI for a better user experience.
● Project management; handled a team of data scientists, machine learning engineers, data engineers, and test engineers.
3. DELL - Asset Overheating Classification (Python, Spark ML, EDA / Graph Visualization)
This project processes large-scale telemetry data (Azure Databricks platform with Spark MLlib) and establishes the MOI parameters that contribute to asset classification (overheating vs. non-overheating) through EDA and ML. It gives the Tech Support persona the information needed to avoid unrealistic or unjustified dispatches.
● Led the project on establishing threshold values for the MOI parameters and their persistence duration through EDA.
● Calculated the weightage of the parameters found in EDA iteratively through a statistical approach, cross-checking them against assets that were dispatched for overheating.
● Highlighted parameters as contributing factors based on the correlation matrix.
● Classified assets as overheating vs. non-overheating through empirical methods, based on the MOI parameters' threshold, persistence, and weightage values.
● Continued building an ML model for asset classification and measuring model performance.
● Ingested data and benchmark thresholds into a Neo4j graph for drill-down analysis through Neo4j Bloom.
● Project management; handled a team of data scientists and graph database engineers.
Lead Machine Learning Engineer
05.2021 - 09.2021 | Savart
Python, Deep Learning, NLP
1. AI-Enabled Stock Advisory / Recommendation (Python, Deep Learning, NLP)
This project built an advisory app for securities traded on different stock exchanges.
● Implemented qualitative analysis as an NLP project on companies' public data such as annual reports, call transcript PDFs (text, images, tables), and news articles, with a BERT-based downstream Q&A task feeding the final scoring model.
● Built the scoring model on top of the BERT Q&A downstream task.
● Built a chatbot using the Rasa NLU framework as the interface to the backend Q&A model.
Sr. Software Engineer ML/DS/Bigdata
10.2012 - 11.2020 | Xoriant Solutions Pvt. Ltd.
Python, Deep Learning, NLP, R, Machine Learning, Hadoop Big Data, PySpark, Java, Hive
1. SAP Ariba - Text Commodity Classifier (Python, Deep Learning, NLP)
This project developed a commodity text classifier that assigns newly created commodities to their respective categories.
● Implemented a basic Multinomial Naïve Bayes model in the initial stage with TF-IDF vectorization, using the chi-square test to select the most relevant word features for this multiclass text classification.
● At a later stage, built a bidirectional LSTM model: fitted word2vec on the training corpus using gensim and created a word embedding matrix that was used as the weights of the Keras Embedding layer.
● In the final stage, implemented a transfer learning model (BERT) for this multiclass text classification.
2. SAP Ariba - Risk Quantification and Predictions (R, Machine Learning)
This project applied data science in R 3.3.1, using modeling techniques such as linear regression and lasso regression and statistical methods such as interpolation, to calculate and list potential and categorical risk scores/exposure with their contributing factors on the Supplier 360 page of the Supplier Risk Management application, which offers risk insights for each supplier in the procurement process. It considers various news feeds (e.g., bankruptcy, lawsuit, disaster).
● Was involved in building and implementing the risk exposure model in R 3.3.1 through the different data science phases (understanding the business problem, data collection/cleaning, EDA, data modeling, validation, visualization, deployment, and optimization).
● Created a reporting API as an R endpoint and made it RESTful through an OpenCPU instance so that it could be consumed by a Java-based application and tested with Postman.
● Was involved in creating schemas, stored procedures, and triggers in HANA (SQL).
3. SAP HANA Vora Evaluation (Hadoop Big Data, PySpark)
This project evaluated the SAP HANA Vora big data product. It involved setting up SAP HANA Vora (dev edition) built on the Hortonworks Hadoop distribution on an AWS instance, ingesting HVAC sensor data (streamed by Apache Flume) and fact table data (loaded via Sqoop) into the Hadoop data lake, and evaluating SAP HANA Vora's capabilities for building data hierarchies on raw HDFS data and for interactive, drill-down OLAP-style analytics through PySpark code, with results visualized in the web-based Apache Zeppelin notebook. Vora virtual tables in Hadoop can also be used as a data source in SAP BI tools such as Lumira.
● Was involved in e-learning, setting up the AWS instance, installing SAP HANA Vora (dev edition) built on Hortonworks HDP-2.3, and troubleshooting and monitoring the instance with Apache Ambari.
● Studied the architecture and working of SAP HANA Vora on Hadoop and its integration with the SAP HANA in-memory computing framework.
● Was involved in ingesting sample data into the Hadoop data lake and evaluating various features of SAP HANA Vora (building data hierarchies and running interactive analytics in Apache Zeppelin) using Spark SQL.
4. Data Analysis Migration from QlikView to Hadoop (Java, Python, Big Data, Hive)
This project migrated sales-related reports written in QlikView's SQL-like language to Hive and Impala SQL. The client has several product categories in the mobile domain, each with different service and contract models. The client's sales team relies on critical data analysis to dig into its massive customer base and the millions of contracts signed with those customers to find lucrative revenue sources in the form of potential service agreements. This data also helps them with revenue projection for current and upcoming financial years.
● Interacted directly with the client for requirement gathering and analysis; was involved in design.
● Imported and exported data into HDFS and Hive using Sqoop.
● Converted QlikView scripts into optimized Hive-based or MPP-SQL-based scripts.
● Applied various performance tuning techniques (file formats: Parquet, ORC; data modeling: partitioning, bucketing) and created UDFs wherever required.
Software Engineer
05.2006 - 06.2012 | SunGard Global Solutions
Asset Management Legacy System, C++ / Java / Unix, Oracle 10g, Pro*C
Global Plus: SunGard's asset management, custody, and accounting platform, used by financial services institutions across the world to manage trust and private banking.
● Analysis, design, development, testing, and bug fixing; mentored teammates on product features and related functionality.
Software Engineer
06.2004 - 04.2006 | OPUS Software Solution Pvt. Ltd.
UNIX, C
NCR - Application Support
● Ported the NCR-IPCS6000 cheque reader/sorter product, written in C on an NCR MP-RAS (UNIX) box, to a Sun SPARC Solaris 5.8 (UNIX) box. Completed UAT successfully at NCR (Mumbai) and NCR (Japan) for the end client, Mitsubishi Bank.

Educational background

Math & Computer Application (Bachelor’s Degree)
Till 2001
Mumbai University
Specialist
Till 1996
MH State - SSC
Engineering
Till 1998
MH State - HSC (Mumbai Divisional Board)
Computer Science | MCA
Till 2004
PUCSD

Additional education

Graph Database and Ontology, Neo4j Graph Database Understanding
Till 10.2021
Certificates / Courses
Hadoop 2.0 (Cloudera 5.4) project and training completion
Till 12.2015
EduPristine Training Center
New Trends in Machine Learning & Data Science with R/Python training
09.2019 - 10.2020
Udemy / Cognixia Learning Centers

Languages

English: Proficient