About
Hi! I am Aditya. I am a passionate Data Scientist with 2+ years of experience and expertise in Python, Machine Learning, and Time Series Analysis. I have tackled a range of complex data challenges, from fraud detection to cashflow forecasting, and have applied NLP (LLMs) and Explainable AI to produce actionable insights.
The Journey:
Passionate about statistics and data science, I have spent over a year and eight months honing my expertise at Cloudcraftz Solutions Pvt Ltd. From fraud detection visualizations to explainable AI, I have bridged the gap between technical work and actionable insights. My software engineering skills, evident in platform development and forecasting models, extend to international projects where I have implemented NLP pipelines and streamlined labeling processes. Enthusiastic about the potential of LLMs, I am constantly learning and eager to apply my experience to solve complex problems and drive data-driven decisions across industries.
Recent Work:
- Fine-tuned a Large Language Model for answering multiple-choice questions.
- Currently working on a commodity price forecasting project that combines historical price data and historical news data with ML classification algorithms and LLMs. Built a multi-classifier pipeline that predicts percentage price fluctuations of commodities within a specific time frame. Wrote and tested prompts on LLMs (GPT, LLaMA, Gemini) using LangChain to extract news-driven sentiment and relevance scores for commodity prices. Now collaborating on integrating textual data with the multi-classifier pipeline for a novel business application.
Explore some of my notable work and projects in the links below!
Skills
AI/ML Domains
- Machine Learning
- Deep Learning
- Natural Language Processing (NLP)
- Prompt Engineering
- Generative AI
- Exploratory Data Analysis (EDA)
- Statistical Modelling
- Time Series Analysis and Forecasting
Language and Libraries
- Python
- NumPy
- Pandas
- Plotly
- Scikit-Learn
- TensorFlow
- Keras
- PyTorch
- HuggingFace
- LangChain
Soft Skills
- Adaptability
- Problem Solving
- Customer Service
- Communication Skills
- Time Management
- Critical Thinking
- Collaborative Work
- Presentation Skills
Experience
Junior Data Scientist
July 2022 - Present
Cloudcraftz Solutions
Commodity Price Change Prediction
- Developed a multi-classifier pipeline that predicts percentage price fluctuations of commodities within a specific time frame.
- Wrote and tested prompts on language models (GPT, LLaMA, Gemini) using LangChain to extract sentiment and relevance scores from news articles for commodity price forecasting.
- Leading a collaboration to integrate textual data with the multi-classifier pipeline for a novel business application.
- Evaluated and analyzed various trading strategies, using data visualization and statistical tests to optimize performance.
- Delivered actionable insights for improving trading strategies in high-frequency trading environments.
- Developed network visualization solutions for a better understanding of complex financial data.
- Identified suspicious nodes through network analysis:
  - Analyzed transaction volume, centrality metrics, and transaction amounts.
  - Applied further techniques to pinpoint anomalies and potential fraud indicators.
- Ongoing work on outlier detection:
  - Learning and using PyTorch Geometric for Graph Neural Network development.
  - Aiming to refine outlier detection models for robust network security and fraud prevention.
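A minimal sketch of how transaction volume, centrality, and amounts can be turned into per-node features and screened for outliers. The transactions, the `node_features` and `flag_outliers` helpers, and the z-score threshold are all hypothetical and chosen for illustration; they are not the method or data used in the project.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical transactions: (sender, receiver, amount). Not real data.
transactions = [
    ("A", "B", 120.0),
    ("A", "C", 80.0),
    ("B", "C", 60.0),
    ("D", "A", 40.0),
    ("E", "A", 5000.0),
    ("F", "A", 30.0),
    ("C", "D", 55.0),
]


def node_features(edges):
    """Return per-node degree centrality, transaction count, and total amount."""
    neighbors = defaultdict(set)
    count = defaultdict(int)
    total = defaultdict(float)
    for sender, receiver, amount in edges:
        neighbors[sender].add(receiver)
        neighbors[receiver].add(sender)
        for node in (sender, receiver):
            count[node] += 1
            total[node] += amount
    n = len(neighbors)
    return {
        node: {
            # Degree centrality: distinct neighbours divided by (n - 1).
            "degree_centrality": len(neighbors[node]) / (n - 1),
            "tx_count": count[node],
            "tx_total": total[node],
        }
        for node in neighbors
    }


def flag_outliers(features, key, z_threshold=1.5):
    """Flag nodes whose `key` value is more than z_threshold std devs above the mean."""
    values = [f[key] for f in features.values()]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return sorted(
        node for node, f in features.items() if (f[key] - mu) / sigma > z_threshold
    )


features = node_features(transactions)
suspicious = flag_outliers(features, "degree_centrality")
print(suspicious)
```

A simple z-score screen like this only surfaces candidates for review; a graph neural network approach would learn from richer structure than a single feature.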
Enhanced Financial Decision-Making with Innovative Cashflow Forecasting Method
- Collaborated on developing a cashflow forecasting tool for an NBFC client to support financial decision-making.
- Used pgAdmin to preprocess raw financial data and translated the findings into intuitive visualizations.
- Conducted extensive feature engineering to improve the predictive accuracy of the machine learning model.
- Achieved a 2.9% reduction in RMSE through a stacked model approach.
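For readers unfamiliar with the metric, here is a small sketch of how an RMSE reduction can be computed, assuming the figure is a relative reduction against a single-model baseline. The cashflow values and forecasts below are hypothetical and do not come from the client project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_reduction(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical cashflow values and forecasts, for illustration only.
actual = [100.0, 110.0, 95.0, 105.0]
single_model = [104.0, 106.0, 99.0, 101.0]
stacked_model = [103.0, 107.0, 98.0, 102.0]

baseline = rmse(actual, single_model)
stacked = rmse(actual, stacked_model)
print(
    f"baseline RMSE={baseline:.2f}, stacked RMSE={stacked:.2f}, "
    f"reduction={relative_reduction(baseline, stacked):.1f}%"
)
```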
An Explainable AI Product: Development of a Platform to Interpret Machine Learning Models with Shapley Values and User-Centric Design
- Applied Shapley values to interpret machine learning models, providing both global and local explanations of feature contributions and their impact on predictions.
- Developed visualization and explanation tools for quantifying feature contributions, enhancing model interpretability.
- Integrated counterfactual-based explanations so that users can explore hypothetical scenarios, improving overall model understanding.
- Collaborated with cross-functional teams to ensure actionable insights for non-technical stakeholders.
- Emphasized user-centric design by implementing intuitive interfaces and interactive visualizations that improve user engagement and comprehension of model output.
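To illustrate what a local Shapley explanation computes, the sketch below evaluates exact Shapley values for a single prediction by averaging each feature's marginal contribution over every feature ordering. The toy model, feature names, and all-zero baseline are hypothetical; production tools typically approximate this, since exact enumeration grows factorially with the number of features.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features are switched from their baseline value to the instance value
    one at a time, and each feature's marginal contribution is averaged
    over all possible orderings.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical toy model with an interaction term; not a model from the platform.
def toy_model(x):
    return 2.0 * x["income"] + 1.0 * x["age"] + 0.5 * x["income"] * x["age"]


instance = {"income": 3.0, "age": 2.0}
baseline = {"income": 0.0, "age": 0.0}
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes Shapley values convenient for local explanations.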
Empowered Stock Price Predictions with Advanced Sentiment Analysis Pipeline
- Developed a web scraping and NLP pipeline for gathering and extracting textual data from diverse news websites.
- Fine-tuned a sentiment analysis model using Hugging Face Transformers.
- Achieved high-precision sentiment classification, enhancing the accuracy of stock price predictions.
An EDA Platform: User-Friendly Platform for Comprehensive Data Exploration
- Designed and deployed an in-house Exploratory Data Analysis (EDA) platform for tabular and time series data.
- Implemented user-friendly visualization tools, enabling non-technical users to make data-driven decisions.
- Enhanced the platform's statistical analysis capabilities for comprehensive data exploration.
Optimized Database Labeling with NLP-Based Services
- Delivered NLP-based dataset labeling services for an international client.
- Streamlined the database labeling process through efficient implementation.
Research Intern
March 2022 - July 2022
USAID Project under LISA 2020, in association with the Department of Statistics, University of Calcutta, and the National Institute of Wind Energy, Government of India
Optimized Predictive Forecasting with Advanced Regression Time-Series Models
- Explored the data using data visualization and exploratory data analysis techniques.
- Implemented regression-based time series models to improve forecasting of Global Horizontal Irradiance (GHI).
- Achieved an R-squared score of 0.92.
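As a reminder of what the R-squared score measures, the sketch below computes it as one minus the ratio of residual to total sum of squares. The irradiance-like values are hypothetical and unrelated to the project data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and forecast values, for illustration only.
observed = [200.0, 400.0, 600.0, 800.0]
forecast = [250.0, 350.0, 650.0, 750.0]

score = r_squared(observed, forecast)
print(f"R-squared = {score:.2f}")
```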
Research Intern
September 2021 - July 2022
A. K. Choudhury School of IT, University of Calcutta
Environmental Sound Classification with CNN Models
- Worked on a hands-on Environmental Sound Classification project using the ESC-50 dataset.
- Applied audio processing techniques to process the audio and extract spectrograms.
- Implemented Convolutional Neural Network (CNN) models for sound classification.
- Achieved an average accuracy of 87%.
Personal Project
AI Culinary Symphony: Crafting Efficient Large Language Models with Precision and Flavor
- Description
  Designed and implemented an end-to-end training pipeline for Large Language Models (LLMs), akin to crafting a gourmet meal. This involved meticulous data gathering, diverse prompt creation, and fine-tuning with key libraries (trl, peft, transformers, torch). The pipeline also relied on supporting libraries (wandb, einops, pandas, datasets, accelerate, bitsandbytes) for efficient model training.
- Processes Showcased
  - Monitored GPU usage and provided concise training progress reports, ensuring efficient model learning without resource bottlenecks.
  - Ensured reproducibility across experiments for consistent and predictable results.
  - Gathered and prepared training and validation datasets, optimizing the model's data ingestion process.
  - Set up a customizable tokenizer for precise text tokenization, allowing for tailored training.
  - Implemented quantization and LoRA techniques for model size reduction and optimization.
  - Built the language model using PEFT-enabled architectures and k-bit training to keep resource usage low.
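A rough sketch of why LoRA reduces the training footprint: the original weight matrix stays frozen, and only a low-rank update made of two small matrices is trained. The layer dimensions and rank below are hypothetical and are not taken from the project's configuration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    full: parameters trained when the (d_out x d_in) matrix W is updated directly.
    lora: parameters trained when W is frozen and the update is B @ A,
          with A of shape (rank x d_in) and B of shape (d_out x rank).
    """
    full = d_out * d_in
    lora = rank * d_in + d_out * rank
    return full, lora


# Hypothetical projection layer size and rank, for illustration only.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {100 * lora / full:.2f}%")
```

Quantizing the frozen base weights to fewer bits is a separate saving that applies on top of this reduction in trainable parameters.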
- Outcome
  Built a comprehensive training pipeline that balances flavor-rich language model development with resource efficiency, producing a model capable of sophisticated language tasks while respecting computational constraints.
- Tech stack: PyTorch, Hugging Face Hub, WandB, trl, peft, transformers
- Notebooks and Spaces
-
Description