About

Hi! I am Aditya. I am a passionate Data Scientist with 2+ years of experience and expertise in Python, Machine Learning, and Time Series Analysis. I have tackled a range of complex data challenges, from fraud detection to cashflow forecasting, and I have applied NLP (including LLMs) and Explainable AI to deliver actionable insights.

The Journey:

Passionate about statistics and data science, I've spent over one year and eight months honing my expertise at Cloudcraftz Solutions Pvt Ltd. From fraud detection visualizations to explainable AI, I have bridged the gap between technical work and actionable insights. My software engineering skills, evident in platform development and forecasting models, extend to international projects where I've implemented NLP pipelines and streamlined labeling processes. Enthusiastic about the potential of LLMs, I'm constantly learning and eager to apply my experience to solve complex problems and drive data-driven decisions across industries.

Recent Work:

- Fine-tuned a Large Language Model to answer multiple-choice questions.
- Currently working on a commodity price forecasting project that combines historical price data and historical news data with ML classification algorithms and LLMs. Built a multi-classifier pipeline that predicts the percentage price fluctuation of commodities within a specific time-frame. Wrote and tested prompts on LLMs (GPT, LLaMa, Gemini) using LangChain to extract news-driven sentiment and relevance scores for commodity prices. Now collaborating on integrating textual data with the multi-classifier pipeline for a novel business application.


Explore some of my notable work and projects in the links below!

Contact

Skills

AI/ML Domains

  • Machine Learning
  • Deep Learning
  • Natural Language Processing (NLP)
  • Prompt Engineering
  • Generative AI
  • Exploratory Data Analysis (EDA)
  • Statistical Modelling
  • Time Series Analysis and Forecasting

Language and Libraries

  • Python
  • NumPy
  • Pandas
  • Plotly
  • Scikit-Learn
  • TensorFlow
  • Keras
  • PyTorch
  • HuggingFace
  • LangChain

Soft

  • Adaptability
  • Problem Solving
  • Customer Service
  • Communication Skills
  • Time Management
  • Critical Thinking
  • Collaborative Work
  • Presentation Skills

Experience

  • Junior Data Scientist
    Cloudcraftz Solutions

    July 2022-Present
    • Commodity Price Change Prediction
      • Developed a multi-classifier pipeline that predicts the percentage price fluctuation of commodities within a specific time-frame.

      • Spearheaded the creation and testing of prompts on cutting-edge language models (GPT, LLaMa, Gemini) using LangChain, enabling the extraction of sentiment and relevance scores from news articles to forecast commodity prices.

      • Leading a collaboration to integrate textual data with the multi-classifier pipeline for a novel business application.

    • High Frequency Trading Strategy Analysis
      • Conducted thorough evaluations and analyses of various trading strategies, leveraging data visualization techniques and statistical tests to optimize performance.

      • Delivered actionable insights for enhancing trading strategies, driving profitability and success in high-frequency trading environments.

    • Strategic Development of Transaction Network Visualization and Graph Neural Network Models for Enhanced Financial Security
      • Developed innovative network visualization solutions for improved understanding of complex financial data.

      • Spearheaded the identification of suspicious nodes through comprehensive analyses:

        1. Analyzed transaction volume, centrality metrics, and transaction amounts.

        2. Deployed advanced techniques to pinpoint anomalies and potential fraud indicators.

      • Pioneering advancements in outlier detection:

        1. Actively learning and utilizing PyTorch Geometric for Graph Neural Network development.

        2. Aiming to refine outlier detection models for robust network security and enhanced fraud prevention.

    • Enhanced Financial Decision-Making with Innovative Cashflow Forecasting Method
      • Collaborated on developing a sophisticated cashflow forecasting tool for an NBFC client to enhance financial decision-making.

      • Utilized pgAdmin to preprocess raw financial data, translating complex insights into intuitive visualizations.

      • Conducted extensive feature engineering to improve predictive accuracy, resulting in a robust machine learning model.

      • Achieved a remarkable 2.9% reduction in RMSE through a stacked model approach, showcasing significant improvement in forecasting precision.

    • An Explainable AI Product: Innovative Development of a Platform to Interpret Machine Learning Models with Shapley Values and User-Centric Design
      • Applied Shapley values to interpret machine learning models, providing both global and local explanations for feature contributions and their impact on predictions.

      • Developed visualization and explanation tools for quantifying feature contributions, enhancing model interpretability.

      • Integrated counterfactual-based explanations for user-driven exploration of hypothetical scenarios, improving overall model understanding.

      • Collaborated with cross-functional teams to ensure actionable insights for non-technical stakeholders.

      • Emphasized user-centric design by implementing intuitive interfaces and interactive visualizations to enhance user engagement and comprehension of model output.

    • Empowered Stock Price Predictions with Advanced Sentiment Analysis Pipeline
      • Developed a web scraping and NLP pipeline for efficient gathering and extraction of textual data from diverse news websites.

      • Utilized Hugging Face Transformers for fine-tuning a sentiment analysis model.

      • Achieved high-precision sentiment classification, enhancing the accuracy of stock price predictions.

    • An EDA Platform: Developed a User-Friendly EDA Platform for Comprehensive Data Exploration
      • Designed and deployed an in-house Exploratory Data Analysis (EDA) platform for tabular and time series data.

      • Implemented user-friendly visualization tools, enabling non-technical users to make data-driven decisions.

      • Enhanced the platform's statistical analysis capabilities for comprehensive data exploration.

    • Optimized Database Labeling with NLP-Based Services
      • Delivered NLP-based dataset labeling services for an international client.

      • Streamlined the database labeling process through efficient implementation.


  • Research Intern
    USAID Project under LISA 2020, in association with Department of Statistics, University of Calcutta and National Institute of Wind Energy, Government of India

    March, 2022 - July, 2022
    • Optimized Predictive Forecasting with Advanced Regression Time-Series Models
      • Conducted thorough data exploration using rigorous Data Visualization and Exploratory Data Analysis techniques.

      • Implemented advanced regression-based time series models to improve forecasting of Global Horizontal Irradiance (GHI).

      • Achieved an impressive R-squared score of 0.92, showcasing the model's high predictive accuracy.


  • Research Intern
    A. K. Choudhury School of IT, University of Calcutta

    September, 2021 - July, 2022
    • Environmental Sound Classification with CNN Models
      • Engaged in a hands-on project on Environmental Sound Classification using the ESC-50 dataset.

      • Applied audio processing techniques to modify and extract essential spectrograms.

      • Implemented Convolutional Neural Network (CNN) models for sound classification tasks.

      • Achieved an average classification accuracy of 87%.

Personal Project

  • AI Culinary Symphony: Crafting Efficient Large Language Models with Precision and Flavor

    • Description

        Designed and implemented an end-to-end training pipeline for Large Language Models (LLMs), akin to crafting a gourmet meal. This involved meticulous data gathering, diverse prompt creation, and leveraging key libraries (trl, peft, transformers, torch) for effective fine-tuning. The process also relied on supporting libraries (wandb, einops, pandas, datasets, accelerate, bitsandbytes) for experiment tracking, data handling, and efficient model training.

    • Processes Showcased
      • Monitored GPU usage and provided concise training progress reports, ensuring efficient model learning without resource bottlenecks.

      • Ensured reproducibility across experiments, guaranteeing consistent and predictable results.

      • Efficiently gathered and prepared training and validation datasets, optimizing the model's data ingestion process.

      • Crafted a customizable tokenizer for precise language dissection, allowing for tailored learning experiences.

      • Implemented quantization and LoRA techniques for model size reduction and optimization.

      • Constructed an intelligent and efficient language model, considering PEFT-enabled architectures and k-bit training for resource-conscious brilliance.

    • Outcome

        Successfully orchestrated a comprehensive training pipeline, balancing flavor-rich language model development with resource efficiency. Achieved a model capable of sophisticated language tasks while respecting computational constraints.

    • Tech-stack
        PyTorch, Hugging Face Hub, WandB, trl, peft, transformers
    • Notebooks and Spaces