University of Cambridge, UK – Data Scientist
March 2024 – November 2024
- Analysing Quarterly Earnings Calls - Bank of England
My team and I developed an AI platform that uses Quarterly Earnings Calls as a data source to identify early signs of bank distress, with the aim of helping to prevent financial instability in the UK. The platform applied NLP techniques such as topic modelling, sentiment analysis, and RAG-based summarisation, together with LangChain.
- Created a scalable and efficient preprocessing pipeline for unstructured data extracted from PDF files.
- Developed a proprietary algorithm that summarises the overall sentiment of a financial quarter; its output closely aligned with stock market sentiment.
- Time Series Analysis for Sales and Demand Forecasting
Analysed historical book sales data to inform data-driven decisions about future investment in new publications.
- ARIMA, SARIMA, SARIMAX
- Time Series Forecasting with machine learning and deep learning
- Gradient boosting models – LightGBM, XGBoost
- RNN, LSTM, GRU, CNN and Hybrid Techniques
- Topic Modelling of customer feedback
Analysed customer feedback to understand what motivates members to join and what factors influence their behaviours once they have joined.
- Sentiment Analysis / Topic Modelling / Text Summarization
- LLM: tiiuae/falcon-7b-instruct (Hugging Face)
- BERTopic
- LdaModel from Gensim
- Applying Supervised Learning to predict student dropout
Examined student data to predict whether a student will drop out, supporting the institution’s financial stability as well as students’ academic success and personal development. A high dropout rate can lead to significant revenue loss, diminished institutional reputation, and lower overall student satisfaction.
- XGBoost
- Neural Networks - TensorFlow, Keras
- Customer Segmentation with clustering - Unsupervised Learning
Developed a robust customer segmentation to help an e-commerce company understand and serve its customers better. This supported a more customer-centric focus and improved marketing efficiency.
- Hierarchical clustering
- k-means clustering
- Detecting the anomalous activity of a ship’s engine
Developed a robust anomaly detection system to protect a company’s shipping fleet by evaluating engine functionality. This helped develop a fleet maintenance schedule and reduce fleet downtime.
- Interquartile Range (IQR)
- One-class SVM
- Isolation Forest
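As an illustration of the simplest of the three methods listed above, the IQR rule can be sketched in a few lines of standard-library Python. This is a minimal sketch rather than the project's implementation: the function name `iqr_outliers`, the conventional multiplier `k=1.5`, and the sample RPM readings are all assumptions made for the example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical engine RPM readings; the last one is deliberately extreme.
readings = [700, 710, 705, 695, 720, 715, 702, 698, 1500]
print(iqr_outliers(readings))
```

The IQR rule works on one feature at a time, which is one reason it is often paired with multivariate methods such as One-class SVM and Isolation Forest.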
- Applying Bayesian thinking to a real-world business issue
Performed Bayesian parameter estimation and hypothesis testing to optimize the performance of an e-commerce platform.
- A/B Test using PyMC
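The idea behind a Bayesian A/B test can be illustrated without PyMC. The sketch below swaps in a conjugate Beta-Binomial model with a uniform Beta(1, 1) prior and plain Monte Carlo sampling from the standard library; it is not the PyMC model used in the project. The function name `prob_b_beats_a` and the conversion counts are hypothetical.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) from posterior samples.

    Assumes a Beta(1, 1) prior on each conversion rate, so each posterior
    is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
    return wins / draws

# Hypothetical counts: 100/1000 conversions for A, 150/1000 for B.
print(prob_b_beats_a(100, 1000, 150, 1000))
```

The returned value is a direct posterior probability that variant B converts better than variant A, which tends to be easier to communicate to stakeholders than a p-value.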