Ruffin Galactic, Atlanta US (Remote) | MACHINE LEARNING ENGINEER JUL 2023 – PRESENT
- Developed and deployed a machine learning pipeline for a machine failure prediction model using the Vertex AI SDK for Python and Sklearn.
- Wrote a script to automate deployment of AnythingLLM (backed by Pinecone and OpenAI GPT) to Digital Ocean Droplets (Ubuntu VMs).
- Scheduled daily CRM data ingestion into BigQuery partitioned tables using the Dagster orchestrator, improving data accuracy for analysis and decision-making.
- Used Dagster and Airbyte to orchestrate an internal reporting pipeline ingesting data from instantly.ai, expandi.io and Google Analytics, with BigQuery as the data warehouse.
- Used Cloud Dataprep to orchestrate transformation jobs on an e-commerce dataset of Google Analytics session records for a merchandise client, with BigQuery as the data warehouse.
Tech Stack: GCP (Vertex AI, BigQuery, Dataflow), Digital Ocean, Python, Django, GitHub, Metabase, Dagster, Airbyte, Streamlit, MLflow
Danone, Paris | AI SOFTWARE DEVELOPER – Work Study OCT 2022 – SEP 2023
Artificial Intelligence
- Developed and pitched 4 conversational AI ideas and demos to business teams and innovation coaches in France, Poland and the Netherlands.
- Led AI product management: discovery, client interviews, strategy, budgeting and delivery.
- Reduced fine-tuning time and compute cost for Llama 2 and Flan-T5 using parameter-efficient fine-tuning (PEFT) and quantized low-rank adaptation (QLoRA).
- Instruction-tuned Llama 2 (PEFT, QLoRA) to generate question-answer datasets from unstructured free text for NLP projects.
- Applied in-context and zero-shot learning so pre-trained models could perform tasks beyond their training data.
- Wrote custom Python code integrating a conversational AI application with Azure OpenAI, AWS Kendra, SageMaker endpoints, Hugging Face Transformers, Slack and LangChain.
Data Engineering
- Developed a pipeline for digital asset data using AWS services (Lambda, ECR, Fargate, S3) and Docker, reducing data processing time from 2 weeks to a few hours.
- Used Python multithreading to parallelize resource-intensive operations, enabling the application to handle concurrent ETL tasks more effectively.
- Wrote unit tests and conducted thorough code reviews to maintain code quality.
- Implemented a CI/CD pipeline using GitHub Actions, Docker and Amazon ECR (container registry), with SSO access.
Tech Stack: AWS (ECR, ECS, Lambda, S3), Azure OpenAI, Power BI, Python, LangChain, Slack Bolt, Hugging Face, Streamlit, GitHub, Docker
Alstom, Paris | DATA SCIENTIST (NLP) – Internship MAY 2022 – SEP 2022
- Used Sentence Transformers (SBERT, Hugging Face) to extract technical data from train documentation, significantly reducing manual data processing time.
- Implemented an NLP-driven search system using Hugging Face Transformers to improve technical document retrieval accuracy.
- Fine-tuned a summarization model to condense lengthy technical documents, freeing engineers to focus on critical tasks.
Tech Stack: Dataiku, Python, Hugging Face
Aivancity, Paris | ANALYTICS ENGINEER (Various Internships) JAN 2022 – SEP 2022
- Analyzed business requirements for data analysis and reporting, using Power BI to visualize and present data-driven insights.
- Built a real-time data streaming pipeline from IoT devices (Raspberry Pi) to AWS Kinesis for continuous monitoring of beehive health and bee well-being.
- Developed and deployed a deep learning quality control application with Streamlit, Docker, AWS Elastic Beanstalk, YOLOv8, Roboflow and Meta's Segment Anything Model (SAM).
- Provided mentorship and guidance to junior colleagues.
Tech Stack: Azure, AWS (Kinesis, Elastic Beanstalk), NLP, Python, Sklearn, Streamlit, GitHub, Docker, YOLOv8, Plotly, Spark