Data Analyst / Bioinformatics Research Analyst - Kelly Products Inc.
(2024-10 - 2026-03)
- Streamlined bioinformatics workflows to identify significant SNP and pharmacogenomic variants associated with common chronic conditions such as heart disease, cancer, diabetes, and Alzheimer's disease.
- Designed and implemented a Whole Genome Shotgun (WGS) analysis pipeline for a metagenomic research project using R, Python, and Bash scripting.
- Assessed raw sequencing data quality and evaluated sample diversity metrics to ensure robust downstream analysis.
- Simulated realistic WGS data using VarSim by introducing known genetic variants and generating Illumina-based sequencing reads.
- Performed quality control of sequencing data using Fastp to trim adapters and filter low-quality reads.
- Conducted alignment of reads to the reference genome using BWA, followed by sorting and indexing with Samtools.
- Executed variant calling using GATK HaplotypeCaller and applied filtration criteria using GATK VariantFiltration.
- Processed genotype data derived from paired-end sequencing using PLINK and other bioinformatics tools to generate summary statistics.
- Conducted polygenic risk score (PRS) analysis based on summary statistics to explore genetic predispositions.
- Utilized PLINK for genotype data management, quality control, and association testing in a WGS context.
- Performed variant annotation using Ensembl VEP (Variant Effect Predictor).
- Performed shotgun metagenomic and 16S rRNA sequencing analyses on single-end and paired-end datasets.
- Applied taxonomic classification and functional profiling using tools such as QIIME2, Anvi'o, and bioBakery.
- Integrated microbial community profiling results with host genomic data for comprehensive analysis.
- Implemented custom data parsing and visualization scripts in R and Python to interpret complex outputs.
- Managed and automated workflows for high-throughput sequence analysis using shell scripting.
- Built and maintained modular Snakemake/Nextflow-based WGS and metagenomic workflows integrating Fastp, BWA, Samtools, GATK, PLINK, Ensembl VEP, QIIME2, Anvi'o, and bioBakery to automate QC, alignment, variant calling, annotation, taxonomic profiling, and reproducible downstream reporting.
- Created a Python script to automate the generation of WGS participant reports.
Bioinformatics Research Technician - University of Texas at Dallas
(2023-08 - 2024-08)
Applied academic knowledge to real-world research in Dr. Cisneros's lab at UTD. Led a key part of the research project as a bioinformatician by extracting, cleaning, managing, and analyzing NGS datasets to draw case-control association conclusions.
- Built a pipeline using R and tools such as PLINK to process large genomic datasets, improving data analysis efficiency.
- Used the HyDn-SNP-S (hypothesis-driven single nucleotide polymorphism search) technique to find cancer biomarkers in DNA repair genes across various cancer phenotypes.
- Conducted case-control association and power analysis, enhancing data reliability for cancer research studies using dbGaP datasets and tools like Bioconductor.
- Utilized PLINK and R to perform linkage disequilibrium studies, mapping genetic traits in populations for improved research outcomes.
- Implemented variant annotation on significant SNPs using GATK, enhancing understanding of their functional impact in genetic research.
- Analyzed biological pathways using the KEGG database, advancing understanding of the links between proteins and diseases.
- Obtained breast cancer gene expression data from the NCBI (National Center for Biotechnology Information) GEO (Gene Expression Omnibus) database.
- Preprocessed the expression data using Python's NumPy and pandas libraries.
- Extracted the required metadata components using Python's GEOparse package for further analysis.
- Performed PCA (Principal Component Analysis) to reduce the dimensionality of a dataset with 20,246 variables (genes).
- Built a deep neural network (DNN) from scratch with appropriate cost function and evaluation metrics.
- Achieved 80% training and testing accuracy using the sigmoid activation function.
Junior Data Scientist - Centillion Networks Pvt. Ltd.
(2020-02 - 2022-06)
Worked as part of a team of data scientists at Centillion Networks Pvt. Ltd. to apply established data analysis, data modeling, model evaluation, and model optimization skills. Also learned additional techniques for data handling and management using Python and R.
- Developed models addressing client requirements, resulting in enhanced product automation and improved business understanding.
- Performed preprocessing steps such as data cleansing, exploratory data analysis, data visualization, and feature engineering. Identified, analyzed, and interpreted trends in datasets using data segmentation techniques and built reports to summarize them.
- Reduced customer churn by implementing ML algorithms such as logistic regression, random forest, and XGBoost models in Python using libraries like TensorFlow, Keras, and PyTorch.
- Implemented automated data validation, improving data quality and accuracy for analysis and decision-making.
- Tuned models for higher accuracy, improving performance through evaluation and tuning techniques and libraries such as scikit-learn, TensorFlow, and Keras.
- Reduced data processing time through optimized ETL pipelines using SQL.
Data Science Intern - IHA Pragyan – AI Hub under IHA Consulting Services Pvt. Ltd.
(2019-07 - 2020-01)
- Generated AI project ideas suitable for MSMEs across different sectors.
- Designed a "Skin Disease Detection using AI" system to identify the type of skin disease and provide relevant information when an image is uploaded.
- Developed a "Health Insurance Fraud Detection System" to assist health insurance companies in identifying potential customers at risk of default, based on customer-specific input parameters.
- Collaborated with other data scientists to develop an "Automated Disease Prediction System" project.
- Utilized ensemble techniques to create a Predictive Maintenance system for MSMEs in the manufacturing sector, enabling the timely identification of equipment maintenance needs.
- Served on the core team responsible for developing and promoting IHA Pragyan – AI Hub.
Undergraduate Academic Assistant / TA for Integrative Biology, Physics I, Physics II, Statistics, and Discrete Mathematics - University of Minnesota, Rochester
(2015-01 - 2016-05)
- Collaborated with professors and students to improve the course by identifying potential assessments, planning course content, and analyzing students' feedback.
- Interacted with students during classes to facilitate understanding of the course material.
- Conducted study sessions to help students with the course material.