Hello, I'm

Saikat Bandyopadhyay

Decoding complex systems with data

About

I am an engineer focused on building next-generation data solutions.

As data volumes grow across domains, turning that data into actionable business intelligence requires scalable, robust, and auditable processes and pipelines.

I bring systems design, data-driven storytelling, and production-grade software development to engineering teams tackling hard problems.

Production-Grade

Scalable software design with CI/CD discipline, containerized deployments, and data security built in from day one

Audit-Ready

Traceable, fault-tolerant systems for highly regulated healthcare and industrial environments (CLIA/CAP, HIPAA/SOC 2)

Data Security

PII-blind architectures, split-key encryption, and privacy-preserving inference pipelines for sensitive clinical and genomic data

Deep Learning

Statistical modeling, mathematical simulation, and neural networks — from Bayesian causal inference to real-time medical image classification

Projects

Causal Gene Discovery Pipeline

Computational colocalization pipeline integrating GWAS and bulk RNA-seq eQTL data to identify candidate causal genes in Diversity Outbred mice. Implemented conditional analysis with bootstrapping to validate concordance between signals.

Impact: Identified a novel lncRNA and 2 protein-coding genes that provide a mechanistic interpretation of bone mineral density (BMD) GWAS loci; selected as a Reviewer's Choice Abstract at ASHG 2025

R coloc bnlearn GWAS eQTL HPC/SLURM

Proteogenomics: Beyond the Reference Proteome

Nextflow-based pipeline integrating long-read RNA sequencing with mass spectrometry to discover novel transcript isoforms and protein products absent from GENCODE annotations.

Impact: Identified novel isoforms absent from GENCODE by integrating 2 modalities (RNA-seq + targeted mass spectrometry) — published in JASMS 2024

Nextflow Long-Read Seq Mass Spectrometry Proteomics

Single-Cell Trajectory Analysis of Bone Marrow

Pseudotime trajectory analysis on single-cell RNA-seq data characterizing gene expression dynamics along osteoblast and osteoclast differentiation — mapping the molecular choreography of bone formation.

Impact: Identified 5 novel genes likely important for osteoblast and osteoclast differentiation from BMSC scRNA-seq data

Slingshot scRNA-seq WGCNA Bayesian Networks

Low-Pass WGS Processing Pipeline

High-throughput processing pipeline for Low-Pass Whole Genome Sequencing that reduces genotyping costs by 15% while maintaining fidelity for downstream haplotype calling and eQTL analysis.

Impact: Cut mouse genotyping costs by 15% while preserving 98% genotype-call accuracy for downstream haplotype calling and eQTL analysis

Python Bash SLURM WGS Imputation

FAIR Data Governance Framework

Architected a standardized data governance framework enforcing FAIR principles across all research data assets — transforming an unstructured research environment into a reproducible, audit-ready ecosystem.

Impact: Standardized FAIR data governance for shared NGS infrastructure across 5+ concurrent projects via automated metadata and provenance tracking plus a version-controlled CI/CD pipeline

FAIR Principles Data Governance ETL Metadata Standards

Skills

Clinical Pipeline Design

CLIA/CAP Compliance HIPAA/SOC 2 Automated Validation Gates Audit Trail Logging Redundancy & Failover Real-Time QC Monitoring

Pipeline Engineering

Nextflow Snakemake WDL/CWL Illumina DRAGEN SLURM/HPC Docker/Singularity FAIR Governance Automated ETL

Statistical Genetics & Causal Inference

GWAS eQTL Mapping Colocalization (coloc) Conditional Analysis w/ Bootstrapping Bayesian Networks (bnlearn) Mendelian Randomization

Genomics & Multi-Omics

WGS/LP-WGS DNA-seq Long-Read RNA-seq ChIP-seq/CUT&RUN Proteogenomics Variant Calling

ML/DL & Public Data

ResNet CNNs Medical Image Segmentation Cancer Genomics TCGA CPTAC GENCODE SRA GEO PRIDE

Single-Cell & Network Biology

scRNA-seq Slingshot WGCNA Cell-Type Deconvolution Hub Gene ID Gene Regulatory Networks

Programming & Tools

Python R Bash SQL Git Linux/Unix SLURM/HPC Cloud Computing REST APIs Jupyter R Markdown

Journey

Computational Biology Research Fellow

Aug 2024 — Present

University of Virginia — Dept. of Genome Sciences

  • Built a colocalization pipeline integrating Diversity Outbred mouse GWAS with bulk RNA-seq eQTL data and designed qPCR validation experiments for the candidate loci, identifying a novel lncRNA and 2 protein-coding genes that provide a mechanistic interpretation of BMD-GWAS loci (selected as a Reviewer's Choice Abstract at ASHG 2025)
  • Identified 10+ validation-ready, bone-cell-type-specific (osteoblast, osteoclast) key driver genes by integrating bulk RNA-seq deconvolution with WGCNA
  • Performed Slingshot pseudotime trajectory analysis on BMSC scRNA-seq data, identifying 5 novel genes likely important for osteoblast and osteoclast differentiation

Senior Data Analyst

Sep 2022 — Jul 2024

University of Virginia — Dept. of Genome Sciences

  • Mentored a BioLab exchange student from Poland on building a high-throughput Low-Pass WGS pipeline for haplotype calling and eQTL analysis that cut mouse genotyping costs by 15% while preserving 98% genotype-call accuracy
  • Deployed a Nextflow long-read proteogenomics pipeline integrating 2 modalities (long-read RNA-seq and targeted mass spectrometry) that identified novel isoforms absent from GENCODE (Korchak et al., JASMS 2024)
  • Designed FAIR data governance standards for shared NGS infrastructure across 5+ concurrent projects, combining automated metadata and provenance tracking with a version-controlled CI/CD pipeline

Bioinformatician, Assay Productization

Oct 2021 — Jun 2022

PierianDx

  • Delivered a sequencing QC module for the Clinical Genomics Workspace, written in Python and WDL on the Illumina DRAGEN platform; it processed 200–1,000 patient samples per week and supported customer retention for the assay-productization team

Senior Project Associate

Nov 2018 — Sep 2021

CSIR — Institute of Genomics and Integrative Biology (IGIB)

  • Led the architecture design of RAPID-CT, a ResNet-based deep-learning system that classifies brain CT scans for intracranial hemorrhage; it achieved 98% accuracy in radiologist-based validation and reduced per-case processing time by 60%
  • Engineered a PII-blind REST API with a 2-key split-encryption scheme, resolving a critical privacy bottleneck for clinical informatics infrastructure
  • Led the development of ProMetaDB, a semi-automated annotation framework integrating 3 public repositories (PRIDE, JASPAR, iProX) to construct a tissue-specific proteome atlas

Published Works

2025

The septin cytoskeleton is a regulator of intestinal epithelial barrier integrity and mucosal inflammation

JCI Insight, 10(22):e191538

DOI: 10.1172/jci.insight.191538

2024

IS-PRM-Based Peptide Targeting Informed by Long-Read Sequencing for Alternative Proteome Detection

Journal of the American Society for Mass Spectrometry, 35(11):2614–2630

DOI: 10.1021/jasms.4c00119

2020

Application of Computer Assisted Bimolecular Interaction Modelling in Predictive Microbiology

Journal of Advanced Microbiology, 4(1):59–69

DOI: 10.5530/jam.4.1.5

Honors & Recognition

Reviewer's Choice Abstract — ASHG Annual Meeting, 2025

Precision Health Initiative Trainee Stipend — University of Virginia, 2025

Best Poster Award — National Workshop on Big Data & AI in Healthcare, 2019

Let's Build the Future

Based in Charlottesville, VA. Open to Healthcare AI, BioML, and Data Science opportunities.