Daniel Sun

Reddit Jul 2022 - Present

Reddit's P0 Media Safety Detection - On-Premises Migration

Architected and deployed on-premises P0 media detection (CSAM/NCIM) using Python and PhotoDNA, migrating from third-party cloud API -> reduced end-to-end latency by 40-60% across millions of daily uploads.

Implemented nearest-neighbor search with a FAISS IVF index (nlist=4096, nprobe=8), achieving 100% recall vs. a brute-force Flat index while leveraging multithreaded parallel search.

Optimized Pillow pipeline: eliminated redundant .open() calls, moved to in-memory storage, upgraded library -> reduced RPC latency by several hundred milliseconds per request.

Profiled bottlenecks using cProfile/py-spy and designed SOA integrating Media Service, CCS, and AWS S3 (Temp + Review buckets).

Evolution of Reddit's In-House P0 Media Detection - HMA & Internal Hash DB

Led migration from custom stack to Meta's HMA (Hasher-Matcher-Actioner) deployed on-prem, enabling gradual rollouts and per-hash false-positive disablement.

Onboarded StopNCII -> detected 100+ violating media/month, plus Tech Against Terrorism (UN) and NCMEC Take It Down hashsets via HMA in Python.

Built internal hash database in Python with FAISS index to memorize operator CSAM decisions, eliminating redundant manual reviews.

Launched auto-blocking (Sept 2024) matching every upload against internal DB -> achieved California AB 1394 compliance.

Spearheading evaluation of Google's Content Safety API (AI) to detect previously unseen CSAM beyond hash matching.

Mentored 3-4 engineers on FAISS tuning, Pillow optimization, and cProfile debugging.

Member of Technical Staff Remote

Trust Lab Jul 2020 - May 2022

DetectAI, ModAI - Full-Stack Threat Discovery & Labeling Platform

Built end-to-end full-stack platform using Python (FastAPI) and React + TypeScript, enabling Trust & Safety teams to detect coordinated fraud, synthetic identities, and cross-platform scam networks.

Designed real-time threat dashboards with WebSockets and Recharts visualizing AI agentic investigation leads, reducing threat investigation time from hours to minutes.

Created multi-modal labeling queue UI (React Hook Form, Zod) for image/text/video moderation, improving labeling efficiency by 25% for 5 of top 10 global social platforms.

Developed Node.js + Express middleware and deployed full-stack apps on AWS S3/CloudFront with Docker + GitHub Actions CI/CD, ensuring SOC 2/GDPR compliance.

Built internal case management UI (Angular/D3.js) and customer onboarding portal, reducing integration time from weeks to 3 days.

Mentored 2 junior engineers on TypeScript, React best practices, and API design.

Senior Software Engineer San Francisco, CA

Pinterest Jul 2018 - Jul 2020

PinSets: Query Safety at Pinterest - Unsafe Query Detection System

Co-authored peer-reviewed paper on PinSets, expanding 20 drug-related seed queries into 15,670 positive examples at >99% precision, reducing unsafe query suggestions by 90%.

Built Java backend for Content Safety Service, integrating fastText classifier with behavioral fallback for non-compositional queries (e.g., "durban poison", "moon rock bud").

Designed session-based scoring algorithm using query co-occurrence within user sessions to resolve ambiguous terms like "pot", "weed", and "nude".

Constructed bipartite graph (queries<->ngrams) with custom association strength scoring to identify diagnostic ngrams from seed sets.

Optimized signal storage and retrieval using AWS DynamoDB and MySQL, slashing p95 latency through query optimization and caching strategies.

Deployed services on Kubernetes with Jenkins CI/CD, maintaining four-nines uptime for safety-critical infrastructure.

Collaborated with ML engineers to onboard new classifier signals (e.g., self-harm, harassment, violent content) into production serving pipelines.

Created internal dashboards using React + Redux to visualize expansion results and model disagreement for hybrid serving.

Software Engineer Washington D.C. Metro Area

Facebook Aug 2016 - May 2018

Led child safety engineering efforts, improving NCMEC reporting reliability with metrics-driven monitoring and enhanced logging to operationalize bug detection and speed up debugging.

Supported new Facebook products (e.g., Messenger Kids) for NCMEC compliance, coordinating cross-functionally with engineering, policy, and legal teams.

Set H2 2017 and H1 2018 roadmaps for the project and executed successfully, pivoting team priorities through metrics-driven impact analysis.

Improved evidence collection software by adding Thrift services in Django to leverage existing Python code for Instagram data parsing, communicating with core evidence collection code in Hack (Facebook's PHP dialect).

Built proprietary video matching service for objectionable content at upload using an industry-standard image-hashing algorithm (PhotoDNA) to support hash sharing across safety partners.



Education

University of Maryland
