- Strong proficiency in Python, with substantial experience in FastAPI or a similar asynchronous Python framework.
- Hands-on experience running local LLMs with Ollama, vLLM, llama.cpp, or equivalent tools, including resource management and quantization techniques.
- Proven experience developing RAG systems with LangChain or LlamaIndex, with a focus on chunking methods and embedding model selection.
- Practical knowledge of vector databases such as ChromaDB, Qdrant, Weaviate, or pgvector, as well as PostgreSQL for structured data.
- Expertise in document processing: parsing PDFs (including scanned documents), OCR with tools such as Tesseract or PaddleOCR, and extracting data from Office formats.
- Basic Windows Server administration skills, including managing services via NSSM and using PowerShell, IIS, or Nginx.
- Familiarity with Nginx configuration for reverse proxying, TLS, security headers, and rate limiting.
- Knowledge of authentication and authorization, including JWT, OAuth2, and RBAC design; experience integrating with Microsoft Entra ID (formerly Azure AD).
- Experience with Docker for local development and for replicating deployments.
- A strong focus on security and privacy, with comfort discussing threat models and data management practices.
Responsibilities:
- Design and implement a self-hosted LLM architecture using Ollama, selecting and assessing open-weight models based on their effectiveness, speed, and resource requirements.
- Develop a production-ready RAG pipeline utilizing LangChain or LlamaIndex to handle a diverse collection of documents, incorporating techniques for chunking, embedding, and retrieval.
- Establish and maintain a local vector database in conjunction with PostgreSQL for structured data management, as well as local object storage solutions like MinIO.
- Create a FastAPI backend that provides endpoints for data retrieval, summarization, extraction, and reporting, ensuring rigorous validation and logging of inputs.
- Implement authentication and authorization protocols using JWT and role-based access control, integrating with Microsoft 365 / Entra ID where needed for multi-tenant architecture.
- Deploy the system on Windows Server, configuring services with NSSM and implementing Nginx as a reverse proxy with proper security settings.
- Design workflows for field extraction and document classification that include confidence scoring and exception handling, ensuring low-confidence results are flagged for human review.
- Create evaluation tools for prompts, retrievers, and overall accuracy, maintaining them as models and document content evolve.
- Enforce data protection standards throughout the project lifecycle, including encryption protocols, least-privilege access, secret management, and thorough data flow documentation.
Company:
We are a company focused on streamlining real estate workflows through a secure, self-hosted document analysis and reporting platform. The salary range is $70,000 – $95,000 per year, with benefits that include health and dental insurance, vision coverage, flexible spending accounts, and paid time off. We are located in Brooklyn, NY, and are looking to expand our team with people who share our enthusiasm for building innovative solutions.
Keywords:
Microsoft Excel, Office 365, Microsoft 365, Proxy server, C++, OAuth, PostgreSQL, PowerShell, Python, Apache Brooklyn, JWT, Nginx, Quantization, Data management, Replication