Data Scientist 3

Careers Integrated Resources Inc Louisville, KY (Onsite) Contractor
Title: Senior Data Scientist
Duration: 3+ Months Contract (Possibility for Extension)
Location: Louisville, KY 40202 - Fully Remote

Role Overview:
We are seeking a Senior Data Scientist to build and deploy LLM-based capabilities for working with large, diverse datasets and documents relevant to growth analytics and bid strategy. This role emphasizes ingestion, document processing, information extraction, and retrieval methods to support analytics use cases in production. Experience with modern LLM tooling and Databricks is required; hands-on experience with advanced reasoning models and agentic/orchestration frameworks is a plus.


Key Responsibilities:
Architect, build, and refine retrieval-grounded LLM systems, including basic and advanced RAG patterns, to deliver grounded, verifiable answers and insights.
Design robust pipelines for ingestion, transformation, and normalization of public and internal data, including ETL, incremental processing, and data quality checks.
Build and maintain document processing workflows across PDFs, HTML, and scanned content, including OCR, layout-aware parsing, table extraction, metadata enrichment, and document versioning.
Develop information extraction pipelines using LLM methods and best practices, including schema design, structured outputs, validation, error handling, and accuracy evaluation.
Own the retrieval stack end-to-end, including chunking strategies, embeddings, indexing, hybrid retrieval, reranking, filtering, and relevance tuning across a vector database or search platform.
Implement web data acquisition where needed, including scraping, change detection, source quality checks, and operational concerns such as retries and rate limiting.
Establish evaluation and monitoring practices for retrieval and extraction quality, including golden datasets, regression testing, groundedness checks, and production observability.
Collaborate with subject matter experts to translate business needs into practical retrieval and extraction workflows and measurable success criteria.
Communicate complex findings, tradeoffs, and recommendations to technical and business stakeholders, supporting data-driven forecasting and strategy.
Ensure compliance with data governance and security standards when handling sensitive data and deploying systems to production environments.

Qualifications:
Advanced degree in Computer Science, Data Science, Statistics, Engineering, or a related quantitative field.
Minimum of 4 years of experience in data science or applied ML/NLP, with a focus on NLP and GenAI.
Proficiency in Python and SQL, with strong engineering practices for maintainable, testable pipelines.
Strong experience with Databricks for data processing and pipeline development, including Spark and common lakehouse patterns.
Demonstrated experience building retrieval-grounded LLM systems and/or LLM-based information extraction for real-world use cases.
Experience with document ingestion and parsing, including OCR and handling messy, semi-structured content such as PDFs, tables, forms, and web pages.
Familiarity with vector databases and retrieval concepts, including indexing, embeddings, hybrid retrieval, reranking, and performance and cost tuning.
Strong understanding of best practices for reasoning models and techniques that improve reliability and reduce hallucinations, including grounding and attribution.
Excellent communication skills, with a track record of partnering with stakeholders and turning ambiguous requests into adopted solutions.

Libraries and Tools:
Proficiency with LLM and orchestration libraries such as: openai, google-genai, langgraph, langchain.
Experience with supporting tooling commonly used in production LLM systems, for example: pydantic for schema validation, tenacity for retries, beautifulsoup4 for HTML data extraction, and standard Python data tooling such as pandas and numpy.
Experience with retrieval and vector tooling, such as: FAISS, Elasticsearch or OpenSearch, and vector database platforms (for example Pinecone, Weaviate, Milvus, Chroma).

Preferred Qualifications:
Exposure to agentic patterns and tool-calling for workflow automation.
Experience working in regulated environments and implementing governance controls such as access control, auditability, and retention.

Job Snapshot

Employee Type: Contractor
Location: Louisville, KY (Onsite)
Job Type: Other
Experience: Not Specified
Date Posted: 12/19/2025
Job ID: 25-68502
