Artificial Intelligence | Machine Learning Engineer (AI/ML Operations)
Integrated Resources Inc
Orlando, FL (Onsite)
Contractor
Job Title: Artificial Intelligence | Machine Learning Engineer (AI/ML Operations)
Job Location: Orlando, FL, 32830
Job Duration: 22 Months+
Job Type: Contract
Job Summary:
- We are seeking an AI/ML Engineer with a strong focus on operations, analytics, and platform support.
- This role will manage the day-to-day operational health of AI/ML models, agents, and multi-cloud AI platforms across GCP, Azure, and AWS.
- The ideal candidate will monitor model performance, manage cost optimization initiatives, support AI agents, and collaborate with engineering, infrastructure, and responsible AI teams.
- This is a cross-functional operations role combining MLOps, FinOps, monitoring, and AI platform support.
Key Responsibilities:
AI/ML Operations:
- Manage operational workflows for model deployments, updates, and versioning across GCP, Azure, and AWS.
- Monitor model performance metrics including latency, throughput, error rates, token usage, and inference quality.
- Track model drift, accuracy degradation, and performance anomalies; escalate issues to engineering teams.
- Support knowledge base operations including vector embedding pipelines, chunk quality, and refresh cycles.
- Maintain model inventory and technical documentation across multi-cloud environments.
- Coordinate model evaluation cycles with Responsible AI and Core Engineering teams.
Agent & MCP Server Operations:
- Monitor AI agent health, performance, and reliability (AutoGen-based agents, MCP servers).
- Track agent execution metrics such as task completion rates, tool call success/failure, latency, and error patterns.
- Support agent deployment and configuration management workflows.
- Document agent behaviors, known issues, and operational runbooks.
- Coordinate with engineering teams on agent updates, testing, and rollouts.
- Monitor MCP server availability, connection health, and integration status.
FinOps & Cost Management:
- Track and analyze AI/ML cloud spend across GCP (Vertex AI), Azure (OpenAI), and AWS (Bedrock).
- Build cost dashboards by model, application team, use case, and environment.
- Monitor token consumption, inference costs, and embedding/storage costs.
- Identify cost optimization opportunities (model selection, caching, batching, rightsizing).
- Provide cost allocation reports for chargeback/showback.
- Forecast spend trends and flag budget anomalies.
- Partner with Infrastructure and Finance teams on AI cost governance.
Monitoring, Dashboarding & Reporting:
- Build and maintain dashboards for platform performance, model health, agent metrics, and operational KPIs.
- Create executive and stakeholder reports on platform adoption, usage trends, and cost allocation.
- Develop Responsible AI dashboards tracking hallucination rates, accuracy metrics, guardrail triggers, and safety incidents.
- Monitor API gateway traffic patterns and consumption trends.
- Provide regular reporting to product and engineering leadership.
Release Operations Support:
- Support release management processes with pre- and post-deployment validation checks.
- Track release health metrics for models, agents, and platform components.
- Maintain release documentation, runbooks, and operational playbooks.
- Coordinate with QA, Performance Engineering, and Infrastructure teams during releases.
Responsible AI Operations:
- Monitor guardrail effectiveness and escalate anomalies to Responsible AI teams.
- Track hallucination detection, content safety triggers, and accuracy trends.
- Support LLM red-teaming efforts by collecting and organizing evaluation data.
- Maintain audit logs and compliance documentation for AI governance.
Cross-Functional Coordination:
- Act as the operational point of contact for application teams using AI APIs.
- Coordinate with Security teams on audits and compliance reporting.
- Partner with Infrastructure teams on capacity planning and resource utilization.
- Support performance engineering with load test analysis and documentation.
Basic Qualifications:
- 2–4 years of experience in an operations, analytics, or technical operations role (MLOps, AIOps, DataOps, Platform Ops, or similar).
- Understanding of AI/ML concepts: models, inference, embeddings, vector databases, LLMs, tokens, and prompts.
- Experience with cloud cost management and FinOps practices.
- Strong proficiency with dashboarding/visualization tools (Looker, Tableau, Grafana, or similar).
- Working knowledge of GCP (required); familiarity with Azure and AWS is a plus.
- Experience with SQL and basic Python for analysis or scripting.
- Experience with monitoring/observability tools (Datadog, Prometheus, Grafana, Cloud Monitoring, etc.).
- Understanding of APIs and API gateways; ability to read logs and analyze traffic.
- Strong analytical, troubleshooting, and communication skills.
- Bachelor’s degree in Computer Science, BIS, MIS, Electrical Engineering, Mechanical Engineering, or a related field.
Preferred Qualifications:
- Hands-on experience with LLM platforms such as Vertex AI, Azure OpenAI, or AWS Bedrock.
- Familiarity with AI agents and agentic frameworks (AutoGen, LangChain, etc.).
- Exposure to MCP (Model Context Protocol) or agent-tool integration patterns.
- Experience with vector databases and RAG operations.
- Understanding of the MLOps lifecycle: model registry, versioning, deployment, A/B testing.
- Experience with Apigee or similar API management platforms.
- Familiarity with Responsible AI metrics (hallucination, bias, content safety, guardrails).
- FinOps certification or formal cloud cost management experience.
- Experience supporting enterprise AI platforms with multiple application teams.
Nice to Have:
- Familiarity with ML pipeline tools (Kubeflow, MLflow, Vertex AI Pipelines).
- Exposure to prompt management and evaluation frameworks.
- ITIL or similar operational process framework experience.
- Experience creating runbooks and operational documentation.