Software Developer – Production Engineering Watson Orders
Watson Orders is an IBM Silicon Valley-based technology development group targeting the development of world-class conversational AI. Our mission is to deliver advanced technology solutions that address real-world, data-driven needs in the customer-facing quick-service restaurant environment. We are focused on using state-of-the-art Machine Learning, AI, and related technologies to completely transform the customer experience!
Your Role and Responsibilities
We are currently looking for a skilled Senior Software Developer – Production Engineering to ensure performance and reliability for AI- and ML-driven voice agent microservices, edge Kubernetes clusters, network services, and storage layers.
- Work closely with other Watson Orders development teams in an embedded SRE model to help define and implement key metrics for uptime, reliability, and performance of these services, and develop runbooks for incident management.
- Develop deep service telemetry through metric collection, distributed tracing, visualization, and reporting via OpenTelemetry, Prometheus, and related tooling.
- Implement stability and performance optimizations in Python.
- Participate in the definition and management of SLIs, SLOs and error budgets for infrastructure and production services.
- Design, develop, and maintain CI/CD pipelines for integration and edge Kubernetes clusters.
Required Technical and Professional Expertise
- 5+ years of experience configuring, supporting, and optimizing Linux systems.
- 2+ years of experience architecting, deploying, and supporting edge k8s environments.
- 2+ years of experience designing and supporting distributed systems.
- 2+ years of experience in one or more languages such as Python, Java, or Go, with the ability to debug, optimize, and write scalable code.
Preferred Technical and Professional Expertise
- Experience implementing telemetry frameworks (OpenTelemetry, Prometheus) and infrastructure (Prometheus, Jaeger, and similar tools)
- Experience designing and implementing infrastructure-as-code pipelines
- Familiarity with AWS DevOps (Roles, VPCs, S3, Terraform)
- Familiarity with running distributed ML workloads in cluster-orchestrated environments
- 2+ years of pub/sub experience (Kafka, MQTT, SQS)