Software Developer – Production Engineering Watson Orders

IBM

Raleigh, NC, USA Remote

Full-time

Software Development

Jan 25

Introduction

Watson Orders is an IBM Silicon Valley-based technology development group focused on developing world-class conversational AI. Our mission is to deliver advanced technology solutions that address real-world, data-driven needs in the customer-facing quick service restaurant environment. We are focused on using state-of-the-art Machine Learning, AI, and related technologies to completely transform the customer experience!


Your Role and Responsibilities

  • We are currently looking for a skilled Senior Software Developer – Production Engineering to ensure performance and reliability for AI & ML driven voice agent microservices, Edge Kubernetes clusters, network services, and storage layers.
  • Work closely with other Watson Orders development teams in an embedded SRE model to help define and implement key metrics for uptime, reliability, and performance of these services, and develop runbooks for incident management.
  • Develop deep service telemetry through metric collection, distributed tracing, visualization, and reporting via OpenTelemetry, Prometheus, and related tooling.
  • Implement stability and performance optimizations in Python.
  • Participate in the definition and management of SLIs, SLOs and error budgets for infrastructure and production services.
  • Design, develop and maintain CI/CD pipelines for integration and edge Kubernetes clusters.



Required Technical and Professional Expertise

  • 5+ years of experience configuring, supporting, and optimizing Linux systems.
  • 2+ years of experience architecting, deploying, and supporting edge Kubernetes (k8s) environments.
  • 2+ years of experience designing and supporting distributed systems.
  • 2+ years of experience in one or more languages such as Python, Java, or Go, with the ability to debug, optimize, and write scalable code.


Preferred Technical and Professional Expertise

  • Experience implementing telemetry frameworks (OpenTelemetry, Prometheus) and infrastructure (Prometheus, Jaeger, and similar tools)
  • Experience designing and implementing infrastructure as code pipelines
  • Familiarity with AWS DevOps (Roles, VPCs, S3, Terraform)
  • Familiarity running distributed ML workloads in cluster orchestrated environments
  • 2+ years of pub/sub experience (Kafka, MQTT, SQS)

