Senior Data Infrastructure Engineer – Watson Orders
Software Developers at IBM are the backbone of our strategic initiatives to design, code, test, and provide industry-leading solutions that make the world run today – planes and trains take off on time, bank transactions complete in the blink of an eye and the world remains safe because of the work our software developers do. Whether you are working on projects internally or for a client, software development is critical to the success of IBM and our clients worldwide. At IBM, you will use the latest software development tools, techniques and approaches and work with leading minds in the industry to build solutions you can be proud of.
Your Role and Responsibilities
We are seeking a talented data infrastructure engineer to build the infrastructure powering transformative AI/ML products that reach tens of millions of customers per day and feed billions worldwide. The department covers data infrastructure, data pipelines, analysis, and performance optimization. The ideal candidate has experience architecting, developing, and supporting large-scale data infrastructure with a focus on resilience, scalability, and performance in a fast-growing, agile environment.
• Architect and maintain the petabyte-scale data lake, warehouse, pipelines, and query layers.
• Architect, implement, and support a multi-region data ingestion system fed by geographically distributed edge AI systems.
• Develop and support AI research pipelines, training and evaluation pipelines, audio re-encoding and scanning pipelines, and various analysis outputs for business users.
• Build pipelines that coordinate resiliently and idempotently with external databases, APIs, and systems.
• Work with AI Speech and Audio engineers to support and co-develop heterogeneous pipelines over large flows of conversational AI data, supporting and accelerating experimentation with new AI models and improvements.
Required Technical and Professional Expertise
- 5+ years of professional Python experience.
- 2+ years of pub/sub experience (Kafka, Kinesis, SQS, MQTT, etc.).
- 3+ years working on petabyte-scale data platforms.
- 3+ years working in AWS.
- Experience building robust schema-based parsers using standard tooling in Python.
- Experience developing with Apache Avro, Parquet schemas, SQLAlchemy (or similar ORMs), and PySpark in Python.
- Fluent in English.
Preferred Technical and Professional Expertise
- Professional experience with conversational AI (chatbots, virtual assistants, etc.).
- Graduate degree (MA/MS, PhD) in Computer Science, Linguistics, or a related field.
- Professional experience with Linux.
- Fluent or native proficiency in Spanish.