Job Description:
C3.ai, Inc. (NYSE:AI) is a leading Enterprise AI software provider for accelerating digital transformation. The proven C3 AI Platform provides comprehensive services to build enterprise-scale AI applications more efficiently and cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement. Learn more at C3 AI
We are looking for a Site Reliability Engineer to join our team in Tysons, VA, and Redwood City, CA.
Responsibilities
- Maximize system uptime and availability, ensuring functional and performance SLAs are met.
- Establish end-to-end monitoring and alerting for all critical system components.
- Solve complex problems for critical services and build automation to prevent problem recurrence.
- Influence and create new designs, architectures, standards, and methods for supporting the platform.
- Initiate and lead scripting and automation to streamline system updates and upgrades.
- Set up critical infrastructure, tools, and frameworks to streamline the deployment cycle.
Qualifications
- Demonstrated experience in deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and other public clouds.
- Expertise in Linux Operating Systems, Networking, and Database concepts.
- Experience with Cassandra (or another NoSQL alternative).
- Expertise in cloud providers, such as Amazon Web Services, Azure, and GCP.
- Experience with configuration management and infrastructure-as-code tools such as Ansible or Terraform.
- Experience using Ruby or Python to automate and monitor systems.
- Excellent problem-solving, critical thinking, and communication skills.
- Experience supporting commercial SaaS solutions as a DevOps engineer or system administrator.
- BS or MS in Computer Science, related field, or equivalent professional experience.