Based in sunny Los Angeles, Ritual is a direct-to-consumer health brand that believes it’s crucial to know not just what you’re putting into your body, but why you need it in the first place.
We’re transparent about our ingredients and where they’re sourced, and we have spent years conducting extensive research behind each of the premium nutrient forms we use. Our products are simple, effective, and backed by science. Our mission is simple: to empower people to feel their best by turning healthy habits into a ritual.
We have built a team of curious skeptics, world-class scientists, unconventional artists, expert marketers and analytical strategists who are on a mission to reinvent an entire industry. We’re well-funded, growing quickly, and committed to our mission. If you’re a team player who refuses to settle for the status quo, we want you. Welcome to your new daily Ritual.
Position: Site Reliability Engineer
Reports to: Director of Engineering, Platform
As a Site Reliability Engineer at Ritual you will play a pivotal role in scaling our services and processes to meet rapidly growing customer demand. A collaborator by nature, you will partner with support and product engineers, empowering them to do their best work. Your contributions will help to define quality in software at Ritual.
What You’ll Do:
- Be responsible for the day-to-day monitoring of availability, latency, and overall system health, as well as the administration and operation of Ritual’s devices and applications
- Work to automate and improve the systems and processes that support our services
- Drive organization-wide best practices for monitoring, alerting, and sustainable incident response and postmortems
- Collaborate with support and product teams to resolve production and customer issues and incidents, debugging across all levels of the stack
- Plan for and manage the maintenance of Ritual’s infrastructure
- Plan for linear growth and seasonal spikes in demand on Ritual’s services
- Support features before they launch by consulting on system design, assisting with capacity planning and helping teams understand edge cases and failure scenarios
- Evolve systems as an advocate for changes that improve reliability and development velocity
Who You Are:
- Collaborator. You see software as a team sport and delight in working across teams with people who have diverse skill sets and backgrounds. You enjoy learning from and teaching your peers. You approach all interactions with an assumption of the best intent.
- Communicator. You write well and understand the power of clear, concise prose, but you always prefer clarity over brevity.
- Analytical. You’re advised by data. You measure what matters and leverage key metrics as the catalyst for process and infrastructure change.
- Driven. When you see something that can be improved you can’t help but want to fix it. You take initiative and err on the side of action.
What You Need:
- Education: A bachelor’s degree in computer science or a related field or equivalent professional experience.
- Experience: 2-3+ years of experience as an SRE in a small team environment
- Experience operating managed public services is critical, including knowing how to think about monitoring, automation, day-to-day administration, and incident management
- A desire to ship many times a day
- Experience with at least one cloud platform, such as AWS
- Experience with Heroku or similar PaaS offerings
- Familiarity with Docker
- Experience with scripting and automation
- Experience with monitoring, alerting, and operations using tools like New Relic, Datadog, and Opsgenie / PagerDuty
- Experience managing the incident response process
- An ability to debug complex problems across the whole stack
- A keen focus on the needs of users, both internal and external