Principal Infrastructure Site Reliability Engineer (San Jose, CA) (Remote Eligible)

We are looking for an experienced Principal Site Reliability Engineer to join our Technical Operations team. At Okta, we are "Always On." That starts with this team, which ensures that customers never have to worry about the Okta service. The team strives to build the most reliable and performant systems on the planet.

We are looking for a Principal engineer who has experience in, and a passion for, designing and running complex large scale services on one or more public cloud platforms. This role requires collaboration with the Okta Software Engineering and Site Reliability Engineering teams to provide solutions that improve their productivity as they build, manage and run their teams' services on the Okta infrastructure with high availability, reliability and performance. The ideal candidate is someone who welcomes the challenge and enjoys seeing their designs run at scale with automation, testing, and tuning. If you exemplify the ethos of, "If you have to do something more than once, automate it," we want to hear from you!

What You'll Do:

  • Execute on initiatives to build Okta's production infrastructure with a focus on automation and scale for multiple public clouds
  • Promote and apply best practices for building scalable and reliable services across the team
  • Be a subject matter expert with public cloud infrastructure and how Okta services can run on them efficiently and at scale
  • Design, build, run and monitor Okta's production infrastructure
  • Drive initiatives to evolve our current platform to increase efficiency and keep it in line with current standards and best practices
  • Respond to production incidents and determine how we can prevent them in the future
  • Identify and automate manual processes
  • Develop and deliver solutions that serve as a model for others with regard to execution, quality, scalability, operability, maintainability, etc.
  • Communicate and collaborate across levels, functions and engineering teams
  • Mentor and coach junior engineers to leverage their full potential

Qualifications for the role:

  • Track record of leading successful large scale Infrastructure projects
  • 8+ years of experience with designing and running large scale solutions on public cloud
  • 2+ years of experience with Docker, Kubernetes or cloud managed Kubernetes, Service Mesh
  • Knowledge of network and edge technologies
  • Strong Linux fundamentals
  • 3+ years of experience with automating systems and infrastructure via Terraform
  • Experience automating and running large scale production services in public cloud providers
  • Ability to code to a good standard in a programming language, using standard software development practices like unit testing and iterative development
  • Experience working with Agile methodologies 
  • Champion excellent documentation and communication skills, with the ability to influence others

Education and Training:

  • BS in Computer Science (a plus) or relevant experience

Okta is rethinking the traditional work environment, providing our employees with the flexibility to be their most creative and successful versions of themselves, no matter where they are located. We enable a flexible approach to work, meaning you can work from the office or home, regardless of where you live. Okta invests in the best technologies and provides flexible benefits and collaborative work environments and experiences, empowering employees to work productively in the setting that best suits their needs. Find your place at Okta: https://www.okta.com/company/careers/.

Okta is an equal opportunity employer.