Principal Engineer, Infrastructure

about 4 years old

This job is no longer active

At Lyft, our mission is to improve people’s lives with the world’s best transportation. To do this, we start with our own community by creating an open, inclusive, and diverse organization.  

The Principal Engineer, Infrastructure will help ensure Lyft’s systems are healthy and reliable while working in a collaborative environment. Passengers rely on Lyft to get to work, to go to the doctor, or to get home safely when public transit has stopped running. Drivers use Lyft for income and flexibility. Building a stable and reliable application for our passengers and drivers is a responsibility we take very seriously.

Software Engineers in the Reliability Software Engineering (rSWE) organization work on standardizing and supporting rapidly growing product and infrastructure teams, assessing their architecture, helping them design scalable services, and fostering excellent operational practices. It's a critical role in ensuring that our systems are always healthy, monitored, automated, and designed to scale to the next level. We are a team of engineers who care about customers' perception of the reliability of our systems.

You will embed with experienced engineering teams to surface system weaknesses, develop solutions, and evangelize standardization. As an early member of this organization, you will help build its future vision and will be key to its success. We regard our culture and trust highly and believe you will add positively to them in your own way.

Responsibilities:

  • Drive explicit expectations and build holistic visibility of service reliability using SLIs, SLOs, and SLAs across Lyft’s services.
  • Write and review code, develop documentation and capacity plans, and debug the hardest problems anywhere on the stack.
  • Build infrastructure and drive projects that break things with the aim of improving the production systems.
  • Step back to observe patterns and develop innovative tools and automation to minimize toil.
  • Use those insights to drive the best operational practices.
  • Partner with the broader Lyft organization to build the culture of rigorously learning from incidents.
  • Mentor other rSWEs to improve our organization.

Experience:

  • A minimum of 5 years of experience handling services in a large-scale environment.
  • Extensive programming experience in Python or Go.
  • A real passion for building tools to make infrastructure more robust.
  • Experience working with public cloud platforms, specifically AWS.
  • Experience designing, debugging, and running fault-tolerant, large-scale distributed systems.
  • Experience solving large-scale reliability issues in a microservices environment with a service mesh and Kubernetes clusters.

Benefits:

  • Great medical, dental, and vision insurance options
  • In addition to 11 observed holidays, salaried team members have unlimited paid time off, and hourly team members have 15 days of paid time off
  • 401(k) plan to help save for your future
  • 18 weeks of paid parental leave. Biological, adoptive, and foster parents are all eligible
  • Pre-tax commuter benefits
  • Lyft Pink - Lyft team members get an exclusive opportunity to test new benefits of our Ridership Program

Lyft is an Equal Employment Opportunity employer that proudly pursues and hires a diverse workforce. Lyft does not make hiring or employment decisions on the basis of race, color, religion or religious belief, ethnic or national origin, nationality, sex, gender, gender identity, sexual orientation, disability, age, military or veteran status, or any other basis protected by applicable local, state, or federal laws or prohibited by Company policy. Lyft also strives for a healthy and safe workplace and strictly prohibits harassment of any kind. Pursuant to the San Francisco Fair Chance Ordinance and other similar state laws and local ordinances, and its internal policy, Lyft will also consider for employment qualified applicants with arrest and conviction records.