Reliability Engineering Manager

over 5 years old

This job is no longer active

Argo AI was founded to tackle one of the most challenging applications in computer science, robotics, and artificial intelligence: self-driving vehicles. Argo AI is developing and deploying the latest advancements in artificial intelligence, machine learning, and computer vision to help build safe and efficient self-driving vehicles that enable these transformations and more. The challenges are significant, but we are a team that believes in tackling hard, meaningful problems to improve the world.

We are building a high-performance team that is excited by complex engineering challenges and is passionate about making transportation safer, more affordable, and accessible for all. Vehicle Operations’ essential function at Argo is to safely execute the test and road operations initiatives set forth by our engineering team. Vehicle Operations consists of a team of experts who understand Argo’s underlying vehicle technology and programs. The team uses that knowledge to manage efforts to meet our mileage and performance goals effectively. These programs provide detailed feedback to our engineering team on how our vehicles are performing.

The Operational Reliability Engineering Team is responsible for ensuring the Autonomous Vehicle and various Supporting Software Systems are functioning properly and at maximum efficiency. This includes working with our discipline and responsible engineers to ensure vehicle platform, hardware components, communication networks, data management systems, and software product suites are continually working together to deliver a high-performance Autonomous Vehicle. The Reliability Team is instrumental in triaging reported Production Software issues as well as maximizing fleet uptime of vehicles and supporting Hardware resources at all Argo AI Vehicle Operations locations. As issues arise during any of our operations, the Reliability Engineering Team will be responsible for diagnosing, documenting, tracking, and reporting a wide variety of both Software and Hardware issues through to resolution. 

What you’ll do: 

  • Manage the day-to-day effort and long-term vision of the Reliability Engineering Team
    • Diagnosing issues reported by our Operations and Supporting Service Teams
    • Triaging and tracking the occurrences and frequency of complex Software and Hardware problems on rapidly evolving systems to understand the scope and impact
    • Determining Root Cause and appropriate, scalable repairs or solutions
    • Driving design improvements with Engineering Teams for future systems
  • Design processes and tooling that allow for an efficient flow of issues reported by downstream customers
  • Develop and implement policies and procedures for the Operational Reliability and supporting Teams
  • Design and generate reports and metrics on Software and Hardware Issues, HW Resources Status, and Team performance
  • Work cross-functionally within Fleet Operations, Software Engineering, Hardware Engineering, and Test Operations Teams, serving them as downstream customers and collaborating to meet their requirements 
  • Enforce a safety-first mindset, a data-driven approach to problem solving, and careful communication across the team
  • Cultivate and maintain a healthy, diverse work culture and environment with a strong focus on safety by holding regular briefings with remote employees, running continuing education classes, conducting performance reviews, and leading by example

What we’re looking for:

  • Degree in Software Engineering, Robotics Engineering, Computer Engineering, Electrical Engineering, Physics or a related field
  • 4-6 years of management experience required
  • AV or Robotics operations, support, or relevant industry experience strongly preferred 
  • Able to motivate and lead a dynamic and cross-functional team
  • Strong preference for candidates with hands-on experience who have deployed real products or platforms into the real world and intimately understand the challenges of working with complex systems and systems at scale
  • Excellent problem solving and troubleshooting skills, including a systematic approach to determining root cause, implementing solutions, and documenting results
  • Detail-oriented with good organizational skills
  • Excellent communication skills with the ability to communicate across a large and varied workforce
  • Strong desire to learn new skills related to technology and software
  • Proven self-starter mindset and the ability to work independently or with minimal supervision

Additional Job Requirements: Occasional travel to remote offices and test facilities

At Argo AI, we place a strong emphasis on creating a highly effective team environment. Thus, we seek candidates who can work effectively with others across a broad range of disciplines.

Argo AI is an equal opportunity employer that believes in diversity as a strength and is committed to creating an inclusive environment for all employees.

We know it takes competitive benefits to fuel a team that works hard and enjoys the challenge. At Argo AI, you can expect stellar perks to support your best self:

  • High-quality individual and family health, dental, and vision insurance
  • Competitive compensation packages
  • Employer-matched 401(k) retirement plan
  • Paid parental leave
  • Unlimited vacation
  • Daily catered lunches and snacks
  • Free onsite or adjacent parking
  • Commuter reimbursement
  • Fitness reimbursement
  • Professional development reimbursement

Argo AI is a LinkedIn Top 50 Startup