Senior Site Reliability Engineer, Resiliency Engineering

about 4 years old

This job is no longer active

Atlassian is continuing to hire for all open roles with all interviewing and on-boarding done virtually due to COVID-19. Everyone new to the team, along with our current staff, will temporarily work from home until it is safe to return to our offices.

We are looking for a Resiliency / Chaos Engineering expert who is passionate about building chaos tools to perform different kinds of failure experiments across Atlassian products, services, and platforms. This team will explore capabilities around resilience practices and convert them into a framework that helps products easily create test harnesses for resiliency testing. We are scaling rapidly and can offer an open runway for the right person, with room to grow.

An ideal candidate is someone who is eager to learn new things, keeps on top of industry trends (particularly those related to resiliency/chaos engineering), and loves to share their knowledge with others. They thrive on working with a diverse set of partners, can articulate the business impact of a problem, and can also dive deep into the technical solution.

We'd love it if you brought a deep understanding of modern Cloud infrastructure, programming expertise, operational experience, and a desire to change the status quo. We'll support you with robust backend systems, mature processes, and a motivated team with a strong desire to not f*** the customer. We're looking for an engineer who can analyze and help improve our monitoring and processes to get us to an even higher level of availability, scalability, and reliability.

On your first day, we'll expect you to have:

7+ years of experience in software development, ideally in Python, Java, Go, or a similar language

Understanding of Linux and Networking systems

Experience driving large, complex, cross-organisational initiatives from inception to completion

Hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, IAM, RDS, S3, DynamoDB, Kinesis, or equivalents, e.g. in GCP)

Experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, etc.

Strong organisational and interpersonal skills, with experience developing and instilling a culture of operational maturity

It would be great, but not mandatory, if you had:

An ability and desire to mentor and coach engineers

A deep understanding of resilience engineering best practices and chaos tools (e.g. Chaos Monkey, Latency Monkey, Chaos Toolkit)

Experience with front-end development, including React

Atlassian Site Reliability Engineering is a rapidly growing group within the organization. We are in the process of building our teams, tools, and systems as part of Atlassian's mission to build the best SaaS services in the world. This is a truly exciting team to join: we are involved, or are planning to be involved, with every technical team across Atlassian.

We enable Atlassian to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values with a strong customer focus and possess a healthy sense of urgency. We are a heavily data-driven team, utilizing a variety of data collection, enrichment, analytics and visualizations to learn about our complex systems.

We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So, the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on, and bash out code to support the team, we have a spot for you too.

More about our benefits

Whether you work in an office or a distributed team, Atlassian is highly collaborative and yes, fun! To support you at work (and play) we offer some fantastic perks: ample time off to relax and recharge, flexible working options, five paid volunteer days a year for your favourite cause, an annual allowance to support your learning & growth, unique ShipIt days, a company paid trip after five years and lots more.

More about Atlassian

Creating software that empowers everyone from small startups to the who’s who of tech is why we’re here. We build tools like Jira, Confluence, Bitbucket, and Trello to help teams across the world become more nimble, creative, and aligned. Collaboration is the heart of every product we dream of at Atlassian. From Amsterdam and Austin to Sydney and San Francisco, we’re looking for people who want to write the future and who believe that we can accomplish so much more together than apart. At Atlassian, we’re committed to an environment where everyone has the autonomy and freedom to thrive, as well as the support of like-minded colleagues who are motivated by a common goal: to unleash the potential of every team.

Additional Information

We believe that the unique contributions of all Atlassians are the driver of our success. To make sure that our products and culture continue to incorporate everyone's perspectives and experience, we never discriminate on the basis of race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.

All your information will be kept confidential according to EEO guidelines.