Articulate is looking for a Site Reliability Engineer II to join our growing Platform team! As an SRE II, you'll focus on executing your team’s work in infrastructure and reliability. You'll be responsible for delivery and will use those responsibilities to grow your skills and build expertise in our technology, processes, and culture.
What You'll Do:
- Be an example of the best practices your team and adjacent teams should follow when building and maintaining our infrastructure.
- Learn how to orchestrate and maintain our infrastructure in AWS with Terraform, GitHub Actions, Kubernetes, and more.
- Actively participate in pairing sessions with senior engineers to grow your skill set.
- Consistently deliver highly scalable, resilient, and cost-effective infrastructure solutions for our customers to use.
- Take a proactive approach to problem-solving (driving for measurable results, being an example of our engineering values, etc.).
- Collaborate on project milestones and participate in breaking down initiatives into iterative work items while taking ownership of small to medium task generation and task management.
- Communicate and collaborate effectively with your teammates by staying highly engaged in all aspects of the team's work and responsibilities.
- Participate in our on-call rotation and contribute to incident reviews.
- Collaborate with the team to work through new ideas, brainstorm solutions, and align with platform standards.
- Work with others on the team to perform the testing required to ensure that our infrastructure and supporting systems perform to industry standards and meet the quality level our customers expect.
- Ensure timely execution of technical project work against the expected milestones while working as part of our cycle planning process.
- Work with a sense of urgency to find solutions to problems quickly using an iterative approach.
- Be a nimble learner: view mistakes as opportunities to learn, enjoy the challenge of unfamiliar tasks, and seek new approaches to solving problems.
- Be a collaborator: facilitate open dialogue with a wide variety of contributors and stakeholders, balance your own interests with those of others, and promote high visibility of shared contributions to goals.
What You Should Have:
- 2+ years of experience as a Site Reliability Engineer II or in an equivalent role.
- Demonstrated hands-on experience with Go, Python, JavaScript, or similar languages.
- Hands-on experience with container technologies such as Docker.
- Experience using Terraform or similar IaC technologies.
- Experience with container orchestration platforms such as Kubernetes or ECS.
- Experience sharing knowledge across teams to create a self-service platform.
- Experience implementing and managing monitoring solutions, including setting up comprehensive alerts to ensure quick response to performance issues or downtime.
- Knowledge of using KPIs to drive observability, monitoring, and alerting to better serve ourselves and our customers.
- Demonstrated focus on iterative development practices and incrementally delivering value over time.
Nice to haves:
- Experience configuring and managing environments in AWS.
- Experience developing and maintaining Helm charts for deployment manifests in Kubernetes.
- Familiarity with optimizing infrastructure for AI/ML workloads.
- Familiarity with GitHub Actions and CI/CD deployment practices.