The Bridge Engineering team at getBridge is looking for an SRE to help us grow our product, scale our systems and empower our feature teams. Our systems run on Kubernetes, and we are hosted in AWS across multiple regions.
What We Do
With Bridge, you can boost engagement by helping employees connect with their peers. You can improve performance & productivity by helping employees stay in sync with their managers. We believe that upskilling your people by giving them constant opportunities to learn and grow is the key to creating a work culture people love. Bye-bye skills gap! Hello to better results.
Who We’re Looking For
- A problem-solver who asks questions to get at the core issue that the team is grappling with before deciding on a solution.
- A pragmatist who knows how to make trade-offs to solve challenges while building an architecture that scales for the future.
- A systematic thinker who can understand how the larger system operates and knows when to take a step back and consider alternative approaches.
- A team player who loves teaching and learning from others.
What We Offer
- Experience working on a highly available business-to-business (B2B) software as a service (SaaS) product with thousands of active customers.
- Competitive compensation package
- Flexible work environment
- Quarterly hack week events
What You’ll Be Doing
- Owning cloud operations for dozens of services in multiple regions, environments, and language stacks.
- Optimizing systems for speed and reliability.
- Configuring observability systems to catch problems before they become incidents.
- Discovering problem areas (e.g. slow response times, high resource saturation) and working with service owners to resolve or mitigate them.
- Implementing automation to reduce toil and enable healthy systems by default.
- Working alongside a highly skilled SRE team running services in multiple Kubernetes clusters.
- Building tools and resources for upskilling other engineering teams to make service creation and maintenance self-service.
- Optimizing the cost of cloud operations.
- Responding to incidents and contributing to a continuous improvement culture with occasional participation in on-call rotations.
- Shaping data-related policies such as backup cadence, retention, security best practices, and disaster recovery plans.
What You’ll Need
- 2-8 years of experience running production systems at scale as an SRE or senior engineer.
- Experience running production systems on Kubernetes.
- Experience with configuration-as-code tools such as Terraform or Pulumi.
- Deep understanding of at least one modern programming language (Python, Go, Java, etc.).
- Knowledge of cloud providers (AWS preferred; Azure or Google Cloud also relevant).
- Familiarity with cloud networking configuration (VPCs, security groups, load balancers, DNS, etc.).
- Familiarity with system observability through monitoring and alerting tools such as Datadog or Sentry.
- Ability to work with a globally distributed team in multiple time zones.
- Ability to speak fluent English.
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, disability, or membership in any other class protected by federal, state, or local law.