The Bridge Engineering team at getBridge is looking for an SRE to help us grow our product, scale our systems and empower our feature teams. Our systems run on Kubernetes, and we are hosted in AWS across multiple regions.
What We Do
Bridge is a tool that helps people find their place at work, form meaningful relationships with peers and managers, and forge a path towards growth. We’re helping our customers create work cultures people love.
Who We’re Looking For
- A problem solver who asks questions to get at the core issue that the team is grappling with before deciding on a solution.
- A pragmatist who knows how to make trade-offs to solve challenges while building an architecture that scales for the future.
- A systematic thinker who can understand how the larger system operates and knows when to take a step back and consider alternative approaches.
- A team player who loves teaching and learning from others.
What We Offer
- Experience working on a highly available business-to-business (B2B) software as a service (SaaS) product with thousands of active customers.
- Competitive compensation package
- Flexible work environment
- Quarterly hack week events
What You’ll Be Doing
- Owning cloud operations for dozens of services in multiple regions, environments and language stacks.
- Optimizing systems for speed and reliability.
- Configuring observability systems to detect emerging problems before they become incidents.
- Discovering problem areas (e.g. slow response times or high resource saturation) and working with service owners to resolve or mitigate them.
- Implementing automation to reduce toil and enable healthy systems by default.
- Working alongside a highly skilled SRE team running services in multiple Kubernetes clusters.
- Building tools and resources for upskilling other engineering teams to make service creation and maintenance self-service.
- Optimizing the cost of cloud operations.
- Responding to incidents and contributing to a continuous improvement culture with occasional participation in 24/7 on-call rotations.
- Shaping data-related policies like backup cadence, retention policies, security best practices, disaster recovery plans, etc.
What You’ll Need
- 5-8 years of experience running production systems at scale as an SRE or senior engineer.
- Deep understanding of at least one modern programming language (Ruby, Go, Java, etc.).
- Knowledge of cloud providers (AWS preferred; Azure or Google Cloud also relevant).
- Familiarity with cloud networking configuration (VPCs, security groups, load balancers, DNS, etc.).
- Familiarity with system observability through monitoring and alerting tools (like Datadog, Sentry, etc.).
- Ability to work with a globally distributed team in multiple time zones.
- Experience with configuration-as-code tools such as Terraform.
- Experience with Kubernetes or other container orchestration systems.
- Experience with streaming data services like Pulsar, Kafka or Kinesis is a plus.
- Ability to speak fluent English.
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment on the basis of race, color, sex, age, national origin, religion, sexual orientation, gender identity, veteran status, disability, or any other federal, state or local protected class.