Salary: $140-160k CAD | $180-200k USD, plus equity packages
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. The SRE team ensures that the company's services, both internally critical systems and externally visible ones, have the reliability and uptime they require. SREs also ensure that these systems are built with adequate capacity and optimal performance. Much of the team's software development focuses on optimizing existing systems, building infrastructure, and eliminating manual work through automation.
- Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, and capacity planning.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless postmortems.
Here are some skills you should have:
- Experience designing large-scale distributed systems
- Experience designing and developing software oriented towards systems automation
- Ability to debug, optimize code, and automate routine tasks
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive
- Experience building scalable systems using infrastructure-as-code tools
- BS/MS or equivalent experience in Computer Science, Computer Engineering, Mathematics, Physics, or a related field
- 5+ years of relevant experience in software design, including debugging, performance analysis, and test design
- 5+ years of experience building large-scale data infrastructure and pipelines