Our client, a world leader in finance and technology, is expanding its engineering team with DevOps/SRE talent. You will be joining an award-winning team, and your work will have an immediate impact on the rest of the company.
The Cloud Stability group is trusted to support the private cloud infrastructure. This infrastructure runs on an open-source OpenStack distribution built on OpenStack, Ubuntu, Chef, Ansible, and Ceph. You'll be trusted to ensure the high availability and scalability of this environment.
What's In It For You:
You'll work with modern open-source tooling while maintaining mission-critical systems hosting a wide array of applications. You'll be relied on to advise on design, architecture, and scaling of our virtual farms that utilize several different technologies with different SLAs. In addition, you'll play a critical role in improving the stability of all cloud systems to help ensure there is a solid platform as we scale.
You'll Need to Have:
- Demonstrated experience programming and testing in Python, Ruby, Go, or C/C++
- Experience working in a 24/7 production engineering organization
- Ability to listen, communicate, evaluate, problem-solve, multitask, and prioritize in a high-pressure, mission-critical, and rewarding team environment
We'd Love to See:
- Deep expertise troubleshooting complex distributed systems
- Experience creating and improving documented procedures and/or playbooks
- Working knowledge of Chef, Puppet, Ansible, or Salt
- Familiarity with open-source configuration, orchestration, and CI/CD tools
- Deep understanding of TCP/IP and Unix networking, and of Linux kernel performance (virtual memory and process scheduling)
- Experience with virtualization technologies such as Docker or VMware
- Familiarity with large-scale x86 infrastructure with thousands of machines