Bright Pattern is preparing to scale our operations for the future. We are looking for a candidate who can be a core contributing member of our team as we move toward a more agile, platform-agnostic architecture based on public and private cloud technologies, automation frameworks, and data-driven configuration management, where we treat infrastructure as code.
You will have an opportunity to define Bright Pattern’s next-generation data center and cloud design, operational procedures, configuration management, automation, and other technologies. As a member of our Operations team, you will be responsible for managing all aspects of our secure Software as a Service (SaaS) infrastructure, platform, and application environments, as well as managing customer deployments. In this position, you will work closely with our Engineering and Customer Success teams to deploy new application releases and integrate customers into our SaaS platform while developing processes, tools, and operational metrics to improve system and process performance and increase scalability, functionality, and security.
Site Reliability Engineer Core Responsibilities:
- Research and develop designs and strategies to help meet our technology and automation goals, including Cloud and SDN architectures
- Design, scale, and maintain the Bright Pattern SaaS infrastructure
- Use instrumentation, metrics, and applications to automate and improve operational processes and the availability, scalability, and security of the production and development environments
- Build, enhance, and maintain development, management and monitoring systems
- Collaborate with Engineering and Product teams to design and implement solutions that support the Operations vision and strategy
- Deploy and maintain product releases and customer configurations
- Engage with industry and vendor partners to drive our requirements and product needs
- Participate in on-call rotation
Site Reliability Engineer Qualifications:
- Successful candidates will have a broad technical background, the ability to develop scripts and tools, excellent problem-solving and troubleshooting skills, and a strong commitment to delivering quality solutions.
- Bachelor’s degree in Computer Science or similar major or equivalent experience.
- 3+ years of experience in a customer-facing environment
- 3+ years operational experience managing critical tools and infrastructure with a strong focus on providing cloud-based services and technology.
- Expert-level Linux system administration skills.
- Proficiency in Bash and Python
- 3+ years of experience with Cloud Technologies and Architecture (AWS, OpenStack, Open vSwitch, etc.) and Cloud Orchestration/Automation Tools (Terraform, Salt, etc.)
- Experience with tools for system, process, and environment monitoring (e.g. Sensu, Nagios/Icinga, Grafana (TIG), TSDB, Cacti), log analysis (e.g. Logstash, Splunk, Elasticsearch), and configuration management (e.g. Salt, Ansible, Puppet)
- Solid understanding of source code control systems; experience with Subversion (SVN) and Git
- Proficient in MySQL and MongoDB support (replication, grants, operational procedures)
- Familiarity with Java/JVM performance tuning
- Knowledgeable in security fundamentals, including encryption, OpenSSL, SSL Certificates, SELinux, system hardening, etc.
- Experience with network layer devices and functionality (e.g. Switching, Routing, Load Balancing, Proxying, NAT)
- Excellent triage and troubleshooting skills
- Strong interpersonal and communication skills; ability to collaborate across teams and skill levels
- Experience developing, tracking, and leveraging performance metrics for continual improvement
- Strong attention to detail and excellent documentation skills
This is a unique opportunity to work at a company that is the leader in its market and continues to invest in expanding its advantage. In addition, you’ll work in an exciting, fast-paced startup environment with experienced industry leaders.