Cloud DevOps/Site Reliability Engineer

Bright Pattern is a global provider of cloud-based contact center software that helps businesses organize work, automate frequent responses, and keep the context of customer inquiries from contact to contact, regardless of the contact channel. Our software handles millions of our customers’ conversations every month, so each employee’s contribution has a large individual impact.

We are looking for a candidate who can be a core contributing member of our team as we move toward a more agile, platform-agnostic architecture based on public and private cloud technologies, automation frameworks, and data-driven configuration management, where we treat infrastructure as code.

You will have the opportunity to define Bright Pattern’s next-generation datacenter and cloud design, operational procedures, configuration management, automation, and other technologies. As a member of our Operations team, you will be responsible for managing all aspects of our secure Software as a Service (SaaS) infrastructure, platform, and application environments, as well as managing customer deployments. This position will work closely with our Engineering and Customer Success teams to deploy new application releases and integrate customers into our SaaS platform, while maintaining and improving system and process performance and increasing scalability, functionality, and security.

This is a key position within our company and reports directly to the Director of Operations. The job can be performed from our South San Francisco or Pleasanton office.

Core Responsibilities:

  • Research and develop designs and strategies to help meet our technology and automation goals, including Cloud and SDN architectures
  • Design, scale, and maintain the Bright Pattern SaaS infrastructure
  • Use modern development languages such as Python, JavaScript, Perl, and C++ to develop tools, scripts, and frameworks that drive efficiency in automation, monitoring, and management of our large-scale environments
  • Use instrumentation, metrics, and applications to automate and improve operational processes, as well as the availability, scalability, and security of the production and development environments
  • Build, enhance, and maintain development, management and monitoring systems
  • Collaborate with Engineering and Product teams to design and implement solutions that support the Operations team’s vision and strategy
  • Deploy and maintain product releases and customer configurations
  • Engage with industry and vendor partners to drive our requirements and product needs
  • Participate in on-call rotation

Key Qualifications:

  • Successful candidates will have a broad technical background, the ability to develop scripts and tools, excellent problem-solving and troubleshooting skills, and a strong commitment to delivering quality solutions.
  • Bachelor’s degree in Computer Science or a related field, or equivalent experience.
  • 3+ years of experience in a customer-facing environment
  • 3+ years of operational experience managing critical tools and infrastructure, with a strong focus on providing cloud-based services and technology.
  • Expert level Linux system administration skills.
  • Proficiency in Bash and Python
  • 3+ years of experience with Cloud Technologies and Architecture (AWS, OpenStack, Open vSwitch, etc.) and Cloud Orchestration/Automation Tools (Terraform, Salt, etc.)
  • Experience with tools for system, process, and environment monitoring (e.g. Sensu, Nagios/Icinga, Grafana (TIG), TSDB, Cacti), log analysis (Logstash, Splunk, Elasticsearch), and configuration management (e.g. Salt, Ansible, Puppet)
  • Solid understanding of source code control systems; experience with Subversion (SVN) and Git
  • Proficiency in supporting MySQL and MongoDB (replication, grants, operational procedures)
  • Familiarity with Java/JVM performance tuning
  • Knowledge of security fundamentals, including encryption, OpenSSL, SSL certificates, SELinux, system hardening, etc.
  • Experience with network layer devices and functionality (e.g. Switching, Routing, Load Balancing, Proxying, NAT)
  • Excellent triage and troubleshooting skills
  • Strong interpersonal and communication skills; ability to collaborate across teams and skill levels
  • Experience developing, tracking and leveraging performance metrics for continual improvement
  • Strong attention to detail and excellent documentation skills

This is a unique opportunity to work at a company that leads its market and continues to invest in expanding its advantage. In addition, you’ll work in an exciting, fast-paced startup environment with experienced industry leaders.

Interested? Sign up!