- At Bright Pattern, we design and develop cloud-based, communication-enabled customer service and sales applications. A few words about the software we’ve developed and are now running:
- Fault tolerant
- Runs on bare metal as well as in virtualized environments
- Runs in a 24x7 environment and is upgradable without downtime
- 3 (4) levels of data protection
- Carries mission-critical voice traffic
- Works with mobile apps
- Integrates with major SaaS providers (Salesforce, Zendesk, etc.)
- Highest hardware/user density in the industry
- Disaster recovery support (currently primary/backup, active-active is being developed)
- The whole system can run on a laptop if needed (though you’ll need 2 for redundancy)
- We are at the phase where we have a competitive, working product and are beginning to scale. Three large companies use our software to provide white-label service, and they look to us for guidance on best practices. Interested? Sign up.
You will be responsible for the operations infrastructure in a data center (production and staging clusters), track available capacity, and propose and implement expansion plans. As part of this role, you will have to:
- Ensure proper monitoring of the solution deployed in the data center (servers, network, VoIP). The primary focus is on productizing and improving the monitoring system to ensure consistent, repeatable coverage of:
- Disk space, RAM, CPU, RAID health
- TCP network management (latencies, packet loss, connectivity to specific network routes)
- VoIP network management (jitter, latencies, voice quality)
- PSTN network availability (inbound/outbound traffic)
- Database health (server load trends, transaction rate, runaway queries)
- Server health (disk health, RAM health, temperature, etc.)
- Establish, document, and maintain operational procedures for failure management, monitoring, preventative maintenance, expansions, and upgrades. We expect you to try them manually as the first step, document them as the next step, and automate them (think Ansible) as the final step.
- Maintain and improve automation for installations and multi-computer cluster upgrades
- Perform expansions, provision additional servers, and run pilot deployments in new environments (AWS, Azure, across geographic regions)
- Help our support people with customer issue resolution. Typical issues:
- voice quality
- customer misconfigurations (pinpointing them requires checking server-side logs)
- server configuration issues
- Familiarity with (and affection for) Unix
- Unix shell scripting
- Configuration management systems (Chef, Puppet, Ansible)
- Good understanding of TCP/IP networking
- Familiarity with the SIP protocol
- Experience with VoIP as a technology in general
- MySQL (at the level of basic queries). Experience with performance tuning and load analysis is desired. Clustering experience is helpful.
- NoSQL – MongoDB – the more you know, the better.
- Cisco router configuration
- Charting, trend analysis (Grafana, time-series databases)
- Practical exposure to systems management tools
- Customer-oriented: ready and willing to troubleshoot customer issues, with a desire to help
- Desire and ability to learn the system/product/market space
- Ability and willingness to communicate externally with vendors/partners to solve issues
- Ability and desire to communicate internally within the company across department boundaries to solve issues
The ideal candidate will focus on automating monitoring and repetitive tasks, and on documenting all remaining processes and procedures so they can be shared with other people.
We offer plenty of challenges and opportunities to explore, try, and use new technologies – from new and upcoming charting, trending, and monitoring solutions to advanced usage of AWS services.
IT Operations Engineer was last modified: July 26th, 2017 by