Site Reliability Engineer
KellyMitchell Group · California, United States
Visa: unknown · Salary: unknown · Work mode: unknown
Skills
aws, ci/cd, kubernetes, linux, mysql, nginx, node.js, python, redis, splunk, sre
Description
Job Summary:
Our client is seeking a Site Reliability Engineer to join their team! This position is remote and open to candidates located in California, Washington, and Florida.
Duties:
- Design, build, and operate highly reliable, scalable, and secure production systems supporting commercial platforms
- Apply SRE principles to improve system resiliency, availability, performance, and operational maturity
- Implement and enhance observability solutions, including monitoring, logging, tracing, and telemetry
- Lead and participate in on-call rotations, independently resolving moderately to highly complex production incidents
- Diagnose and remediate system, application, and infrastructure performance bottlenecks
- Implement and maintain security-focused reliability solutions, including bot mitigation and threat protection
- Configure, tune, and optimize web platforms, containerized services, and distributed systems
- Develop automation and tooling to reduce operational toil and improve incident response efficiency
- Evaluate new application and infrastructure requirements for capacity, performance, reliability, and runtime best practices
- Assess new technologies and platforms for technical feasibility, alignment with standards, and operational readiness
- Author, document, and teach troubleshooting methodologies, operational standards, and best practices to the SRE team
- Collaborate closely with application, platform, security, and infrastructure engineers to deliver resilient solutions
Desired Skills/Experience:
- Senior-level experience as a Site Reliability Engineer, Production Engineer, or Security-focused SRE
- Strong programming and scripting skills in Python and/or Java (ability to build automated tooling and test coverage)
- Hands-on experience with Akamai Kona Site Defender and bot mitigation strategies
- Proven experience in security engineering, particularly web application protection and threat mitigation
- Strong observability and monitoring experience using tools such as Splunk or similar platforms
- Experience working with distributed systems and container platforms such as Kubernetes, ECS, Fargate, and GKE
- Deep understanding of Linux and Windows systems administration, including performance monitoring and troubleshooting
- Expertise with networking fundamentals and protocols
- Experience with CI/CD pipelines and automation tools
- Strong experience with cloud platforms, AWS preferred
- Proficiency with web server technologies such as Nginx, Apache, Tomcat, and Node.js, including performance tuning and debugging
- Experience with data platforms such as MySQL, NoSQL, Redis, and Elastic, including basic configuration and troubleshooting
- Exceptional troubleshooting skills with a structured, methodical approach to incident resolution
- Ability to quickly understand application behavior, traffic patterns, and security threats in production environments
- Strong background in observability strategy design and implementation
- Experience supporting high-traffic, customer-facing digital platforms
- Familiarity with large-scale enterprise or consumer environments
- Experience mentoring other engineers in SRE, incident response, and reliability best practices
Benefits:
- Medical, Dental, & Vision Insurance Plans
- Employee-Owned Profit Sharing (ESOP)
- 401(k) offered
The approximate pay range for this position is between $46.00 and $67.00. Please note that the pay range provided is a good faith estimate. Final compensation may vary based on factors including but not limited to background, knowledge, skills, and location. We comply with local wage minimums.