Site Reliability Engineer

KellyMitchell Group · California, United States
Visa: unknown · Salary: unknown · Work mode: unknown
Skills
aws, ci/cd, kubernetes, linux, mysql, nginx, node.js, python, redis, splunk, sre

Description

Job Summary:

Our client is seeking a Site Reliability Engineer to join their team! This position is remote and open to candidates located in California, Washington, and Florida.

Duties:

  • Design, build, and operate highly reliable, scalable, and secure production systems supporting commercial platforms
  • Apply SRE principles to improve system resiliency, availability, performance, and operational maturity
  • Implement and enhance observability solutions, including monitoring, logging, tracing, and telemetry
  • Lead and participate in on-call rotations, independently resolving moderately to highly complex production incidents
  • Diagnose and remediate system, application, and infrastructure performance bottlenecks
  • Implement and maintain security-focused reliability solutions, including bot mitigation and threat protection
  • Configure, tune, and optimize web platforms, containerized services, and distributed systems
  • Develop automation and tooling to reduce operational toil and improve incident response efficiency
  • Evaluate new application and infrastructure requirements for capacity, performance, reliability, and runtime best practices
  • Assess new technologies and platforms for technical feasibility, alignment with standards, and operational readiness
  • Author, document, and teach troubleshooting methodologies, operational standards, and best practices to the SRE team
  • Collaborate closely with application, platform, security, and infrastructure engineers to deliver resilient solutions


Desired Skills/Experience:

  • Senior-level experience as a Site Reliability Engineer, Production Engineer, or Security-focused SRE
  • Strong programming and scripting skills in Python and/or Java (ability to build automated tooling and test coverage)
  • Hands-on experience with Akamai Kona Site Defender and bot mitigation strategies
  • Proven experience in security engineering, particularly web application protection and threat mitigation
  • Strong observability and monitoring experience using tools such as Splunk or similar platforms
  • Experience working with distributed systems and container platforms such as Kubernetes, ECS, Fargate, and GKE
  • Deep understanding of Linux and Windows systems administration, including performance monitoring and troubleshooting
  • Expertise with networking fundamentals and protocols
  • Experience with CI/CD pipelines and automation tools
  • Strong experience with cloud platforms, AWS preferred
  • Proficiency with web server technologies such as Nginx, Apache, Tomcat, and Node.js, including performance tuning and debugging
  • Experience with data platforms such as MySQL, NoSQL, Redis, and Elastic, including basic configuration and troubleshooting
  • Exceptional troubleshooting skills with a structured, methodical approach to incident resolution
  • Ability to quickly understand application behavior, traffic patterns, and security threats in production environments
  • Strong background in observability strategy design and implementation
  • Experience supporting high-traffic, customer-facing digital platforms
  • Familiarity with large-scale enterprise or consumer environments
  • Experience mentoring other engineers in SRE, incident response, and reliability best practices


Benefits:

  • Medical, Dental, & Vision Insurance Plans
  • Employee-Owned Profit Sharing (ESOP)
  • 401(k) offered


The approximate pay range for this position is between $46.00 and $67.00. Please note that the pay range provided is a good faith estimate. Final compensation may vary based on factors including but not limited to background, knowledge, skills, and location. We comply with local wage minimums.
