WNA-LTD.com Sr. Site Reliability Engineer in St. Louis, Missouri
Are you a site reliability rock star, skilled at engineering and testing global systems to mitigate vulnerabilities and risks of impact to customer experience? Can you surface "the" big IT and networking production environment issues across a complex mix of legacy and cloud-based applications, enterprise packages, and network services? Are you up to the challenge of creating best-in-class production resiliency? Are you interested in joining a small but growing consulting firm that drives innovation, acceleration and benefits realization for our Fortune 100 clients?
WNA is seeking a full-time Senior Site Reliability Engineer (SRE) to join our growing national consulting firm. The successful candidate will be instrumental in designing and delivering world-class SRE solutions for Fortune 100 telecommunications and fintech service provider clients.
The ideal candidate will be able to spearhead a QA organization's drive to improve resiliency, reliability, collaboration and innovation across production operations, from both process and technical perspectives. The candidate will be responsible for identifying and mitigating risks, building resilient, future-proof, state-of-the-art processes and automated systems, and driving value for teams across the organization.
Work with IT leaders, engineers, and subject matter experts to identify and execute strategies to improve production platform performance and fault tolerance. Work with the business to understand and prioritize the features and functions that deliver the right customer experiences with right-scaled reliability.
Balance handling operational risks, escalation, and issue management with optimizing internal processes and designing and developing monitoring solutions. Leverage chaos engineering tools, techniques, and automation. This includes developing algorithms and methods to monitor application and platform reliability, as well as developing injection experiments that identify vulnerabilities.
Design, develop and implement strategies and solutions that close risk exposures and hedge against failures. Continually review and measure the performance of prioritized applications, workloads, and infrastructures.
This includes creating and maintaining dashboard tools, documenting mitigations and processes, and acting as a liaison between the IT operations and software development and delivery teams. Optimize internal processes, organize stakeholder meetings, suggest trade-offs, and articulate mitigation strategies to senior leadership.
Provide guidance to other SREs on devising effective and efficient approaches to profile and baseline systems, drive automation into platforms, and design and implement the tools that help developers perform their jobs better.
Contribute to overall issue resolution, planning and engineering approaches, identification of resources, and development of timelines. Effectively communicate issues and alternative solutions. Take the initiative to achieve value-added results.
Background in designing, implementing, instrumenting, and supporting reliability engineering for complex provisioning and activation systems.
Strong knowledge of the instrumentation required for monitoring failures and of chaos engineering, including techniques such as canary testing and vulnerability testing. Development of associated toolsets, data systems, build and delivery systems, and runtime services and libraries to support legacy as well as microservice-based production platforms.
Strong knowledge of Incident Management processes, tools, and frameworks. Ability to collaborate on process design. Ability to develop triage and escalation processes and procedures for highly complex systems.
Strong knowledge of design and tooling for large-enterprise workloads and data-driven monitoring, including distributed algorithms, principles and practices, IT risk frameworks, and data consistency.
Experience with some or many of the following site reliability process, toolchain, and domain enablers:
SOX, MOF, CI/CD tools, JIRA, Confluence, SAFe Practitioner, SDLC, SRE certifications.
Legacy enterprise applications, AWS, Azure, Google Cloud, and Salesforce.
Chaos Monkey, chaos engineering, perturbation modeling.
Telecom, OSS/BSS, FinTech, wireless devices.
Network and IT infrastructures (legacy and/or software-defined).
Cloud compute/storage/networking.
Middleware.
Software or systems engineering programming (e.g. C, C++, Java, Perl, Python, Go, Ruby, Scala, Node.js).
Bachelor's degree; graduate degree and/or certifications desired
5-10+ years of enterprise site reliability engineering experience required
Compensation: $120,000-$150,000 per year, commensurate with skills and experience. Bonus potential.
Permanent position or multi-year contract.
WNA is a global business consulting, project management and life-cycle IT services company. WNA's North American business is a dynamic environment with excellent opportunities for people who want to join a growing business and provide unsurpassed client service and leadership to some of the top companies in the world. There are great opportunities for those who want to lead and contribute to the growth of our North American and global business. If you haven't heard of us yet, you will.