Director Site Reliability
websterbank
engineering
Job Description:
The Director of Site Reliability Engineering is a pivotal technical leader within the Software Engineering organization, tasked with transforming how reliability, performance, and availability are achieved across our platforms. This role goes beyond maintaining systems: it reimagines and modernizes operational practices through automation, cloud-native design, and API-driven integration.
You will lead initiatives that elevate our AWS cloud architecture and MuleSoft integration ecosystem, ensuring they are secure, scalable, and resilient. By applying advanced software engineering principles and site reliability practices, you will drive a cultural and technical shift toward proactive reliability, continuous improvement, and innovation.
This role requires visionary thinking, deep technical expertise in AWS and MuleSoft, and a passion for driving change that results in more reliable, efficient, and future-ready systems.
Daily Responsibilities:
Monitoring and Observability: Implement and maintain tools for monitoring, logging, and tracing to gain insights into system performance and health
Automation: Write software and scripts to automate repetitive tasks, such as deployment, monitoring, and system management. Advocate for and lead automation wherever possible. Ensure environments are well-managed, appropriately structured, cost-effective, and kept in sync as much as possible.
Incident Management: Respond to incidents, troubleshoot system-level issues, and perform root cause analysis to prevent recurrence
Reliability Engineering: Design and build reliable and scalable systems, define Service Level Objectives (SLOs) and Indicators (SLIs), and implement reliability patterns
Collaboration: Work closely with software developers to ensure applications are reliable and to provide feedback on performance in a production environment
Documentation: Create and maintain documentation, including runbooks and system diagrams, to ensure knowledge sharing and team efficiency
Set a high bar for reliability and availability, and meet that bar through automation and relentless improvement.
Improve and sustain services through rigorous development, testing and release procedures.
Serve as a key player in deliberations on system design, platform management, and capacity planning.
Bring a strong 'detective' mindset to why things don't work, and be among the first to offer and work on solutions.
Be a 'link' between technologists and business stakeholders: able to have conversations with Line of Business (LoB) and technical Agile teams to work through challenges.
Partner with peers to advance the maturity of the DevOps practice including new/existing technologies, tools, processes, and standards. Clearly communicate expectations on technical direction and provide ongoing guidance.
Serve as a sounding board and technical advisor for your team in the analysis, design, and execution of solutions. Help your team anticipate unforeseen dependencies or gaps early in the SDLC.
From your domain's viewpoint, provide leadership and technical expertise to your Agile team to validate that story points are sized appropriately, sprint plans are achievable, and releases are well-planned.
Share accountability with peers to ensure that the quality, performance, and security of systems are optimal and meet both customer SLAs and internal/external audit expectations. Contribute to and/or support others in the timely remediation of security, audit, or production support issues escalated to the software engineering group. Occasional evening or weekend involvement may be needed for business-critical situations.
Job Requirements:
Deep understanding of systems development life cycle, cloud-based systems, and application architecture.
Experience with multiple programming languages (Python, etc.), configuration management tools, and containers strongly preferred.
Moderate to advanced proficiency in cloud development, AWS services (ECS, EKS, S3, RDS, VPC), identity management (Okta), authorization frameworks (OAuth2), monitoring (Dynatrace), Agile DevOps (GitLab, Terraform), application security (OWASP, Veracode, AppScan), and API/Integrations (Apigee, MuleSoft, BizTalk).
Familiarity with unit testing concepts and test automation frameworks (SpecFlow, SoapUI), RESTful APIs and microservices, WCF services, T-SQL, SQL queries, and stored procedures.
Advanced knowledge of, and a strong desire to work with, Agile methodology (SAFe, Scrum) required, along with experience with Agile tools (Jira, Confluence).
Bachelor's degree (BA/BS) in a related field required
5-7 years of progressive working experience in designing, building, and maintaining business applications across systems and networks of moderate to high complexity in cloud hosted environments required
Experience in setting up SLAs/SLOs/SLIs for critical services and establishing monitoring required
Hands-on experience with cloud migration required
Experience and knowledge within the financial services or healthcare industries preferred
Compensation:
The estimated salary range for this position is $135,000.00 to $155,000.00. Actual salary may vary up or down depending on job-related factors which may include knowledge, skills, experience, and location. In addition, this position is eligible for incentive compensation.
Job Location(s):
Stamford, Connecticut - Hybrid
Source:
Company Career Section
Competition:
N/A
