Capacity and Performance Reliability Manager
- Permanent
- Central London - 3 days on-site per week
- Up to £90,000 (DOE)
- Forecast demand and plan capacity across virtual, containerised, and physical environments using historical data, predictive analytics, and scenario modelling.
- Conduct stress testing and performance tuning, and automate scaling and resource provisioning with Infrastructure as Code (IaC) and cloud-native tools.
- Maintain and enhance the Capacity Management tool suite (e.g., Athene, Grafana) to ensure zero data loss and a high degree of automation.
- Develop and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, monitoring, alerting, and observability solutions.
- Lead incident response, blameless post-mortems, and continuous improvement initiatives.
- Produce capacity plans, reliability reports, and recommendations; own the recommendations tracker and report to senior management.
- Collaborate closely with development, operations, business teams, architects, and third-party suppliers to embed reliability into design and delivery.
- Champion automation, observability, and a reliability-focused culture while ensuring regulatory and governance compliance.
- 5+ years of hands-on experience in performance, capacity, or reliability management.
- At least 5 years in business-critical global banking, financial services, or technology environments, ideally with exposure to trading technologies and experience linking technical metrics to business outcomes.
- Proven expertise in capacity forecasting, modelling, trend analysis, and queueing theory/system modelling.
- Strong proficiency with monitoring and automation tools (e.g., Athene, Grafana, Prometheus, Datadog, Terraform, Kubernetes, CI/CD pipelines).
- Significant SQL knowledge, advanced Excel skills, and coding ability (e.g., Python, Visual Basic, MS SQL), plus an understanding of APIs and scripting.
- ITIL Foundation Certification (or equivalent); experience in SRE/reliability engineering highly desirable.
- Excellent analytical, communication, and stakeholder management skills to present insights to senior leaders and collaborate across technical and non-technical teams.
- Knowledge of cloud architecture, containers, orchestration, and agile practices is an advantage.