Senior Consultant Specialist
Guangzhou, GD, CN, 510620
Some careers have more impact than others. If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be.
We are currently seeking an experienced professional to join our team in the role of Senior Consultant Specialist.
Business: AMH Technology
Job ID: 11973
What you’ll do:
- Lead the design, implementation, and enhancement of service monitoring systems to ensure services operate within agreed Service Level Objectives (SLOs) and enable rapid response to performance indicator breaches.
- Drive automation initiatives by identifying opportunities to replace manual tasks with software solutions, improving efficiency and reliability across systems.
- Perform in-depth system analysis and configuration management, and implement improvements to system software performance, availability, scalability, and reliability.
- Oversee and approve deployment changes, ensuring adherence to best practices and minimizing change-related incidents that could impact the error budget.
- Collaborate with cross-functional teams, including software engineers, testers, and product managers, to ensure systems meet non-functional requirements such as performance, security, and availability.
- Develop and enforce best practices for incident management, root cause analysis, and post-mortem processes to improve system resilience.
- Mentor and guide junior SREs, fostering a culture of continuous learning and operational excellence.
- Maintain and expand system documentation, including runbooks, architecture diagrams, and operational procedures, ensuring critical knowledge is accessible to the team.
- Lead capacity planning and disaster recovery strategies to ensure system readiness for growth and unexpected events.
- Stay updated on industry trends and emerging technologies, driving innovation and improvements in reliability engineering practices.
Requirements:
- Minimum 10 years of experience in production support, SRE, or DevOps roles, with a proven track record of managing and improving large-scale, mission-critical systems.
- Advanced programming and scripting skills (e.g., Java, Python, Go, SQL), including API development and backend systems.
- Extensive experience with containerization (Docker) and orchestration platforms (Kubernetes), including designing and managing large-scale deployments.
- Proficiency in monitoring and observability tools such as Splunk, CloudWatch, AppDynamics, Prometheus, or Grafana.
- Strong expertise in Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Ansible, with experience in managing cloud-based infrastructure (AWS, Azure, or GCP).
- Demonstrable experience in designing and implementing automation pipelines for CI/CD and operational tasks.
- Proven ability to lead cross-functional teams to resolve complex technical issues and drive system improvements.
- Strong understanding of security best practices, including vulnerability management and secure system design.
- Excellent written and verbal communication skills in both Mandarin and English, with the ability to communicate complex technical concepts to diverse audiences.
- Experience in mentoring and leading junior engineers, fostering a collaborative and high-performing team environment.
- Strong analytical and problem-solving skills, with a focus on delivering scalable and reliable solutions.