Associate Director, Software Engineering (Model Hosting/Inference Optimisation)

Brand:  HSBC
Area of Interest:  Technology
Location:  Shenzhen, GD, CN, 518010

Work style:  Hybrid Worker
Date:  20 May 2026

Some careers have more impact than others.

If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be.

 

We are currently seeking an experienced professional to join our team in the role of Associate Director, Software Engineering (Model Hosting/Inference Optimisation).

 

Business: CTO Platforms (AI Platforms)

Location: Shenzhen / Guangzhou

Req ID: 44990

 

Principal responsibilities

  • Design, build, and operate scalable, reliable model hosting platforms for LLMs, embeddings, and STT/TTS across heterogeneous hardware. 
  • Drive inference optimisation for latency, throughput, and cost (quantisation, KV-cache optimisation, dynamic/continuous batching). 
  • Evaluate, integrate, and tailor inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware. 
  • Own inference health and performance monitoring: latency, throughput, TTFT, memory, availability; troubleshoot bottlenecks and deployment issues. 
  • Partner with hardware teams to apply hardware-specific optimisations and improve resource utilisation. 
  • Ensure hosting systems meet production standards for reliability, scalability, security, and high availability. 
  • Build end-to-end, scalable fine-tuning pipelines to adapt foundation models using domain datasets. 
  • Work with data scientists/domain experts to define objectives and metrics, validate results, and integrate fine-tuned models into the hosting/inference stack.

 

Requirements

  • Bachelor’s/Master’s/PhD in ML/NLP/CS/Data Science/Statistics (or a related field). 
  • 3 years of experience on AI platforms, covering both model hosting/inference optimisation and fine-tuning pipelines; LLM experience strongly preferred. 
  • Strong engineering skills in Python and CUDA, with solid understanding of GPU/CPU architecture and HPC fundamentals. 
  • Deep inference expertise: KV-cache, batching, quantisation (INT4/FP8/GPTQ/AWQ), operator optimisation, and framework integration (vLLM, TensorRT-LLM, SGLang); hands-on hosting on Docker/Kubernetes and AWS/GCP/Azure. 
  • End-to-end fine-tuning expertise: data prep, distributed training, hyperparameter tuning, HF/Accelerate/LoRA/QLoRA; plus benchmarking/monitoring/troubleshooting, AI-native mindset, and effective use of coding assistants.

 

You’ll achieve more when you join HSBC.

 

HSBC is an equal opportunity employer committed to building a culture where all employees are valued and respected and where opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working, and opportunities to grow within an inclusive and diverse environment. We encourage applications from all suitably qualified persons irrespective of characteristics including, but not limited to, their gender or genetic information, sexual orientation, ethnicity, religion, social status, medical care leave requirements, political affiliation, disability, color, national origin, or veteran status. We consider all applications based on merit and suitability to the role.

 

Personal data held by the Bank relating to employment applications will be used in accordance with our Privacy Statement, which is available on our website. 

 

***Issued By HSBC Software Development (GuangDong) Limited***