Senior Consultant Specialist (Model Hosting/Inference Optimization)
Guangzhou, GD, CN, 510620
Job description
Some careers have more impact than others.
If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be.
We are currently seeking an experienced professional to join our team in the role of Senior Consultant Specialist (Model Hosting/Inference Optimization).
Business: CTO Platforms (AI Platforms)
Job ID: 46913
Location: Guangzhou
We are seeking a highly skilled Model Hosting, Inference Optimization & Fine-Tuning Pipeline Engineer to join our AI platform team. This role encompasses two core pillars:
1) Building and maintaining scalable model hosting systems and optimizing inference performance for AI models;
2) Designing and implementing end-to-end model fine-tuning pipelines to support domain-specific adaptation of pre-trained models.
You will collaborate closely with AI researchers, data scientists, software engineers, and product teams to deliver production-grade solutions that combine optimized inference, reliable hosting, and flexible fine-tuning capabilities for a wide range of AI models.
Principal responsibilities
- Design, build, and operate scalable, reliable model hosting platforms for LLMs, embeddings, and STT/TTS across heterogeneous hardware.
- Drive inference optimisation for latency, throughput, and cost (quantisation, KV-cache optimisation, dynamic/continuous batching).
- Evaluate, integrate, and tailor inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware.
- Own inference health and performance monitoring: latency, throughput, TTFT, memory, availability; troubleshoot bottlenecks and deployment issues.
- Partner with hardware teams to apply hardware-specific optimisations and improve resource utilisation.
- Ensure hosting systems meet production standards for reliability, scalability, security, and high availability.
- Build end-to-end, scalable fine-tuning pipelines to adapt foundation models using domain datasets.
- Work with data scientists/domain experts to define objectives and metrics, validate results, and integrate fine-tuned models into the hosting/inference stack.
Requirements
- Bachelor’s/Master’s/PhD in ML/NLP/CS/Data Science/Statistics (or a related field).
- 3 years of experience on AI platforms, covering both model hosting/inference optimisation and fine-tuning pipelines; LLM experience strongly preferred.
- Strong engineering skills in Python and CUDA, with solid understanding of GPU/CPU architecture and HPC fundamentals.
- Deep inference expertise: KV-cache, batching, quantisation (INT4/FP8/GPTQ/AWQ), operator optimisation, and framework integration (vLLM, TensorRT-LLM, SGLang); hands-on hosting on Docker/Kubernetes and AWS/GCP/Azure.
- End-to-end fine-tuning expertise: data prep, distributed training, hyperparameter tuning, Hugging Face/Accelerate/LoRA/QLoRA; plus benchmarking, monitoring, and troubleshooting skills, an AI-native mindset, and effective use of coding assistants.
You’ll achieve more when you join HSBC.
HSBC is an equal opportunity employer committed to building a culture where all employees are valued and respected, and where opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working, and opportunities to grow within an inclusive and diverse environment. We encourage applications from all suitably qualified persons irrespective of, but not limited to, their gender or genetic information, sexual orientation, ethnicity, religion, social status, medical care leave requirements, political affiliation, disability, color, national origin, veteran status, etc. We consider all applications based on merit and suitability to the role.
Personal data held by the Bank relating to employment applications will be used in accordance with our Privacy Statement, which is available on our website.
***Issued By HSBC Software Development (GuangDong) Limited***