About the Position:
We are looking for a highly skilled NLP/LLM Engineer to join our team and work on cutting-edge large language model (LLM) applications. The ideal candidate will have hands-on experience in prompt engineering, LLM training and fine-tuning, and model/chatbot evaluation.
You should be able to apply state-of-the-art NLP/LLM research to build and optimise intelligent conversational agents, while also developing scalable pipelines for model evaluation.
Responsibilities:
- Prompt Engineering
  - Design, optimise, and evaluate prompts for various NLP/LLM use cases.
  - Apply advanced prompting techniques such as self-consistency, chain-of-thought (CoT), few-shot and zero-shot prompting, and retrieval-augmented prompting (a small construction sketch follows this list).
  - Develop systematic prompt evaluation frameworks and automate prompt optimisation using both open-source and API-based models.
  - Experiment with guardrails, role-based prompting, and output formatting to ensure reliability and safety in chatbot responses.
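By way of illustration, here is a minimal sketch of few-shot chain-of-thought prompt construction in Python. It is a sketch, not production code: the arithmetic task, the exemplars, and the helper name `build_cot_prompt` are all hypothetical.

```python
# Minimal few-shot chain-of-thought (CoT) prompt construction.
# The arithmetic task and exemplars are hypothetical placeholders.

FEW_SHOT_EXAMPLES = [
    {
        "question": "A train travels 60 km in 1.5 hours. What is its average speed?",
        "reasoning": "Speed = distance / time = 60 km / 1.5 h = 40 km/h.",
        "answer": "40 km/h",
    },
    {
        "question": "A shop sells 12 apples for 6 dollars. What does one apple cost?",
        "reasoning": "Unit price = 6 dollars / 12 apples = 0.5 dollars per apple.",
        "answer": "0.50 dollars",
    },
]

def build_cot_prompt(question: str) -> str:
    """Assemble a few-shot CoT prompt: worked exemplars, then the new question."""
    parts = []
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Q: {ex['question']}\n"
            f"Let's think step by step. {ex['reasoning']}\n"
            f"A: {ex['answer']}\n"
        )
    # The trailing cue invites the model to produce its own reasoning chain.
    parts.append(f"Q: {question}\nLet's think step by step.")
    return "\n".join(parts)

print(build_cot_prompt("A cyclist rides 45 km in 3 hours. What is her average speed?"))
```

Self-consistency builds on exactly this kind of prompt: sample several completions at non-zero temperature and take a majority vote over the final answers.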
- Model Training & Fine-Tuning
  - Develop and train LLMs using PyTorch, Hugging Face, Unsloth, and vLLM.
  - Implement multi-GPU/distributed training with DeepSpeed, FSDP, or Accelerate.
  - Fine-tune models with Supervised Fine-Tuning (SFT) and reinforcement-learning or preference-optimisation methods (PPO, DPO, GRPO, etc.).
  - Apply parameter-efficient fine-tuning (PEFT) methods such as LoRA/QLoRA and IA3 to reduce compute and memory costs (a minimal fine-tuning sketch follows this list).
  - Apply advanced training techniques such as mixed precision, FlashAttention, quantisation, gradient checkpointing, and optimiser sharding (ZeRO).
  - Build and maintain data preprocessing pipelines for large-scale NLP training datasets.
  - Track and document model training experiments for reproducibility (e.g., using Weights & Biases or MLflow).
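As a concrete flavour of this work, below is a minimal LoRA-based SFT sketch using Hugging Face's trl and peft libraries. It is a sketch under assumptions, not a reference implementation: the model id, dataset, and hyperparameters are placeholders, and the exact SFTTrainer/SFTConfig surface varies across trl versions.

```python
# Minimal LoRA-based SFT sketch with Hugging Face trl + peft.
# Model id, dataset, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder instruction-tuning dataset in a chat/messages format.
train_dataset = load_dataset("trl-lib/Capybara", split="train")

# Parameter-efficient fine-tuning: train low-rank adapters only.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="sft-lora-out",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
    bf16=True,                    # mixed-precision training
    gradient_checkpointing=True,  # trade compute for activation memory
    logging_steps=10,
    report_to="wandb",            # experiment tracking (Weights & Biases)
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B",  # placeholder base model
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```

The same script typically scales to multi-GPU runs by launching it through Accelerate with a DeepSpeed or FSDP configuration, rather than by changing the training code itself.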
- Model & Chatbot Evaluation
  - Develop evaluation pipelines for both general LLMs and RAG-based chatbots.
  - Apply automatic evaluation metrics such as BLEU, ROUGE, METEOR, BERTScore, and perplexity (a short example follows this list).
  - Use LLM-as-a-judge methods (both open-source and closed-source via APIs) for qualitative evaluation.
  - Design user-simulation frameworks for stress-testing chatbots in real-world scenarios.
  - Build modular, scalable evaluation pipelines to benchmark model accuracy, robustness, safety, and latency.
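To make this concrete, the sketch below computes two reference-based metrics with the Hugging Face evaluate library and assembles a simple grading prompt of the kind used in LLM-as-a-judge evaluation. The toy predictions/references, the rubric wording, and the helper name `build_judge_prompt` are illustrative assumptions.

```python
# Minimal evaluation sketch: reference-based metrics plus an LLM-as-a-judge
# grading prompt. Toy data and the judge rubric are illustrative placeholders.
import evaluate

predictions = ["The refund was issued within five business days."]
references = ["Refunds are processed in five business days."]

# Reference-based metrics via the Hugging Face `evaluate` library.
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))


def build_judge_prompt(question: str, answer: str) -> str:
    """Build a grading prompt to send to a judge model (API call omitted)."""
    return (
        "You are an impartial evaluator. Rate the answer below for factual "
        "accuracy, helpfulness, and safety on a 1-10 scale, then justify "
        "the score in one sentence.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Score:"
    )

print(build_judge_prompt("How long do refunds take?", predictions[0]))
```

In a full pipeline, the judge prompt would be sent to an open-source or API-based model and the returned scores aggregated alongside the automatic metrics.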
- Software Engineering Practices
  - Use Git for version control, branching workflows, and collaborative development.
  - (Preferred) Experience with modern software engineering tools and practices, including Docker, CI/CD pipelines, and Kubernetes.
Requirements:
- Strong background in Natural Language Processing (NLP) and Large Language Models (LLMs).
- Proven experience in prompt engineering with real-world applications.
- Hands-on experience in training and fine-tuning large models using Hugging Face, PyTorch, and distributed training setups.
- Practical knowledge of applying PEFT methods, SFT, and RLHF/RLVR on LLMs.
- Experience in designing evaluation frameworks for LLMs and chatbots.
- Proficiency with Git for collaborative development.
- Strong problem-solving skills, a dependable work ethic, and a collaborative approach to working in a team environment.
- (Preferred) Familiarity with Docker, CI/CD pipelines, and Kubernetes.
What We Offer:
- A dynamic, fast-paced environment where your ideas directly impact real-world products.
- A collaborative team culture that values innovation, technical excellence, and continuous learning.
- The chance to help shape the future at MCINEXT.