ستاپ شریف

Computer Vision Engineer

Tehran/ Tarasht
Full Time
8:00 a.m. to 4:45 p.m., with one hour of flexible time
5 days a week
Bonus, learning stipends, snacks, occasional packages and gifts
11 - 50 employees
IT / Software / Hardware

Key Requirements

3 years of experience in a similar position
Python - Intermediate

Job Description

Mid-Level Computer Vision Engineer 

We are a leading-edge AI products company focused on building next-generation intelligent systems at scale. As we continue to expand our production platforms, we are looking for a Mid-Level Computer Vision Engineer to join our team with a strong focus on video understanding and multimodal retrieval systems.
In this role, you will work primarily on production-grade computer vision pipelines that enable semantic understanding, indexing, and retrieval of large-scale video data. Your work will support multimodal search experiences, allowing users to retrieve precise information from massive image and video collections using natural language and cross-modal representations. You will collaborate closely with a small, high-impact engineering team to turn advanced models into reliable, scalable systems.
 
Responsibilities:

  • Video Understanding & Multimodal Representation
      • Develop and maintain production pipelines for extracting semantic representations from videos and video frames using state-of-the-art Vision-Language Models (VLMs).
      • Enable semantic understanding and retrieval across image, video, and text modalities.
  • Temporal Localization & Alignment
      • Implement and optimize algorithms for temporal localization and alignment, including Video Moment Localization and cross-modal video-text matching.
      • Support precise retrieval of relevant video segments from long-form content without manual annotation.
  • Large-Scale Visual Data Processing
      • Build and optimize high-throughput ingestion systems for large-scale image and video datasets.
      • Prepare visual embeddings for efficient storage and retrieval in vector databases.
  • Model Optimization for Production
      • Fine-tune and optimize models with a focus on latency, throughput, and resource efficiency.
      • Apply techniques such as quantization and parameter-efficient tuning to manage high-dimensional visual data in production environments.
  • Production Engineering & Reliability
      • Develop, test, deploy, and document production-ready features and services.
      • Monitor model and system performance, identify bottlenecks, and contribute to continuous improvements.
  • Cross-Functional Collaboration
      • Work closely with data scientists, backend engineers, and product teams to integrate vision systems into end-user applications.
      • Communicate technical decisions, trade-offs, and system limitations clearly across teams.

Required Skills & Qualifications:

  • Solid experience with Computer Vision and Deep Learning, particularly for video-based tasks.
  • Familiarity with Vision-Language Models (e.g., CLIP-like architectures) and multimodal embedding spaces.
  • Proficiency in Python, with experience building production ML pipelines.
  • Practical experience deploying or maintaining production ML or CV systems.
  • Understanding of performance optimization, including batching, latency reduction, and memory efficiency.
  • Ability to work independently on defined tasks while contributing effectively within a team.
  • Clear communication skills for collaborating with technical and non-technical stakeholders.
  • Experience with video retrieval, temporal grounding, or video-text alignment.
  • Familiarity with vector databases and large-scale embedding systems.
  • Experience working with high-volume visual or multimedia data.

Job Requirements

Age: 24 - 35 years old
Gender: Men / Women
Language: English (Intermediate, 50%)
Software: Python (Intermediate)


This posting has been closed.