
Data Scientist – Document AI / OCR

Location: Delhi NCR, India

Experience: 3–6 years

Job Type: Full-Time / Contract

Education:

  • UG: B.Tech/B.E. in Computer Science, Data Science, Artificial Intelligence, or a related field
  • PG: Master's degree in Computer Science, Data Science, or AI (Preferred)

Job Description

Project Role Description: We are seeking a Data Scientist with 3–6 years of experience to develop intelligent solutions for document digitization and data extraction. The candidate will build and implement models for OCR/OMR processing, computer vision, and document understanding that extract structured information from invoices, receipts, purchase orders, bank receipts, bank statements, and handwritten documents. The role requires strong expertise in machine learning, computer vision, and document processing technologies, along with the ability to design scalable solutions using Vision APIs and AI-based document processing tools.

Key Responsibilities:

Document AI & OCR Development

  • Develop and implement solutions for OCR (Optical Character Recognition) and OMR (Optical Mark Recognition).
  • Build systems to digitize handwritten and printed documents.
  • Extract structured data from invoices, receipts, purchase orders, bank receipts, and bank statements.

Computer Vision & AI Models

  • Develop computer vision models for document recognition and classification.
  • Use Vision APIs and AI tools for text detection, extraction, and document understanding.
  • Train and optimize machine learning models for accuracy and performance in document processing tasks.

Data Processing & Automation

  • Design pipelines for document ingestion, preprocessing, and data extraction.
  • Convert unstructured document data into structured formats for downstream applications.
  • Automate document processing workflows.

Technology Implementation

  • Work with cloud-based vision services and document AI platforms.
  • Integrate OCR and document extraction solutions with enterprise systems and databases.

Testing & Optimization

  • Evaluate model performance and improve accuracy for handwritten and scanned documents.
  • Optimize algorithms to handle different document layouts and formats.

Collaboration & Documentation

  • Work closely with developers, architects, and business teams to define requirements.
  • Document models, workflows, and solution architecture.

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, or a related field.
  • 3–6 years of experience in Data Science, Computer Vision, or Document AI solutions.

Required Skills:

  • Experience with OCR and OMR technologies.
  • Strong knowledge of computer vision and image processing techniques.
  • Experience with Vision APIs or document AI platforms.
  • Proficiency in Python and machine learning frameworks.
  • Experience processing handwritten and scanned documents.
  • Ability to build data extraction pipelines from unstructured documents.

Preferred Skills:

  • Experience with cloud-based AI services (AWS, Azure, or GCP Vision services).
  • Familiarity with document layout detection and NLP for document understanding.
  • Experience working with financial or enterprise documents such as invoices and statements.

Preferred Tools / Technologies:

  • Python, OpenCV, TensorFlow / PyTorch
  • OCR frameworks (Tesseract, EasyOCR, PaddleOCR, etc.)
  • Vision APIs (AWS Textract, Azure Vision, Google Vision)
  • Document AI / Layout parsing tools
  • Data processing and ML pipelines

Why Choose Us

We're the Best in the Data Industry, with Over 10 Years of Experience

We’re leaders in the data industry with over 10 years of experience, delivering innovative data solutions that drive business transformation. Our expertise in data pipeline creation has empowered various clients across industries to harness the full potential of their data. For a global fintech firm, we built real-time data pipelines enabling instant fraud detection and risk monitoring. For a leading retail company, we developed scalable pipelines for real-time sales and inventory tracking. Additionally, for a healthcare provider, we created pipelines for secure, real-time patient data processing, improving care and compliance.

  • Real-Time Data Ingestion
  • Batch Data Ingestion
  • Event Handling on Moving Data
