Nvidia Nemotron Nano 12B V2 VL
Nvidia Nemotron Nano 12B V2 VL is NVIDIA's open 12B multimodal reasoning model. It uses a hybrid Mamba-Transformer architecture, launched with OCRBenchV2 results, and offers specialized support for document intelligence, video understanding, and RAG pipelines.
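import { streamText } from 'ai'

// Stream a text response from the model.
const result = streamText({
  model: 'nvidia/nemotron-nano-12b-v2-vl',
  prompt: 'Why is the sky blue?',
})

// Print text chunks as they arrive.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}

Frequently Asked Questions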
What types of image inputs does Nvidia Nemotron Nano 12B V2 VL support?
The model handles image Q&A, OCR, dense captioning, and multi-image reasoning. NVIDIA cited OCRBenchV2 results at launch. OCRBenchV2 tests text extraction from document images with complex layouts, tables, and mixed formatting.
What is Efficient Video Sampling (EVS)?
EVS identifies and prunes temporally static patches in video sequences (image regions where little changes between consecutive frames). Removing these redundant patches reduces the token count per video clip, so the model can process longer videos with up to 2.5x higher throughput without sacrificing accuracy.
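The idea of pruning static patches can be illustrated with a toy sketch. This is not NVIDIA's EVS implementation: the patch representation, the mean-absolute-difference measure, the `threshold` parameter, and the function names are all assumptions made for illustration.

```typescript
// Toy illustration of pruning temporally static patches.
// Not NVIDIA's EVS implementation; all names and the distance measure are hypothetical.
type Patch = number[]; // flattened pixel or feature values for one patch
type Frame = Patch[];  // patches of one frame in a fixed spatial order

interface KeptPatch {
  frame: number;    // index of the frame the patch came from
  position: number; // spatial position of the patch within the frame
  values: Patch;
}

// Mean absolute difference between two equally sized patches.
function patchDistance(a: Patch, b: Patch): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return a.length === 0 ? 0 : sum / a.length;
}

// Keep every patch of the first frame. For later frames, keep a patch only if
// it differs from the same position in the previous frame by more than `threshold`.
function pruneStaticPatches(frames: Frame[], threshold: number): KeptPatch[] {
  const kept: KeptPatch[] = [];
  frames.forEach((frame, t) => {
    frame.forEach((values, position) => {
      const isStatic =
        t > 0 && patchDistance(values, frames[t - 1][position]) <= threshold;
      if (!isStatic) kept.push({ frame: t, position, values });
    });
  });
  return kept;
}
```

In this sketch, fewer kept patches means fewer tokens passed on for the clip, which is the effect the answer above describes.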
How does this model support RAG pipelines?
Nvidia Nemotron Nano 12B V2 VL serves as the reasoning component for visual content in the Nemotron RAG suite. Embedding models in the same family appear on ViDoRe, MTEB, and MMTEB leaderboards for visual, multimodal, and multilingual text retrieval. Together, they enable retrieval-augmented generation (RAG) across proprietary data with mixed-modality documents.
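The retrieval half of such a pipeline can be sketched as a nearest-neighbor search over page embeddings. This is a minimal sketch under stated assumptions: the `Page` shape, the function names, and the use of cosine similarity are hypothetical, and the embeddings are assumed to have been computed beforehand by a separate embedding model.

```typescript
// Minimal retrieval sketch; the data shape and function names are hypothetical.
interface Page {
  id: string;
  embedding: number[]; // assumed to be precomputed by an embedding model
}

// Cosine similarity between two vectors of equal length; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k pages whose embeddings are most similar to the query embedding.
function retrieveTopK(query: number[], pages: Page[], k: number): Page[] {
  return pages
    .map((page) => ({ page, score: cosineSimilarity(query, page.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.page);
}
```

The retrieved pages would then be passed to the vision-language model together with the user's question, so the model reasons only over the most relevant documents.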
What benchmark did Nvidia Nemotron Nano 12B V2 VL highlight at launch?
OCRBenchV2. It measures document intelligence and optical character recognition on visually complex documents.
Is this model open source?
Yes. NVIDIA released model weights on Hugging Face under the NVIDIA Open Model License.
Can I use this model for multi-image reasoning tasks?
Yes. Multi-image reasoning is part of the model's task coverage, alongside image Q&A, OCR, dense captioning, and video Q&A. You can use it for tasks like comparing document versions, analyzing image sequences, or reasoning over slide decks.
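A multi-image request might be assembled as in the sketch below. It builds a single user message with one text part and several image parts, following the message shape used by the AI SDK. The helper name `buildMultiImageMessage` and the local type definitions are hypothetical, introduced here only for illustration.

```typescript
// Hypothetical helper that builds one user message with several images.
// The local types mirror the AI SDK's text and image content parts.
type UserContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; image: URL };

interface UserMessage {
  role: 'user';
  content: UserContentPart[];
}

// Put the question first, followed by one image part per URL.
function buildMultiImageMessage(question: string, imageUrls: string[]): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      ...imageUrls.map(
        (url): UserContentPart => ({ type: 'image', image: new URL(url) }),
      ),
    ],
  };
}
```

The resulting message can be passed to `streamText` as `messages: [message]` in place of the `prompt` field.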
Where are per-token prices listed?
Rates are listed on this page. They reflect the providers routing through AI Gateway and shift when providers update their pricing.