
60 days on an 8×H100 node: kickoff of the NVIDIA Innovation Lab grant

May 18, 2026

Chapter 1 of the Sprint DGX series. From the signed pitch to the first operational week.

In April 2026, Quantum Howl secured a 60-day NVIDIA Innovation Lab grant on an 8×H100 SXM node (640 GB VRAM) to train two models for our dental-brain-agentic product: a multimodal dental VLM and a clinical reasoning layer, both optimized with the end-to-end NVIDIA stack for deployment on DGX Spark. This chapter covers the specific pitch we submitted, what changes the day the clock starts ticking, and why the first week went to benchmarking three VLM candidates (with a fourth held as a fallback) instead of starting to train the model promised in the original pitch.

The program

NVIDIA Innovation Lab is a program that gives NVIDIA Inception startups temporary access to DGX-class hardware to validate a specific technical hypothesis. It is not bulk donated compute; it is an implicit contract: you promise a bounded deliverable, you get the hardware for a defined window, and you report at mid-point and at final delivery. The application requires a specific pitch: not “we want to research dental AI”, but which models, which stack, which calendar, which deliverable.

Quantum Howl entered the program with a product already in production behind it: dental-brain-agentic has been deployed in a real clinic on a desktop-class GPU for some time, with around ten Docker containers, high uptime, several thousand patients accumulated, and hundreds of thousands of DICOM images processed. The pitch hypothesis was concrete: scale the AI layer of the product by training two specialized models on the NVIDIA infrastructure that will be the deployment target in clinics (DGX Spark).

The pitch. What was promised

Two models:

Multimodal dental VLM, scaling the LLaMA 3.2 11B Vision + dental LoRA running in production, with QLoRA fine-tuning on 31+ dental datasets (panoramic, periapical, intraoral, cephalometric, CBCT, histopathology) plus proprietary clinic data.
Clinical reasoning layer on top of Nemotron 3 Nano 8B, as a candidate to serve the ten production agents orchestrated by an in-house registry (diagnostic chat, no-show prediction, inventory, pharmacy assistance). A piece to be evaluated during the sprint, not a component already deployed.

End-to-end NVIDIA stack:

Layer	Technology
Training	H100 via DGX Innovation Labs
Data curation	NeMo Curator
Training framework	NeMo Framework
Inference	TensorRT-LLM (INT4 AWQ)
Serving	Triton Inference Server
Deploy target	DGX Spark

The deviations and substitutions from this stack are documented in chapters 4 through 7.

60-day calendar:

Days 1 to 15: Nemotron and VLM benchmark
Days 16 to 45: VLM 4-stage pipeline (DKI, DCA, SFT, RLT)
Days 46 to 55: TensorRT-LLM and Triton
Days 56 to 60: end-to-end clinical demo and case study
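
The calendar above can be read as a lookup from sprint day to phase. A minimal Python sketch of that reading follows; only the day ranges come from the pitch, while `PHASES` and `phase_for_day` are hypothetical names introduced here for illustration.

```python
# Day ranges are taken from the 60-day calendar in the pitch.
# PHASES and phase_for_day are illustrative names, not part of any real tooling.
PHASES = [
    (1, 15, "Nemotron and VLM benchmark"),
    (16, 45, "VLM 4-stage pipeline (DKI, DCA, SFT, RLT)"),
    (46, 55, "TensorRT-LLM and Triton"),
    (56, 60, "End-to-end clinical demo and case study"),
]


def phase_for_day(day: int) -> str:
    """Return the calendar phase that a given sprint day (1 to 60) falls in."""
    for start, end, name in PHASES:
        if start <= day <= end:
            return name
    raise ValueError(f"day {day} is outside the 60-day grant window")


# The four phases tile the window: their lengths add up to exactly 60 days.
assert sum(end - start + 1 for start, end, _ in PHASES) == 60
```

Calling `phase_for_day(30)` shows that the Day 30 mid-point survey falls inside the VLM pipeline phase, not at a phase boundary.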

What changes when the clock starts

When a scientific grant kicks off, the pitch stops being a document of intentions and becomes the implicit contract of deliverables. The Day 30 mid-point survey and the Day 60 case study are evaluated against the original pitch text, not against what you decide along the way. That imposes a specific discipline on Day 1: reread the pitch and separate its commitments by risk type.

Verifiable commitments at signing: the NVIDIA stack, the phased calendar, the DGX Spark target, the candidate base models. Audited before submission and achievable within the calendar.

Commitments dependent on the actual hardware: instance type, cluster configuration, stop/start policy, feasible parallelism. Not auditable until the node is provisioned. The pitch says “DGX Innovation Labs” because that is all you know before Day 0.

Commitments on proprietary data that require forensic exports: number of real human validations, count of the reproducible knowledge graph, state of the annotation pipelines. The count in production is honest according to the known schema, but the contractual interpretation of terms like “physician-validated” is only locked in when formal exports are run against the production database.

That third category is the one that produces the nuances any scientific sponsor will demand at mid-point. Between Day 3 and Day 10, while running the formal exports against the production database, two figures from the pitch had to be made more precise:

What the pitch counted as available diagnoses were, strictly speaking, records still pending formal clinical validation. The count was honest; the adjustment was to state that a pending record is not physician-validated in the contractual sense a scientific sponsor will require, and to report the clinical validation pipeline as work in progress through Day 60.
The reproducible count of the knowledge graph came in below the pitch’s initial estimate: some of the relationships belonged to a transitional version of an experiment, not to the active graph. The correct figure for mid-point is that of the reproducible SNOMED/ICD/MeSH graph.
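
The first adjustment amounts to counting by validation status instead of counting rows. A minimal sketch of that distinction, assuming a hypothetical export in which each record carries a `status` field; the field name, its values, and the sample rows are placeholders for illustration, not the production schema or real figures.

```python
from collections import Counter

# Hypothetical export rows. The real schema and the real counts are not
# published in this post; these three rows only illustrate the distinction.
records = [
    {"id": 1, "status": "pending"},
    {"id": 2, "status": "physician_validated"},
    {"id": 3, "status": "pending"},
]


def count_by_status(rows):
    """Tally records per validation status."""
    return Counter(row["status"] for row in rows)


counts = count_by_status(records)
available = sum(counts.values())           # what a raw row count reports
validated = counts["physician_validated"]  # what the contractual term covers
print(f"available={available} validated={validated}")
```

Reporting both numbers side by side keeps the original count visible while making clear which part of it meets the stricter definition.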

Both adjustments were documented openly for the mid-point survey. That decision, reporting the corrected figures rather than holding to the pitch figures, is what it means to close a scientific grant with integrity.

Why the first week was NOT spent training

The pitch committed to LLaMA 3.2 11B Vision as the base model for the VLM. It was the model Quantum Howl had in production, with the dental LoRA already trained. The decision made on Day 1 was not to assume LLaMA 3.2 was still the best choice in April 2026. The VLM landscape shifts quarterly, and spending 4 weeks of training on a model that a rigorous benchmark would have ruled out would waste the grant.

The first operational week (days 1 to 7) was spent on a multi-model benchmark against MMOral-Bench, a closed-ended and open-ended benchmark of multimodal dental reasoning, with three VLM candidates:

LLaMA 3.2 11B Vision (original base from the pitch)
Gemma 4 31B-IT (Google, Apache 2.0, dense)
Qwen3-VL-8B (Alibaba, SOTA Oct 2025)

A fourth model (Gemma 4 26B-A4B MoE) was downloaded to disk as a fallback but did not enter the formal benchmark. It was kept in case the framework constraint changed.
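
For the closed-ended part of a benchmark like this, per-model scoring can be reduced to exact-match accuracy over multiple-choice answers. A minimal sketch under that assumption; the function name, the option-letter answer format, and the toy inputs are hypothetical and are not taken from MMOral-Bench or from our results.

```python
def closed_ended_accuracy(predictions, gold):
    """Exact-match accuracy over multiple-choice answers (case-insensitive)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold answers must align one-to-one")
    if not gold:
        raise ValueError("cannot score an empty answer set")
    hits = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold)
    )
    return hits / len(gold)


# Toy inputs only, to show the call shape. These are not benchmark results.
gold = ["A", "C", "B", "D"]
toy_predictions = ["A", "c", "D", "D"]
print(closed_ended_accuracy(toy_predictions, gold))
```

Open-ended answers need a different scorer, which this sketch does not cover.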

The results, the statistical decisions, and the formal pivot email to NVIDIA are the content of Chapter 2.

State at the close of Day 1

The node was not yet operational; provisioning was resolved on Day 11. The benchmark was running locally on a subset. And the question that defined everything that would come after was already on the table: if the data confirmed that LLaMA 3.2 was not the best model available for what we had promised, what would we do?

The answer came on Day 5. That is Chapter 2.

Next chapter: on Day 5, the benchmark data forced a change to the base model from the pitch. Gemma 4 31B-IT vs LLaMA 3.2 11B Vision, and the formal pivot email to NVIDIA the same day.
