60 days on an 8×H100 node: kickoff of the NVIDIA Innovation Lab grant

Chapter 1 of the Sprint DGX series. From the signed pitch to the first operational week.

In April 2026, Quantum Howl secured a 60-day NVIDIA Innovation Lab grant on an 8×H100 SXM node (640 GB VRAM) to train two models for our dental-brain-agentic product: a multimodal dental VLM and a clinical reasoning layer, both optimized with the end-to-end NVIDIA stack for deployment on DGX Spark. This chapter covers the specific pitch we submitted, what changes the day the clock starts ticking, and why the first week was spent benchmarking three VLM candidates instead of starting to train the model promised in the original pitch.

The program

NVIDIA Innovation Lab is a program that gives NVIDIA Inception startups temporary access to DGX-class hardware to validate a specific technical hypothesis. It is not bulk donated compute; it is an implicit contract: you promise a bounded deliverable, you get the hardware for a defined window, and you report at mid-point and at final delivery. The application requires a specific pitch: not "we want to research dental AI", but which models, which stack, which calendar, and which deliverable.

Quantum Howl entered the program with a product already in production behind it: dental-brain-agentic has been deployed in a real clinic on an RTX 5070 for some time, with 13 Docker containers, 99.7% uptime, 5,241 patients accumulated, and 252K DICOM images processed. The pitch hypothesis was concrete: scale the AI layer of the product by training two specialized models on the NVIDIA infrastructure that will be the deployment target in clinics (DGX Spark).

The pitch: what was promised

Two models:

  • Multimodal dental VLM, scaling the LLaMA 3.2 11B Vision + dental LoRA running in production, with QLoRA fine-tuning on 31+ dental datasets (panoramic, periapical, intraoral, cephalometric, CBCT, histopathology) plus proprietary clinic data.
  • Clinical reasoning layer on top of Nemotron 3 Nano 8B for the 9 LangGraph agents in production (diagnostic chat, no-show prediction, inventory, pharmacy assistance).

End-to-end NVIDIA stack:

Layer               Technology
Training            H100 via DGX Innovation Labs
Data curation       NeMo Curator
Training framework  NeMo Framework
Inference           TensorRT-LLM (INT4 AWQ)
Serving             Triton Inference Server
Deploy target       DGX Spark

The deviations and substitutions from this stack are documented in chapters 4 through 7.

60-day calendar:

  • Days 1 to 15: Nemotron and VLM benchmark
  • Days 16 to 45: VLM 4-stage pipeline (DKI, DCA, SFT, RLT)
  • Days 46 to 55: TensorRT-LLM and Triton
  • Days 56 to 60: end-to-end clinical demo and case study
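Because the mid-point and final reports are judged against this calendar, it helps to keep it in a form you can query. The following is a minimal sketch, not tooling from the actual sprint: the phase boundaries are copied from the list above, while the names `Phase`, `CALENDAR`, `phase_for_day`, and `check_calendar` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    first_day: int
    last_day: int
    name: str


# Phase boundaries copied from the 60-day calendar in the pitch.
CALENDAR = [
    Phase(1, 15, "Nemotron and VLM benchmark"),
    Phase(16, 45, "VLM 4-stage pipeline (DKI, DCA, SFT, RLT)"),
    Phase(46, 55, "TensorRT-LLM and Triton"),
    Phase(56, 60, "End-to-end clinical demo and case study"),
]


def check_calendar(calendar, total_days=60):
    """Raise ValueError unless phases are contiguous and cover days 1..total_days."""
    expected_start = 1
    for phase in calendar:
        if phase.first_day != expected_start or phase.last_day < phase.first_day:
            raise ValueError(f"gap or overlap at phase {phase.name!r}")
        expected_start = phase.last_day + 1
    if expected_start != total_days + 1:
        raise ValueError("calendar does not cover the full window")


def phase_for_day(day):
    """Return the phase that a given grant day (1-based) falls into."""
    for phase in CALENDAR:
        if phase.first_day <= day <= phase.last_day:
            return phase
    raise ValueError(f"day {day} is outside the 60-day window")


def days_left_in_phase(day):
    """Days remaining in the current phase after the given day."""
    return phase_for_day(day).last_day - day


check_calendar(CALENDAR)
```

Calling `phase_for_day(30)` returns the VLM pipeline phase, which is the phase the Day 30 mid-point survey lands in.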

What changes when the clock starts

When a scientific grant kicks off, the pitch stops being a document of intentions and becomes the implicit contract of deliverables. The Day 30 mid-point survey and the Day 60 case study are evaluated against the original pitch text, not against what you decide along the way. That imposes a specific discipline on Day 1: reread the pitch and separate its commitments by risk type.

Verifiable commitments at signing: the NVIDIA stack, the phased calendar, the DGX Spark target, the candidate base models. These were audited before submission and are achievable within the calendar.

Commitments dependent on the actual hardware: instance type, cluster configuration, stop/start policy, feasible parallelism. These are not auditable until the node is provisioned. The pitch says "DGX Innovation Labs" because that is all you know before Day 0.

Commitments on proprietary data that require forensic exports: number of real human validations, size of the reproducible knowledge graph, state of the annotation pipelines. The count in production is honest according to the known schema, but the contractual interpretation of terms like "physician-validated" is only locked in when formal exports are run against the production database.

That third category is the one that produces the qualifications any scientific sponsor will demand at mid-point. Between Day 3 and Day 10, when we ran the formal exports against production PostgreSQL, two figures from the pitch needed to be made more precise:

  • The diagnoses table contained 322 entries, all in pending state. The count was honest; what changed was clarifying that pending is not physician-validated in the contractual sense a scientific sponsor will require. The correct figure for mid-point: 0 formally validated diagnoses, with the clinical validation pipeline in progress for Day 60.
  • The reproducible Neo4j knowledge graph contained 25,324 relationships, not 49,484. The larger figure corresponded to a transitional version that existed during an experiment and was not the active graph. The correct figure for mid-point: 25K+ reproducible SNOMED/ICD/MeSH relationships.
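The first adjustment comes down to grouping rows by status instead of counting them all. Here is a minimal sketch of that check, using an in-memory SQLite database as a stand-in for production PostgreSQL. The table name `diagnoses` and the `pending` state come from the text above; the column name `status`, the label `validated`, and the helper names are assumptions for illustration, not the real schema.

```python
import sqlite3


def count_by_status(conn):
    """Return {status: row_count} for the (hypothetical) diagnoses table."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM diagnoses GROUP BY status"
    ).fetchall()
    return dict(rows)


def midpoint_figures(counts, validated_status="validated"):
    """Separate the raw row count from the formally validated count.

    'validated' is an assumed status label; only 'pending' appears in the text.
    """
    return {
        "total_entries": sum(counts.values()),
        "formally_validated": counts.get(validated_status, 0),
    }


def demo():
    # Stand-in database reproducing the situation described above:
    # 322 rows, every one of them still pending.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE diagnoses (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO diagnoses (status) VALUES (?)", [("pending",)] * 322
    )
    counts = count_by_status(conn)
    conn.close()
    return midpoint_figures(counts)


if __name__ == "__main__":
    print(demo())
```

Reporting both numbers side by side keeps the honest row count and the contractual figure from being confused with each other.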

Both adjustments were documented for the mid-point survey without hiding them. That decision, reporting the corrected figures rather than holding to the pitch figures, is what defines closing a scientific grant with integrity.

Why the first week was NOT spent training

The pitch signed off on LLaMA 3.2 11B Vision as the base model for the VLM. It was the one Quantum Howl had in production, with the dental LoRA already trained. The decision made on Day 1 was not to assume LLaMA 3.2 was still the best choice in April 2026. The VLM landscape shifts quarterly, and spending 4 weeks of training on a model that a rigorous benchmark would have ruled out would waste the grant.

The first operational week (days 1 to 7) was spent on a multi-model benchmark against MMOral-Bench, a closed-ended and open-ended benchmark of multimodal dental reasoning, with three VLM candidates:

  • LLaMA 3.2 11B Vision (original base from the pitch)
  • Gemma 4 31B-IT (Google, Apache 2.0, dense)
  • Qwen3-VL-8B (Alibaba, SOTA Oct 2025)

A fourth model (Gemma 4 26B-A4B MoE) was downloaded to disk as a fallback but did not enter the formal benchmark. It was kept in case the framework constraint changed.
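For the closed-ended half of such a benchmark, comparing candidates reduces to per-model accuracy over the same question set. The sketch below illustrates that idea only: the item format (option letters A to D keyed by question id), the function names, and the toy data are all assumptions, and none of it is MMOral-Bench data or a result from this sprint. The open-ended half needs a different scoring method and is not covered here.

```python
def extract_choice(response, options="ABCD"):
    """Naively read an option letter from the start of a model response.

    Accepts forms like "B", "b) caries" or "C."; returns None otherwise.
    """
    text = response.strip().upper()
    if not text or text[0] not in options:
        return None
    if len(text) > 1 and text[1].isalpha():
        return None  # e.g. "Caries" starts with C but is not an option letter
    return text[0]


def accuracy_by_model(predictions, gold):
    """Closed-ended accuracy per model; missing or unparseable answers count as wrong."""
    scores = {}
    for model, answers in predictions.items():
        correct = sum(
            1
            for qid, expected in gold.items()
            if extract_choice(answers.get(qid, "")) == expected
        )
        scores[model] = correct / len(gold)
    return scores


# Toy placeholder data with hypothetical model names, for illustration only.
GOLD = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
PREDICTIONS = {
    "model_x": {"q1": "A", "q2": "c) periapical lesion", "q3": "Caries", "q4": "D."},
    "model_y": {"q1": "B", "q2": "C", "q3": "B"},
}

if __name__ == "__main__":
    print(accuracy_by_model(PREDICTIONS, GOLD))
```

Counting unanswered items as wrong keeps the denominator identical for every candidate, so the scores stay comparable across models.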

The results, the statistical decisions, and the formal pivot email to NVIDIA are the content of Chapter 2.

State at the close of Day 4

The node was not yet operational; provisioning was resolved on Day 11. The benchmark was running locally on a subset. And the question that defined everything that would come after was already on the table: if the data confirmed LLaMA 3.2 was not the best model available for what we had promised, what do we do?

The answer came on Day 5. That is Chapter 2.


Next chapter: on Day 5, the benchmark data forced a change to the base model from the pitch. Gemma 4 31B-IT vs LLaMA 3.2 11B Vision, and the formal pivot email to NVIDIA the same day.