FastLabel Inc. (Headquarters: Shinjuku-ku, Tokyo; CEO: Takeshi Suzuki; hereinafter “FastLabel”), a company that supports the development of robot foundation models and domain-specific VLA models, announces that its research findings on “Domain-Specific Semantic Duplicate Elimination Technology Using VLM” have been selected for a poster presentation at GTC 2026, the world's largest AI conference hosted by NVIDIA, which will take place in person and virtually from March 16 to 19.

This poster presents FastLabel's approach to a weakness of duplicate image removal based on image embeddings in AI development: such methods can miss domain-specific critical differences (e.g., the presence of pedestrians). The solution combines advanced captioning by a vision-language model (VLM) with NVIDIA NeMo Curator. In summary, the presentation proposes a pioneering pipeline construction methodology for mission-critical domains such as autonomous driving development. The approach efficiently reduces dataset size while reliably preserving the “rare scenes” that directly affect safety.

<Key Points of the Selected Poster Presentation>
High-Quality Data Selection: Identifies “meaningful differences” based on domain knowledge rather than relying on conventional uniform similarity judgments, significantly improving model generalization performance and safety.
Cost and Time Optimization: Demonstrates outstanding scalability, processing 10,000 images in just 4 minutes (deduplication process). This contributes to substantial reductions in large-scale training costs and annotation expenses.
Model Performance Maximization: Enhances training efficiency and improves real-world generalizability by reducing “commonplace scenes” while increasing the density of “critical scenes.”

<Future Prospects for Applying the Announced Technology>
Looking ahead, the company plans to apply its core technology, the detection of anomalies and rarities using embedding vectors, to the construction of robotics and Vision-Language-Action (VLA) models.
Video Data Optimization: Automatically detects anomalies in task start/end states in robot operation videos as “residuals after deduplication,” extracting high-quality training data.
Analysis of Pose and Joint Data: By vectorizing and processing not only images but also time-series data such as arm joint angles, the technology enables the identification of “abnormal robot poses” and “unknown motion patterns.”

FastLabel aims to accelerate innovation across a wide range of cutting-edge fields that operate in complex physical environments, from autonomous driving to robotics, as part of the “Data-centric AI” trend, in which data quality plays a critical role in determining AI performance.
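
For readers who want a concrete picture of the idea described above, the following is a minimal illustrative sketch, not FastLabel's pipeline and not the NVIDIA NeMo Curator API. It assumes each image already has an embedding vector and a set of domain-critical tags (for example, tags that might be extracted from VLM captions); the names `semantic_dedup`, `critical_tags`, the toy vectors, and the 0.95 threshold are all hypothetical. An image is dropped only when an already-kept image is both similar in embedding space and identical in its critical tags, so a near-identical frame that contains a pedestrian survives.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_dedup(items, threshold=0.95):
    """Greedy deduplication (hypothetical helper, for illustration only).

    An item counts as a duplicate only if some already-kept item has a
    similar embedding AND exactly the same domain-critical tags.
    """
    kept = []
    for item in items:
        is_duplicate = any(
            cosine(item["embedding"], k["embedding"]) >= threshold
            and item["critical_tags"] == k["critical_tags"]
            for k in kept
        )
        if not is_duplicate:
            kept.append(item)
    return kept


# Toy data: embeddings and tags are invented for this example.
frames = [
    {"id": "a", "embedding": [1.0, 0.0, 0.1], "critical_tags": frozenset()},
    {"id": "b", "embedding": [0.99, 0.01, 0.1], "critical_tags": frozenset()},
    {"id": "c", "embedding": [0.98, 0.02, 0.1],
     "critical_tags": frozenset({"pedestrian"})},
    {"id": "d", "embedding": [0.0, 1.0, 0.0], "critical_tags": frozenset()},
]

kept_ids = [f["id"] for f in semantic_dedup(frames)]
print(kept_ids)
```

In this toy run, frame "b" is removed as a near-duplicate of "a", while frame "c" is retained despite its similar embedding because its critical tags differ. The greedy loop compares every item against every kept item, so it is suited only to small illustrations; a production pipeline would rely on clustering or approximate nearest-neighbor search instead.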