Panasonic Connect Wins First Place in the VidLLMs Competition at CVPR 2025, a World-Leading Image Recognition Conference

Company develops "DIVE," a video recognition AI that solves questions step-by-step

Tokyo, Japan – Panasonic Connect Co., Ltd. (https://connect.panasonic.com/) today announced that it has won first place in the VidLLMs Competition at CVPR 2025, a world-leading conference on image recognition, with "DIVE (Deep-search Iterative Video Exploration)," a video recognition AI that answers questions step by step. In the competition, DIVE achieved 81% accuracy on complex questions about the given videos, demonstrating its high performance.

Development Background

＜Workshop invitation (left) World No. 1 certificate (right)＞

In recent years, as the use of video data has accelerated across business areas, the need has grown for AI technology that can understand video content and accurately answer natural-language questions about it. At logistics sites, for example, AI is expected to analyze recorded footage of work processes and propose improvements. However, conventional AI has difficulty with questions that require a deep understanding of the meaning and context of a video, which poses a significant obstacle to practical application.

To overcome these challenges, Panasonic Connect took part in CVPR 2025, a world-leading international conference on image recognition, as an opportunity to have its AI's video understanding and natural language response capabilities evaluated comprehensively.

Overview of the Complex Video Reasoning & Robustness Evaluation Task

The VidLLMs Workshop, held for the first time at CVPR 2025, was a competition to test the performance of video large language models (VidLLMs). Panasonic Connect entered the "Complex Video Reasoning & Robustness Evaluation" category.
(For details, please check the VidLLMs Workshop – CVPR 2025 website.)

In this task, video recognition AI is evaluated on how well it handles varied and difficult situations, using 214 third-person perspective videos containing complex contexts and 2,400 sets of free-form descriptive questions.

The videos cover 11 complex categories, including grasping temporal order, understanding emotions and social backgrounds, and reasoning based on common sense, and they require understanding of situations close to real life. The task also includes questions that deliberately ask about objects or events not shown in the video, as well as misleading questions, to test the AI's ability to avoid hallucinations (misidentification of facts). Answers must be free-form descriptions in natural language, which tests the ability to express an answer that fits the context.

Conventional AI models answer about 75% of the questions correctly, while humans reach 97%, indicating that a significant performance gap between AI and humans remains in this field.
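To put these figures next to the 81% accuracy reported for DIVE, the short Python sketch below computes what share of the gap between conventional AI and humans that result covers. It is only an illustration: the function name gap_closed is hypothetical, and the 75% figure is approximate, so the result should be read as a rough indication rather than an official competition metric.

```python
def gap_closed(baseline: float, system: float, ceiling: float) -> float:
    """Return the fraction of the baseline-to-ceiling gap covered by `system`.

    All three arguments are accuracies in percentage points.
    """
    if ceiling <= baseline:
        raise ValueError("ceiling must be higher than baseline")
    return (system - baseline) / (ceiling - baseline)


# Figures quoted in this release (the 75% value is approximate).
CONVENTIONAL_AI = 75
DIVE = 81
HUMAN = 97

if __name__ == "__main__":
    share = gap_closed(CONVENTIONAL_AI, DIVE, HUMAN)
    print(f"Improvement over conventional AI: {DIVE - CONVENTIONAL_AI} points")
    print(f"Remaining gap to humans: {HUMAN - DIVE} points")
    print(f"Share of the gap covered: {share:.1%}")
```

Under these assumptions, DIVE's result is 6 points above conventional AI and 16 points below human accuracy.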

Development of AI Technology DIVE

Panasonic Connect has developed the video recognition AI technology "DIVE," which can accurately handle complex and difficult video recognition tasks. The technology breaks complex questions down and deepens its reasoning while progressively building an understanding of the context. This approach anticipates the trend toward "long-time thinking" in large language models (LLMs), which has attracted attention in recent years.

For example, like a detective solving a case, instead of directly answering the difficult question, "Is Mr. A the culprit?" the approach verifies several smaller questions one by one, such as "Does Mr. A have an alibi?" "Does Mr. A have a motive?" and "Is the alibi genuine?" to ultimately solve the difficult problem.

To realize such a "human-like thinking process," DIVE is structured around the following three main technologies:

1. "Long-Time Thinking" Process to Deeply Consider Complex Questions Step by Step:

By breaking questions down into their constituent elements and reconstructing them into a list of meaningful sub-questions, DIVE lets the AI reason step by step, considering each point carefully. The result is a thinking process that works through complex questions in an orderly manner, much as humans do.

2. Comprehensive Video Summary Generation Technology Based on Important Objects:

By linking multimodal (text, image, audio, video, etc.) large language models with object detection models, it captures important objects and scene changes in the video, and generates highly accurate summary information that comprehensively covers the entire video.

3. Context Understanding Technology to Understand the Intention of Questions:

By estimating the purpose and perspective behind a question, DIVE generates meaningful answers tailored to the context, capturing the intention behind the words rather than only their surface meaning.

＜An example of a long-time thinking approach for deep video understanding,
powered by video recognition AI 'DIVE'＞

By integrating these technologies, DIVE can think step by step and flexibly in response to complex questions, and ultimately arrive at a final answer.
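As a rough illustration of how such a step-by-step flow might be organized, the Python sketch below breaks a question into sub-questions, answers each one using the video summary together with the answers obtained so far, and then draws a conclusion. This is not DIVE's actual implementation, which this release does not describe. Every name in it (explore, Step, Trace, and the toy_ helpers) is hypothetical, and the toy helpers are hard-coded stand-ins for the language models and object detection models a real system would call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    sub_question: str
    answer: str


@dataclass
class Trace:
    question: str
    video_summary: str
    steps: List[Step] = field(default_factory=list)
    final_answer: str = ""


def explore(question: str,
            video_summary: str,
            decompose: Callable[[str], List[str]],
            answer_step: Callable[[str, List[str]], str],
            conclude: Callable[[str, List[Step]], str]) -> Trace:
    """Answer sub-questions in order, then combine them into a final answer."""
    trace = Trace(question, video_summary)
    for sub_question in decompose(question):
        # Each step sees the video summary plus every earlier finding.
        context = [video_summary] + [
            f"{s.sub_question} {s.answer}" for s in trace.steps
        ]
        trace.steps.append(Step(sub_question, answer_step(sub_question, context)))
    trace.final_answer = conclude(question, trace.steps)
    return trace


# Toy stand-ins (assumptions for illustration only), using the detective example.
TOY_FACTS = {
    "Does Mr. A have an alibi?": "yes",
    "Does Mr. A have a motive?": "yes",
    "Is the alibi genuine?": "no",
}


def toy_decompose(question: str) -> List[str]:
    return list(TOY_FACTS)


def toy_answer_step(sub_question: str, context: List[str]) -> str:
    return TOY_FACTS.get(sub_question, "unknown")


def toy_conclude(question: str, steps: List[Step]) -> str:
    answers = {s.sub_question: s.answer for s in steps}
    has_motive = answers.get("Does Mr. A have a motive?") == "yes"
    alibi_fails = answers.get("Is the alibi genuine?") == "no"
    return "likely" if has_motive and alibi_fails else "unlikely"


if __name__ == "__main__":
    result = explore("Is Mr. A the culprit?", "(video summary text)",
                     toy_decompose, toy_answer_step, toy_conclude)
    for step in result.steps:
        print(step.sub_question, "->", step.answer)
    print("Final answer:", result.final_answer)
```

Passing the three stages in as functions keeps the loop independent of any particular model, so each stage could be replaced without changing the overall flow.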

Future Development

Building on this result, Panasonic Connect will continue to enhance the technology.
Going forward, Panasonic Connect will focus on the supply chain domain (manufacturing, logistics, and retail) and promote the implementation of on-site support solutions that use video recognition AI, contributing to improved operational efficiency and safety. Under its purpose of "Connecting Workplaces to Society and to the Future," Panasonic Connect will use video understanding technology to make on-site issues visible and aim to realize a sustainable society where all people can live safely.

Patents Related to This Matter: 1 patent application pending

Related Information:

CVPR 2025 VidLLMs Challenge:
https://www.crcv.ucf.edu/cvpr2025-vidllms-workshop/challenges.html

*Some images in this press release are for illustrative purposes only and were not used in the actual competition.

About Panasonic Connect

Panasonic Connect Co., Ltd. (https://connect.panasonic.com/) was established on April 1, 2022 as part of the Panasonic Group’s (https://holdings.panasonic/global/) switch to an operating company system. With roughly 28,200 employees worldwide and annual sales of JPY 1,333 billion, the company plays a central role in the growth of the Panasonic Group’s B2B solutions business and provides new value to its customers by combining advanced hardware, intelligent software solutions, and a wealth of knowledge in industrial engineering accumulated in its over 100-year history. The company’s purpose is to “Change Work, Advance Society, Connect to Tomorrow.” By driving innovation in the supply chain, public services, infrastructure, and entertainment sectors, Panasonic Connect aims to contribute to the realization of a sustainable society and to ensure well-being for all.
