
Jul 16, 2025
Jun 27, 2025
Company develops "DIVE," a video recognition AI that solves questions step-by-step
Tokyo, Japan – Panasonic Connect Co., Ltd. (https://connect.panasonic.com/) today announced that it has won first place in the VidLLMs competition at CVPR 2025, a world-class conference on image recognition, with "DIVE," a video recognition AI that solves questions step by step. In the competition, DIVE (Deep-search Iterative Video Exploration), developed by Panasonic Connect, achieved an accuracy of 81% on complex questions about the given videos, demonstrating its high performance.
<Workshop invitation (left) World No. 1 certificate (right)>
In recent years, as the use of video data accelerates across business areas, there is a growing need for AI technology that can understand video content and accurately answer questions in natural language. At logistics sites, for example, AI is expected to analyze recorded footage of work processes and propose improvements. However, conventional AI has difficulty answering questions that require a deep understanding of the meaning and context of a video, which has been a significant obstacle to practical application.
To overcome these challenges, Panasonic Connect took part in CVPR 2025, the world's leading international conference on image recognition, as an opportunity to have its AI's video understanding and natural language response capabilities evaluated comprehensively.
The VidLLMs Workshop, held for the first time at CVPR 2025, hosted a competition to test the performance of video large language models (VidLLMs). Panasonic Connect entered the "Complex Video Reasoning & Robustness Evaluation" category.
(For details, please check the VidLLMs Workshop – CVPR 2025 website.)
In the "Complex Video Understanding" task, video recognition AI is evaluated on how well it handles diverse and difficult situations, using 214 third-person-perspective videos with complex contexts and 2,400 sets of free-form descriptive questions.
The videos cover 11 complex categories, including grasping temporal order, understanding emotions and social backgrounds, and reasoning based on common sense, so the AI must understand situations close to those in the real world. In addition, the questions include ones that deliberately ask about objects or events not shown in the video, as well as misleading ones, to test the AI's ability to avoid hallucinations (misidentification of facts). Answers must also be free-form descriptions in natural language, which tests the ability to express an answer in a way that fits the context.
Conventional AI models achieve a correct answer rate of about 75% on this task, whereas humans reach 97%, indicating that a significant performance gap between AI and humans remains in this field.
Panasonic Connect has developed the video recognition AI technology "DIVE," which can accurately handle complex and difficult video recognition tasks. The technology breaks a complex question down and deepens its reasoning while gradually building an understanding of the context. This approach anticipates the trend toward long-time thinking in large language models (LLMs), which has been attracting attention in recent years.
For example, like a detective solving a case, instead of directly answering the difficult question "Is Mr. A the culprit?", the approach verifies several smaller questions one by one, such as "Does Mr. A have an alibi?", "Does Mr. A have a motive?", and "Is the alibi genuine?", to ultimately solve the difficult problem.
To realize such a "human-like thinking process," DIVE is structured around the following three main technologies:
By breaking a question down into its constituent elements and reassembling them into a list of meaningful sub-questions, DIVE lets the AI reason step by step and deliberately. The result is a thinking process that resolves complex questions in an orderly manner, much as a human would.
By linking multimodal large language models (which handle text, images, audio, video, and so on) with object detection models, DIVE captures important objects and scene changes in the video and generates highly accurate summary information covering the entire video.
By estimating the purpose and perspective behind a question, DIVE generates meaningful answers tailored to the context, achieving a deep understanding of the intent and context behind the surface wording.
<An example of a long-time thinking approach for deep video understanding, powered by video recognition AI 'DIVE'>
By integrating these technologies, DIVE can reason step by step and flexibly about complex questions and ultimately arrive at a final answer.
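The step-by-step flow described above can be illustrated with a minimal sketch. This is not Panasonic Connect's implementation, whose details are not given in this release. The function `answer_step_by_step` and the helpers `toy_decompose`, `toy_answer_one`, and `toy_aggregate` are hypothetical names introduced only for illustration. In a real system, the toy helpers would presumably be replaced by calls to a multimodal LLM and an object detection model. The example reuses the detective questions from the text.

```python
from typing import Callable, List, Tuple

# (sub-question, answer) pairs collected so far
Trace = List[Tuple[str, str]]


def answer_step_by_step(
    question: str,
    video_summary: str,
    decompose: Callable[[str], List[str]],
    answer_one: Callable[[str, str, Trace], str],
    aggregate: Callable[[str, Trace], str],
) -> Tuple[str, Trace]:
    """Answer a hard question by answering smaller questions in order.

    All three callables are assumptions of this sketch, not DIVE's API.
    """
    trace: Trace = []
    for sub_question in decompose(question):
        # Each sub-answer can use the video summary and the earlier answers.
        sub_answer = answer_one(sub_question, video_summary, trace)
        trace.append((sub_question, sub_answer))
    # Combine the verified sub-answers into one final answer.
    return aggregate(question, trace), trace


# Toy stand-ins so the sketch runs without any model (hypothetical data).
SUB_QUESTIONS = [
    "Does Mr. A have an alibi?",
    "Does Mr. A have a motive?",
    "Is the alibi genuine?",
]
TOY_FACTS = {
    "Does Mr. A have an alibi?": "yes",
    "Does Mr. A have a motive?": "yes",
    "Is the alibi genuine?": "no",
}


def toy_decompose(question: str) -> List[str]:
    return list(SUB_QUESTIONS)


def toy_answer_one(sub_question: str, video_summary: str, trace: Trace) -> str:
    return TOY_FACTS.get(sub_question, "unknown")


def toy_aggregate(question: str, trace: Trace) -> str:
    answers = dict(trace)
    has_motive = answers.get("Does Mr. A have a motive?") == "yes"
    alibi_holds = answers.get("Is the alibi genuine?") == "yes"
    return "yes" if has_motive and not alibi_holds else "no"


final_answer, trace = answer_step_by_step(
    "Is Mr. A the culprit?",
    "toy video summary (placeholder text)",
    toy_decompose,
    toy_answer_one,
    toy_aggregate,
)
for sub_question, sub_answer in trace:
    print(f"{sub_question} -> {sub_answer}")
print(f"Final answer: {final_answer}")
```

Passing the decomposition, answering, and aggregation steps in as functions keeps the control flow separate from whichever models supply the answers, which is one plausible way to organize such a pipeline.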
Panasonic Connect will continue to enhance this technology, building on this achievement.
Going forward, Panasonic Connect will focus on the supply chain domain (manufacturing, logistics, and retail) and promote on-site support solutions that use video recognition AI to help improve operational efficiency and safety. Under its purpose of "Connecting Workplaces to Society and to the Future," Panasonic Connect will use video understanding technology to make on-site issues visible and aim to realize a sustainable society where all people can live safely.
CVPR 2025 VidLLMs Challenge:
https://www.crcv.ucf.edu/cvpr2025-vidllms-workshop/challenges.html
*Some images in this press release are for illustrative purposes only and were not used in the actual competition.
About Panasonic Connect
Panasonic Connect Co., Ltd. (https://connect.panasonic.com/) was established on April 1, 2022 as part of the Panasonic Group’s (https://holdings.panasonic/global/) switch to an operating company system. With roughly 28,200 employees worldwide and annual sales of JPY 1,333 billion, the company plays a central role in the growth of the Panasonic Group’s B2B solutions business and provides new value to its customers by combining advanced hardware, intelligent software solutions, and a wealth of knowledge in industrial engineering accumulated in its over 100-year history. The company’s purpose is to “Change Work, Advance Society, Connect to Tomorrow.” By driving innovation in the supply chain, public services, infrastructure, and entertainment sectors, Panasonic Connect aims to contribute to the realization of a sustainable society and to ensure well-being for all.