
Osaka, Japan, April 17, 2025 – Panasonic R&D Company of America (PRDCA) and Panasonic Holdings Corporation (Panasonic HD), in collaboration with researchers from the University of California, Berkeley (UC Berkeley), have developed SegLLM, an interactive segmentation technology that allows users to specify recognition targets using language and reference images.
Segmentation is a technology that divides an image into multiple regions at the pixel level. Combined with image recognition, it enables the detection of specific objects and accurately captures their position and shape. This makes it applicable to various fields, such as object recognition in factories, environmental recognition around vehicles, and object manipulation by robots.
In the field of image recognition, large language models (LLMs) are increasingly used to specify recognition targets in text. However, when instructions are given interactively and a new instruction refers to objects recognized in past interactions, the text can become complex, leading to a higher likelihood of misrecognition. The newly developed SegLLM addresses this issue by allowing both text and reference images to be input in prompts, enabling the recognition of hierarchical relationships between objects and interactions among objects, even for untrained objects. It also makes it possible to recognize only specific objects in more complex scenes that contain many similar-looking objects.
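To make the contrast concrete, the short sketch below compares a text-only interaction, in which each new instruction must re-describe earlier results, with one that pairs a brief instruction with a reference image derived from a previous round's mask. The prompt wording and data format are illustrative assumptions, not taken from the paper.

```python
# Conceptual illustration only (not Panasonic's implementation): contrasting
# text-only multi-round instructions with text-plus-reference-image prompts.

# Text-only interaction: each new instruction must restate what was recognized
# before, so the prompt keeps growing and becomes easier to misinterpret.
text_only_rounds = [
    "Segment the person wearing a red jacket.",
    "Segment the bag held by the person wearing a red jacket who was segmented in the previous step.",
]

# Text plus reference image: the previous round's mask isolates the object,
# and the next instruction only needs to point at that reference image.
multimodal_rounds = [
    {"text": "Segment the person wearing a red jacket.", "reference": None},
    {"text": "Segment the bag held by the object shown in the reference image.",
     "reference": "round1_masked_object.png"},  # hypothetical file name
]
```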
This technology has been internationally recognized for its advanced capabilities and has been accepted at the International Conference on Learning Representations (ICLR 2025), a leading conference in AI and machine learning technologies. It will be presented at the conference held in Singapore from April 24 to April 28, 2025.
Figure 1: In current VLMs, complex prompts can lead to misrecognition
Panasonic HD and PRDCA are engaged in research on Vision and Language Models (VLMs)* related to segmentation technology. Recent advances in language models have expanded the ways in which recognition targets can be flexibly specified in text. However, when segmentation is performed interactively, new instructions that build on objects recognized in past interactions can make the text complex, leading to a higher likelihood of misrecognition.
Figure 2: Architecture of SegLLM
In contrast, SegLLM uses a method in which both text and reference images are input together in the prompt. Specifically, the reference images are embedded into the same feature space as the text, allowing them to be fed into the LLM. Segmentation masks output by the LLM in previous interactions can be used to create reference images that isolate only the masked objects, and these reference images can then be included in prompts when issuing new instructions. As a result, instructions can take past interactions into account without increasing the length of the text input.
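As a rough illustration of this flow, the sketch below shows how a mask from a previous round could isolate an object and how features of the resulting reference image could be projected into the same space as the text tokens before being concatenated into the LLM input. The class names, tensor dimensions, and the single linear projection are assumptions made for this sketch; they are not taken from the released SegLLM code.

```python
# Minimal sketch (not the SegLLM implementation) of the multi-round prompt flow:
# a mask from a previous round isolates the referenced object, the reference
# image is encoded into the same feature space as the text tokens, and the
# combined sequence is fed to the language model.

import torch
import torch.nn as nn


class ReferenceImageEncoder(nn.Module):
    """Projects vision features of a mask-isolated object into the LLM token space."""

    def __init__(self, image_feat_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(image_feat_dim, llm_dim)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (num_patches, image_feat_dim) from a frozen vision backbone
        return self.proj(image_feats)  # (num_patches, llm_dim)


def isolate_with_mask(image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Keep only the pixels of the object segmented in a previous round."""
    # image: (3, H, W); mask: (H, W) with values in {0, 1}
    return image * mask.unsqueeze(0)


def build_prompt_embeddings(
    text_embeds: torch.Tensor,      # (T, llm_dim) embedded instruction text
    ref_image_feats: torch.Tensor,  # (P, image_feat_dim) features of the reference image
    encoder: ReferenceImageEncoder,
) -> torch.Tensor:
    """Concatenate text tokens and reference-image tokens into one LLM input sequence."""
    ref_tokens = encoder(ref_image_feats)            # map into the text feature space
    return torch.cat([text_embeds, ref_tokens], 0)   # (T + P, llm_dim)


if __name__ == "__main__":
    enc = ReferenceImageEncoder()
    image = torch.rand(3, 224, 224)
    prev_mask = (torch.rand(224, 224) > 0.5).float()  # stand-in for a mask from an earlier round
    ref_image = isolate_with_mask(image, prev_mask)   # would be re-encoded by the vision backbone
    text_embeds = torch.randn(12, 4096)               # stand-in for embedded instruction tokens
    ref_feats = torch.randn(256, 1024)                # stand-in for vision features of ref_image
    prompt = build_prompt_embeddings(text_embeds, ref_feats, enc)
    print(prompt.shape)  # torch.Size([268, 4096])
```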
Figure 3: Evaluation performance on the interactive segmentation dataset
In addition to the SegLLM architecture, the paper also proposes a training and evaluation dataset for interactive segmentation. Evaluation experiments using the proposed dataset demonstrated that, while existing methods suffer significant degradation in recognition accuracy as interactions progress, SegLLM substantially mitigates this deterioration (Figure 3).
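For readers who want to measure this kind of per-round behavior on their own data, the sketch below outlines a simple multi-round evaluation loop that tracks mean IoU round by round. The model.segment(image, instruction, reference) interface and the sample layout are hypothetical placeholders, not the evaluation protocol used in the paper.

```python
# Illustrative per-round evaluation sketch, assuming a hypothetical
# model.segment(image, instruction, reference) interface and one
# ground-truth mask per interaction round.

import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0


def evaluate_rounds(model, samples):
    """samples: list of dicts with 'image' (H x W x 3) and 'rounds', a list of
    (instruction, gt_mask) pairs in interaction order."""
    per_round_scores = {}
    for sample in samples:
        reference = None  # no reference image before the first round
        for r, (instruction, gt_mask) in enumerate(sample["rounds"]):
            pred_mask = model.segment(sample["image"], instruction, reference)
            per_round_scores.setdefault(r, []).append(iou(pred_mask, gt_mask))
            # The predicted mask from this round isolates the object that the
            # next instruction can refer to.
            reference = sample["image"] * pred_mask[..., None]
    # Mean IoU per round; a flat curve indicates little degradation across rounds.
    return {r: float(np.mean(s)) for r, s in sorted(per_round_scores.items())}
```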
The newly developed SegLLM is a technology that significantly enhances the performance of interactive segmentation. Panasonic HD plans to implement this technology in the automatic annotation tool it is developing with FastLabel. By expanding the range of application to targets that are difficult to detect with traditional text-only instructions, such as untrained objects or items held by specific individuals, the tool will evolve into a more versatile solution. Leveraging these characteristics of SegLLM, the tool can reduce training costs on-site in factories and production lines where a wide variety of instruments and tools exist. This will accelerate optimization in factories and similar environments through applications in Cyber-Physical Systems (CPS).
Panasonic HD will continue to accelerate the implementation of AI in society and promote research and development of AI technologies that will contribute to improving our customers' lives and workplaces.
* [Press Release] Panasonic R&D Company of America Develops New Multimodal Foundation Model That Can Perform Image Recognition and Segmentation in Response to Any Text Input (Nov 21, 2023)
https://news.panasonic.com/global/press/en231121-5
“SegLLM: Multi-round Reasoning Segmentation”
This research is the result of a collaboration among Konstantinos Kallidromitis of PRDCA, Xudong Wang of UC Berkeley, and Yusuke Kato and Kazuki Kozuka of Panasonic HD.
https://arxiv.org/abs/2410.18923
Panasonic × AI website
https://tech-ai.panasonic.com/en/
About the Panasonic Group
Founded in 1918, and today a global leader in developing innovative technologies and solutions for wide-ranging applications in the consumer electronics, housing, automotive, industry, communications, and energy sectors worldwide, the Panasonic Group switched to an operating company system on April 1, 2022, with Panasonic Holdings Corporation serving as a holding company and eight companies positioned under its umbrella. The Group reported consolidated net sales of 8,496.4 billion yen for the year ended March 31, 2024. To learn more about the Panasonic Group, please visit: https://holdings.panasonic/global/