Jun 04, 2025

Company / Press Release

Panasonic HD develops multimodal generative AI “OmniFlow” which enables Any-to-Any generation between text, image, and audio

Osaka, Japan, June 4, 2025 – Panasonic Holdings Co., Ltd. (Panasonic HD) and Panasonic R&D Company of America (PRDCA), in collaboration with researchers at the University of California, Los Angeles (UCLA), have developed OmniFlow, a multimodal generative AI that can convert freely between different data formats such as text, images, and audio (hereinafter referred to as “Any-to-Any” generation).

In recent years, multimodal generative AI that converts between different data formats has been actively researched. However, training usually requires data covering every combination of the formats to be handled, so the cost of acquiring data rises as the number of formats increases. By flexibly combining generative AI models specialized for individual format pairs (text ↔ audio, text ↔ image), OmniFlow can learn a high-accuracy Any-to-Any model from only a small amount of data containing all three modalities (text ↔ audio ↔ image), significantly reducing the cost of creating training data. (Fig. 1)

This technology has received international recognition and has been accepted at CVPR 2025, a top conference for AI and computer vision. It will be presented at the main conference, to be held in Nashville, USA, from June 11 to June 15, 2025.

Figure 1 Example of generation by OmniFlow

■Details of the technology

Panasonic HD and PRDCA are conducting research on multimodal generative AI. In recent years, multimodal generative AI that handles audio in addition to text and images has attracted attention. However, there are few ways to obtain data that contains text, images, and audio together, and increasing the variety of such data has been costly.

Solving this problem is key to accelerating the use of multimodal generative AI, and it has been an active research topic in recent years. A method has recently been proposed that can learn even when the training data does not fully cover every combination of the data formats to be handled. However, that method works by averaging the input data, so it still leaves considerable room for improvement in expressive ability.

In contrast, we have developed OmniFlow, which extends flow matching*, an existing framework used for image generation. By connecting the features of the three different data types and processing them together during the generation process, OmniFlow can learn complex relationships between data that averaging cannot capture. (Fig. 2)

* A technique that uses a flow to find the optimal conversion path between arbitrary data.
It has attracted attention in recent years because it has been adopted in a variety of generative models, including image generation.
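
As a rough illustration of the flow matching idea in the note above, the following minimal sketch moves a noise vector to a data vector along a straight path. The straight-line path, the function names, and the "oracle" velocity that stands in for a trained network are all assumptions made for this example; the release does not specify which variant of flow matching OmniFlow uses.

```python
import random

def interpolate(x0, x1, t):
    """Point at time t on the straight path from x0 (noise) to x1 (data)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity along the straight path; constant in t for a linear path.

    In training, a network would be fit to predict this value from
    interpolate(x0, x1, t) and t.
    """
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]
data = [1.0, -2.0, 0.5, 3.0]  # hypothetical "data" vector

# Assumption: the exact velocity for this one pair stands in for a trained model.
oracle = lambda x, t: target_velocity(noise, data)
sample = euler_sample(noise, oracle, steps=10)
```

With the exact velocity, the Euler steps carry the noise vector onto the data vector up to floating-point error. A trained model would only approximate this velocity, so its generated samples would land near plausible data rather than on one fixed target.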

Figure 2 Existing Flow Matching (top) and OmniFlow (bottom) architectures
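
The difference between averaging features and connecting them can be seen in a toy example. The sketch below is not the OmniFlow architecture; the two-element feature vectors and the function names are hypothetical, and plain list concatenation is assumed as the simplest form of "connecting" features.

```python
def average_features(text, image, audio):
    """Collapse three same-length feature vectors into their element-wise mean."""
    return [(t + i + a) / 3.0 for t, i, a in zip(text, image, audio)]

def connect_features(text, image, audio):
    """Keep every modality's features by joining them end to end."""
    return text + image + audio

# Two hypothetical samples whose text and image features are swapped.
sample_a = ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])
sample_b = ([0.0, 1.0], [1.0, 0.0], [0.5, 0.5])

avg_a = average_features(*sample_a)
avg_b = average_features(*sample_b)
cat_a = connect_features(*sample_a)
cat_b = connect_features(*sample_b)

# Averaging cannot tell the two samples apart; connecting can.
same_after_average = avg_a == avg_b
same_after_connect = cat_a == cat_b
```

Here the two samples become identical after averaging but stay distinct once connected, which is the kind of cross-modal information a downstream model can only use if it is still present in its input.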

A major advantage of OmniFlow is that AI models specializing in text-to-image and text-to-audio generation can easily be connected into a single multimodal generative AI. (Fig. 3) Because each specialized model already excels at generating its own data type, high multimodal performance was achieved without training on a large amount of data containing all modalities.

Figure 3 OmniFlow learning process
Specialized AI models that have already been trained are connected, and the “text→image” and “text→audio” tasks are then retrained.
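
One way to picture this connection step is as assembling a three-branch model from the branches of two already-trained two-branch models before retraining. The sketch below is only an illustration under stated assumptions: the dictionary layout, the branch names, and the choice to take the shared text branch from the text-to-image model are hypothetical and are not described in this release.

```python
import copy

def connect_specialists(text_to_image, text_to_audio):
    """Assemble one three-branch model from two pretrained two-branch models.

    Assumption: each specialist is a dict mapping a branch name to a list of
    weights. Real models would hold tensors, and how the shared text branch
    is initialized is a design choice not stated in the release.
    """
    return {
        "text": copy.deepcopy(text_to_image["text"]),
        "image": copy.deepcopy(text_to_image["image"]),
        "audio": copy.deepcopy(text_to_audio["audio"]),
    }

# Hypothetical pretrained weights for the two specialists.
t2i = {"text": [0.1, 0.2], "image": [0.3, 0.4]}
t2a = {"text": [0.5, 0.6], "audio": [0.7, 0.8]}

joint = connect_specialists(t2i, t2a)
branches = sorted(joint)
```

The joint model starts from weights that already perform well on each specialized task, so the subsequent retraining only needs to teach the branches to work together.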

In the evaluation experiments, OmniFlow was compared with existing methods on the “text→image” and “text→audio” generation tasks. (Fig. 4) The results confirmed that OmniFlow achieves the best performance among both any-to-any (generalist) methods and methods specialized for each task. We also found that the amount of data required to train OmniFlow can be reduced to as little as 1/60 of that required by other any-to-any methods.

Figure 4 Evaluation results (left: text→image, right: text→audio)
Param is the number of model parameters, and Images is the number of training images.
Gen is an index of the quality of generated images, and FAD and CLAP indicate the quality of generated audio. ↑ indicates that higher values are better, and ↓ indicates that lower values are better.

■Future prospects

The newly developed OmniFlow is an any-to-any method that flexibly combines generative AI models specialized for individual data formats (text→audio, text→image) and achieves high accuracy even when only a small number of training samples containing all three data types (text ↔ audio ↔ image) is available. By training this technology on data from various settings, such as factories and daily life, it will become possible to generate diverse data tailored to those settings, which is expected to broaden the range of applications of multimodal AI.

Going forward, Panasonic HD will continue to accelerate the social implementation of AI and promote research and development of AI technologies that benefit our customers' lives and workplaces.

About the Panasonic Group

Founded in 1918, and today a global leader in developing innovative technologies and solutions for wide-ranging applications in the consumer electronics, housing, devices, B2B solutions and energy sectors worldwide, the Panasonic Group switched to an operating company system on April 1, 2022 with Panasonic Holdings Corporation serving as a holding company. The Group reported consolidated net sales of 8,458.2 billion yen for the year ended March 31, 2025. To learn more about the Panasonic Group, please visit: https://holdings.panasonic/global/

Issued:
Panasonic Holdings Corporation