As global demand for mobile communication services grows and new application scenarios emerge, more demanding requirements on mobile communication capabilities will drive the development of the sixth-generation (6G) mobile communication system. 6G applications will include high-definition video, virtual reality (VR), augmented reality (AR), holographic projection, and other high-capacity multimodal services, and they will also call for intelligent, collaborative, and personalized communications.

Semantic communications for emerging 6G applications have attracted extensive attention in both industry and academia. So far, however, existing studies have focused on semantic communication systems for single-modal data transmission. A comprehensive semantic communication framework for multimodal data, with its richer meanings and more complex structure, has been missing.

An article in IEEE Network proposes a framework of semantic communications for multimedia data. The framework comprises a semantic knowledge base, semantic representation, semantic codecs, and semantic information transmission. Building on the proposed semantic representations, the researchers developed structured codec schemes for images and videos and examined semantic information transmission with non-orthogonal multiple access (NOMA).

Semantic Representation

The article starts by reviewing past and current key techniques in semantic communications, including multimedia semantic representation, multimedia generative reconstruction, and end-to-end communication systems, before introducing the proposed framework.

In the proposed framework, according to the authors, the rules, methods, models, and training libraries are deployed at the cloud server and maintained at the edge server to provide prior semantic knowledge to the transmitter and receiver. Using this semantic knowledge base, the transmitter performs semantic representation and semantic encoding of the multimodal source. The abstracted semantic information from multiple users is then transmitted with a multiple-access technique. At the receiver, a generative approach reconstructs high-resolution, stable, and smooth video while preserving semantic fidelity.

Proposed semantic communication framework.


Multimedia communications carry increasingly diverse data, with an explosion of content in different modalities such as text, image, and video. In the article, the authors explore a new model of content-oriented semantic representation that draws on the cognitive mechanisms of the human brain. Specifically, the model imitates how the brain uses neurons to perceive and store information.

An object-attribute-relation (OAR) model is proposed to establish a general knowledge-level representation of data across different spaces, times, and modalities. The OAR model uses objects and attributes as a generic representation of a concept, then specializes the concept for a specific scenario by establishing relations, especially between objects, to form a particular representation of that concept.
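
As a rough illustration of the idea, an OAR representation can be sketched as a small Python data structure: objects carry generic attributes, and relations between objects specialize them into a particular scene. The names `OARGraph`, `add_object`, `relate`, and the example scene are hypothetical; the article does not specify how objects, attributes, and relations are actually encoded.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an object-attribute-relation (OAR) representation.
# Not the authors' data format; names and structure are illustrative only.


@dataclass
class SemanticObject:
    name: str
    attributes: dict = field(default_factory=dict)  # generic description of the concept


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str


@dataclass
class OARGraph:
    objects: dict = field(default_factory=dict)  # object name -> SemanticObject
    relations: list = field(default_factory=list)

    def add_object(self, name, **attributes):
        """Register an object with its generic attributes."""
        self.objects[name] = SemanticObject(name, dict(attributes))

    def relate(self, subject, predicate, obj):
        """Specialize the scene by linking two existing objects."""
        if subject not in self.objects or obj not in self.objects:
            raise KeyError("both objects must be added before relating them")
        self.relations.append(Relation(subject, predicate, obj))

    def describe(self):
        """Return a compact, human-readable listing of the representation."""
        lines = []
        for o in self.objects.values():
            attrs = ", ".join(f"{k}={v}" for k, v in sorted(o.attributes.items()))
            lines.append(f"{o.name}({attrs})")
        for r in self.relations:
            lines.append(f"{r.subject} --{r.predicate}--> {r.obj}")
        return lines


if __name__ == "__main__":
    scene = OARGraph()
    scene.add_object("person", pose="standing")
    scene.add_object("bicycle", color="red")
    scene.relate("person", "rides", "bicycle")
    for line in scene.describe():
        print(line)
```

The same two objects could be related differently (for example, "person --stands_next_to--> bicycle") to describe another scene, which is the sense in which relations specialize a generic object-attribute description.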

Coding and Decoding

The fundamental problem of multimedia coding and transmission based on the spatiotemporal OAR model is high-definition multimedia decoding with high-level closed-loop optimization using residual information. The researchers combine the ideas of generative and discriminative models in machine learning to build generative and discriminative networks, and they explore a reconstruction framework based on generative cognitive computing. The framework transmits semantic features related to human brain cognition. It then uses the generator of an autoencoder network to synthesize the texture and content of images and videos, preserving their contextual relationships and complex spatial structure.
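
The "closed loop" with residual information can be illustrated, very loosely, by generic two-layer residual coding: the encoder reproduces the decoder's reconstruction from a coarse layer, then sends only a quantized residual to correct it. In this toy Python sketch a coarse scalar quantizer is a hypothetical stand-in for the semantic features and the generative network; it is not the authors' codec, and the step sizes and sample values are arbitrary.

```python
# Toy closed-loop residual coding. A coarse scalar quantizer is a hypothetical
# stand-in for the semantic features; it is NOT the authors' generative codec.


def quantize(values, step):
    """Map each sample to an integer index on a uniform grid of width `step`."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Map integer indices back to sample values."""
    return [i * step for i in indices]


def encode(signal, coarse_step, fine_step):
    """Return a coarse base layer plus a quantized residual refinement layer."""
    base = quantize(signal, coarse_step)
    # Closed loop: the encoder reproduces exactly what the decoder will
    # reconstruct from the base layer, so the residual targets the real error.
    base_rec = dequantize(base, coarse_step)
    residual = [s - r for s, r in zip(signal, base_rec)]
    refinement = quantize(residual, fine_step)
    return base, refinement


def decode(base, refinement, coarse_step, fine_step):
    """Rebuild the signal from the base layer and the residual layer."""
    base_rec = dequantize(base, coarse_step)
    res_rec = dequantize(refinement, fine_step)
    return [b + r for b, r in zip(base_rec, res_rec)]


if __name__ == "__main__":
    signal = [0.13, 0.97, 2.41, -1.58]
    base, refinement = encode(signal, coarse_step=1.0, fine_step=0.1)
    coarse_only = dequantize(base, 1.0)
    refined = decode(base, refinement, 1.0, 0.1)
    print("coarse-only max error:", max(abs(a - b) for a, b in zip(signal, coarse_only)))
    print("refined max error:    ", max(abs(a - b) for a, b in zip(signal, refined)))
```

Because the residual is computed against the decoder's own reconstruction rather than the original signal, each refined sample stays within half a fine step of the source.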

Framework of semantic encoding and decoding.


NOMA Transmission

Future communication systems are expected to support semantic information and bit-sequence transmission simultaneously, providing intelligent connectivity among different types of devices. As one of the most fundamental multiple-access (MA) technologies, NOMA is used to realize this challenging heterogeneous transmission within the given radio resources.

In the NOMA-based heterogeneous semantic and bit transmission scheme, the two streams share the full frequency band, which can yield higher spectral efficiency than orthogonal multiple access, while successive interference cancellation (SIC) allows the semantic stream to be delivered free of interference. With this design, the framework can support the coexistence of semantic communication and conventional communication systems.
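
To make the SIC step concrete, here is a minimal Python sketch of the signal-to-interference-plus-noise ratios (SINRs) at a receiver that decodes the bit stream first, treating the semantic signal as interference, and then subtracts it so that the semantic stream sees only noise. The two-stream single-antenna model, the decoding order, perfect SIC, and the numeric powers and gains are assumptions for illustration, not parameters from the article.

```python
import math


def sic_sinrs(p_bit, g_bit, p_sem, g_sem, noise):
    """SINRs when the bit stream is decoded first, then the semantic stream.

    Hypothetical two-stream model: p_* are transmit powers, g_* are channel
    power gains |h|^2, and `noise` is the noise power. Perfect SIC is assumed.
    """
    # Stage 1: decode the bit stream, treating the semantic signal as interference.
    sinr_bit = p_bit * g_bit / (p_sem * g_sem + noise)
    # Stage 2: subtract the decoded bit signal; the semantic stream sees only noise.
    sinr_sem = p_sem * g_sem / noise
    return sinr_bit, sinr_sem


def no_sic_sinr_sem(p_bit, g_bit, p_sem, g_sem, noise):
    """Semantic-stream SINR if the bit signal were simply treated as interference."""
    return p_sem * g_sem / (p_bit * g_bit + noise)


if __name__ == "__main__":
    params = dict(p_bit=1.0, g_bit=4.0, p_sem=1.0, g_sem=1.0, noise=0.1)
    sinr_bit, sinr_sem = sic_sinrs(**params)
    print(f"bit stream: SINR={sinr_bit:.3f}, rate={math.log2(1 + sinr_bit):.3f} bit/s/Hz")
    print(f"semantic stream with SIC:    SINR={sinr_sem:.3f}")
    print(f"semantic stream without SIC: SINR={no_sic_sinr_sem(**params):.3f}")
```

The decoding order is the key design choice in this sketch: the bit stream absorbs the interference penalty in stage 1, which is what lets the semantic stream be recovered interference-free in stage 2.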


The article outlines simulations based on the proposed semantic communication framework. Initial results demonstrate the efficiency of the developed sketch-graph-based codec, the image codec based on multimodal information, and the coexistence of bit and semantic information transmission with NOMA. The authors conclude that the proposed semantic communication system is a promising solution for efficient multimodal data transmission.
