Multi-Modal Supplementary-Complementary Summarization using Multi-Objective Optimization

In multi-modal information retrieval tasks, the modalities rarely contribute equally to the final output. One modality is often the preferred mode of representation because of its significance and its ability to fulfill the task. We call these preferred modalities key modalities or central modalities (referred to as central modalities from here onwards). The other modalities assist the central modalities in fulfilling the task and are known as adjacent modalities. Adjacent modalities can enhance the user experience by either supplementing or complementing the information represented via the central modality. When adjacent modalities reinforce the facts and ideas presented in the central modality, the enhancement is known as supplementary enhancement. When they instead complete the central modality by providing additional or alternative information that is relevant but not covered by it, the enhancement is known as complementary enhancement.

  • Paper Link : https://dl.acm.org/doi/pdf/10.1145/3404835.3462877
  • Code : https://github.com/anubhav-jangra/CCS-MMS-dataset
  • Dataset : Multi Modal Dataset
  • Model : GCTF and VETS
  • Pretrained models : ImageNet

Contributions

The contributions can be summarized as follows:

  • Introduced and formally defined, for the first time, the concepts of complementary and supplementary enhanced summaries containing text, images and videos.
  • Introduced the novel problem of combined complementary and supplementary multi-modal summarization (CCS-MMS), which takes text, images and videos as input and outputs a summary consisting of text, complementary and supplementary images, and supplementary videos.
  • Created an extension of a multi-modal summarization dataset by augmenting the output summary with supplementary and complementary images and videos.
  • Proposed a novel multi-objective optimization (MOO) framework to solve the CCS-MMS task. The framework is kept generic, so any MOO technique can serve as the underlying optimization strategy. Its strength is illustrated using the Grey Wolf Optimizer.
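
To give a feel for the optimizer named in the last contribution, the sketch below shows one position update of the canonical single-objective Grey Wolf Optimizer. It is not the paper's multi-objective variant. The function names (`gwo_step`, `sphere`), the continuous encoding of a wolf as a list of floats, and the toy objective are all assumptions made for illustration.

```python
import random

def gwo_step(wolves, fitness, a, rng):
    """One canonical Grey Wolf Optimizer update (minimization, >= 3 wolves).

    wolves:  list of positions, each a list of floats
    fitness: function mapping a position to a float (lower is better)
    a:       control parameter, usually decayed linearly from 2 to 0
    rng:     a random.Random instance
    """
    # The three best wolves (alpha, beta, delta) guide the whole pack.
    leaders = sorted(wolves, key=fitness)[:3]
    new_wolves = []
    for x in wolves:
        new_x = []
        for d in range(len(x)):
            pulls = []
            for leader in leaders:
                A = 2 * a * rng.random() - a  # large |A| explores, small |A| exploits
                C = 2 * rng.random()          # random emphasis on the leader
                dist = abs(C * leader[d] - x[d])
                pulls.append(leader[d] - A * dist)
            # New coordinate is the average of the three leader-guided moves.
            new_x.append(sum(pulls) / 3)
        new_wolves.append(new_x)
    return new_wolves

def sphere(x):
    """Toy objective (an assumption): squared distance from the origin."""
    return sum(v * v for v in x)

# Usage: 'a' decays linearly from 2 towards 0 over the run.
rng = random.Random(0)
pack = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
iters = 50
for t in range(iters):
    pack = gwo_step(pack, sphere, 2 - 2 * t / iters, rng)
best = min(pack, key=sphere)
```

A multi-objective version would have to replace the single `fitness` ranking with a rule for choosing leaders among non-dominated solutions. How the paper does this is not covered in this summary.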

Summary

The paper proposes a multi-objective optimization based technique to tackle the CCS-MMS problem. The proposed model is divided into two sections:

  • Global Coverage Text Format (GCTF)
  • Visual Enhancement of Text Summaries (VETS)

For GCTF, a Grey Wolf Optimizer is used to create multiple text summaries by optimizing multiple objectives. These text summaries are then visually enhanced before the post-processing step, where some images and a video are selected based on these enhancements. The experiments are done on the Multi Modal Dataset with generative model techniques. The proposed model is compared against the following baselines:

  • Sim-VETS
  • VE-TextRank
  • Sal-Red UMS
  • Sal-Red MMS
  • Sal-Red-Het MMS
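
Optimizing multiple objectives yields multiple text summaries, as in the GCTF step described above, because no single candidate is usually best on every objective at once. A multi-objective optimizer therefore keeps the set of candidates that no other candidate beats on all objectives (the Pareto front). The sketch below illustrates that filter. The candidate names and the two score values per candidate are hypothetical and are not the paper's objectives.

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep the candidates whose score vectors no other candidate dominates."""
    return [
        (name, scores)
        for name, scores in candidates
        if not any(dominates(other, scores) for _, other in candidates)
    ]

# Hypothetical candidate summaries, each scored on two made-up objectives.
candidates = [
    ("summary_a", (0.9, 0.2)),
    ("summary_b", (0.6, 0.6)),
    ("summary_c", (0.5, 0.5)),  # worse than summary_b on both objectives
    ("summary_d", (0.2, 0.9)),
]
front = pareto_front(candidates)
```

Here `summary_c` is dropped because `summary_b` scores higher on both objectives, while the other three remain as alternative trade-offs.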

The metrics used are :

  • ROUGE-R1