Self-Supervised Multimodal Opinion Summarization

Opinion summarization is the task of automatically generating summaries from multiple documents containing users' thoughts on businesses or products. Summarizing users' opinions in this way provides information that helps other users make purchasing decisions.

  • Paper link: https://aclanthology.org/2021.acl-long.33.pdf
  • Code: https://github.com/nc-ai/knowledge/tree/master/publications/MultimodalSum
  • Datasets: Yelp and Amazon
  • Model: MultimodalSum, a multimodal encoder and text decoder structure
  • Pretrained models: BART, ResNet101, and perceptrons

Contributions

The contributions can be summarized as follows:

  • This study is the first work on self-supervised multimodal opinion summarization.
  • It proposes a multimodal training pipeline that resolves the heterogeneity between input modalities.
  • It verifies the effectiveness of the model framework and training pipeline through various experiments on the Yelp and Amazon datasets.

Summary

MultimodalSum is designed with an encoder–decoder structure. To address the heterogeneity of the three input modalities (review text, images, and a metadata table), each modality has its own encoder configured to process its data effectively. A text decoder then generates the summary by synthesizing the encoded representations from the three modality encoders; a minimal sketch of this structure follows the baseline list below. Experiments are conducted on the Yelp and Amazon datasets against the following baselines:

  • Clustroid
  • Lead
  • Random
  • LexRank
  • Opinosis
  • MeanSum
  • DenoiseSum
  • CopyCat
  • Self & Control
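
To make the model framework concrete, below is a minimal PyTorch sketch, offered under stated assumptions: plain Transformer layers stand in for the pretrained BART encoder and decoder, the table encoder is reduced to a small MLP over pre-embedded field vectors, and the decoder attends over a simple concatenation of the three encoder outputs (the paper's own fusion mechanism is more involved). All class and variable names are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

class MultimodalSumSketch(nn.Module):
    """Sketch: three modality encoders feeding a single text decoder."""

    def __init__(self, d_model=768, nhead=12, vocab_size=50265):
        super().__init__()
        # Text encoder/decoder: stand-ins for the pretrained BART halves.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=6)
        self.text_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers=6)
        self.lm_head = nn.Linear(d_model, vocab_size)
        # Image encoder: ResNet101 feature maps projected to the decoder width.
        backbone = resnet101(weights=None)  # pretrained weights would be loaded in practice
        self.image_encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.image_proj = nn.Linear(2048, d_model)
        # Table encoder: a small MLP over already-embedded field vectors.
        self.table_encoder = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model))

    def forward(self, review_ids, images, table_feats, summary_ids):
        text_mem = self.text_encoder(self.embed(review_ids))       # (B, T, d)
        img = self.image_encoder(images)                           # (B, 2048, h, w)
        img_mem = self.image_proj(img.flatten(2).transpose(1, 2))  # (B, h*w, d)
        tab_mem = self.table_encoder(table_feats)                  # (B, F, d)
        # Simplification: the decoder attends over concatenated encoder outputs.
        memory = torch.cat([text_mem, img_mem, tab_mem], dim=1)
        out = self.text_decoder(self.embed(summary_ids), memory)
        return self.lm_head(out)                                   # (B, S, vocab)
```

In the paper, these components are not trained jointly from scratch; the proposed training pipeline pretrains the text model first, then aligns the image and table encoders with the pretrained text decoder, and only then performs the final multimodal training.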

The metrics used are:

  • ROUGE-{1,2,L}
  • BERTScore
  • Human evaluation via Best-Worst Scaling
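
As an illustration, the two automatic metrics can be computed with the off-the-shelf rouge-score and bert-score packages (one common choice; the paper may use different implementations or settings). The example texts below are invented:

```python
# pip install rouge-score bert-score
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Great food and friendly staff, but the wait times are long."
candidate = "The staff is friendly and the food is great, though waits can be long."

# ROUGE-1/2 compare n-gram overlap; ROUGE-L uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, result in scorer.score(reference, candidate).items():
    print(f"{name}: F1 = {result.fmeasure:.3f}")

# BERTScore compares contextual embeddings rather than surface overlap.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(f"BERTScore F1 = {F1.item():.3f}")
```

Human evaluation with Best-Worst Scaling has no such library shortcut: annotators pick the best and worst summary among the compared systems, and each system is scored by the percentage of times it is chosen as best minus the percentage of times it is chosen as worst.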