Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Published
- Conference: ECCV 2020
Authors
- Arjun Majumdar
- Ayush Shrivastava
- Stefan Lee
- Peter Anderson
- Devi Parikh
- Dhruv Batra
Contributions
- Developed VLN-BERT, a visiolinguistic transformer-based model for scoring path-instruction pairs, and showed that it outperforms strong single-model baselines from prior work on the path selection task, increasing success rate (SR) by 4.6 absolute percentage points.
- Demonstrated that, in an ensemble of diverse models, VLN-BERT improves SR by a further 3.0 absolute percentage points on “unseen” validation, leading to an SR of 73% on the VLN leaderboard.
- Ablated the proposed training curriculum and found that each stage contributes significantly to the final result, with a cumulative benefit greater than the sum of the individual effects.
- Provided qualitative evidence that the model learns to ground object references. Specifically, gradient-based visualizations show how image-region importance shifts when the instruction is modified, and the model responds reasonably to these interventions. For example, removing ‘stop next to the fridge’ from the instruction ‘Walk down the stairs, then stop next to the fridge.’ makes image regions containing the fridge less important (see the sketch below).
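One common recipe for this kind of visualization is gradient-based attribution: backpropagate the compatibility score into the image-region features and rank regions by gradient magnitude. Below is a minimal PyTorch sketch of that generic recipe; the function name and interface are illustrative, and the paper's exact attribution method may differ.

```python
import torch


def region_importance(model, token_ids, region_feats):
    """Score image regions by the gradient magnitude of the compatibility score.

    `model` is any differentiable scorer mapping (token_ids, region_feats)
    to one scalar score per example. This is a generic saliency recipe,
    not necessarily the exact method used in the paper.
    """
    region_feats = region_feats.clone().requires_grad_(True)
    model(token_ids, region_feats).sum().backward()
    # L2 norm of the gradient over the feature dim -> one value per region.
    return region_feats.grad.norm(dim=-1)  # shape: (batch, num_regions)
```

Running this before and after deleting a phrase (e.g., ‘stop next to the fridge’) shows which regions gain or lose importance under the intervention.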
Interesting Concepts
- BERT
- ViLBERT
- VLN-BERT
- Leveraging transfer learning
- Multi-stage training curricula
Approach
- Vision-and-Language Navigation as Path Selection (see the sketch after this list)
- Modeling Instruction-Path Compatibility
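In this framing, a follower first proposes candidate trajectories (e.g., via beam search), and the compatibility model ranks them; navigation succeeds if the top-scoring path reaches the goal. A minimal sketch, with `CompatibilityModel` as a hypothetical stand-in for VLN-BERT's scoring interface (names are illustrative, not the paper's API):

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Path:
    viewpoint_ids: List[str]  # panoramas visited along the candidate trajectory


class CompatibilityModel(Protocol):
    def score(self, instruction: str, path: Path) -> float:
        """Scalar instruction-path compatibility (hypothetical interface)."""
        ...


def select_path(model: CompatibilityModel, instruction: str,
                candidates: List[Path]) -> Path:
    """Rank every candidate trajectory against the instruction; keep the best."""
    return max(candidates, key=lambda p: model.score(instruction, p))
```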
Architecture
- VLN-BERT adapts ViLBERT's two-stream transformer: one stream encodes the instruction, the other encodes visual features from the path, and the streams interact to produce an instruction-path compatibility score
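As a rough illustration of the two-stream shape only (not the paper's implementation): the sketch below encodes each modality with a plain transformer and fuses mean-pooled features by element-wise product, standing in for ViLBERT's co-attentional layers; all dimensions are assumptions.

```python
import torch
import torch.nn as nn


class TwoStreamScorer(nn.Module):
    """Simplified two-stream compatibility scorer (a stand-in for ViLBERT's
    co-attentional architecture, not the actual VLN-BERT model)."""

    def __init__(self, vocab_size=30522, d_model=768, visual_dim=2048,
                 nhead=12, num_layers=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        self.text_stream = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.visual_stream = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.score_head = nn.Linear(d_model, 1)

    def forward(self, token_ids, region_feats):
        # token_ids: (B, L) instruction word-piece ids
        # region_feats: (B, R, visual_dim) image-region features along the path
        h_text = self.text_stream(self.token_emb(token_ids))
        h_vis = self.visual_stream(self.visual_proj(region_feats))
        # Mean-pool each stream and fuse by element-wise product, echoing
        # ViLBERT-style alignment scoring from pooled representations.
        fused = h_text.mean(dim=1) * h_vis.mean(dim=1)
        return self.score_head(fused).squeeze(-1)  # (B,) compatibility scores
```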
Dataset
- Room-to-Room (R2R), built on Matterport3D environments
- Conceptual Captions (image-text pairs from the web, used for pretraining)
Metrics
- Success rate (SR)
- Oracle success rate (OSR)
- Navigation error (NE)
- Path length (PL)
- Success weighted by Path Length (SPL; all five metrics are computed in the sketch below)
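All five metrics are simple aggregates over episode logs. A sketch using the usual R2R conventions (success = stopping within 3 m of the goal; SPL as defined by Anderson et al., 2018); the `Episode` fields are assumed inputs:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    nav_error: float        # final distance (m) from agent stop to goal
    oracle_error: float     # min distance (m) to goal anywhere along the path
    path_length: float      # length (m) of the path the agent took
    shortest_path: float    # geodesic distance (m) from start to goal


SUCCESS_RADIUS = 3.0  # standard R2R success threshold in meters


def evaluate(episodes: List[Episode]) -> dict:
    n = len(episodes)
    sr = sum(e.nav_error <= SUCCESS_RADIUS for e in episodes) / n
    osr = sum(e.oracle_error <= SUCCESS_RADIUS for e in episodes) / n
    ne = sum(e.nav_error for e in episodes) / n
    pl = sum(e.path_length for e in episodes) / n
    # SPL: success weighted by shortest-path length over the longer of the
    # taken path and the shortest path (Anderson et al., 2018).
    spl = sum(
        (e.nav_error <= SUCCESS_RADIUS)
        * e.shortest_path / max(e.path_length, e.shortest_path)
        for e in episodes
    ) / n
    return {"SR": sr, "OSR": osr, "NE": ne, "PL": pl, "SPL": spl}
```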
