Meet VALUE!

A Comprehensive Benchmark for Video-And-Language Understanding Evaluation.

Why VALUE?

Multi-channel Video

With both video frames and subtitle/ASR text.

Diverse Video Domains

Diverse video content from YouTube, TV episodes, and movie clips.

Various Datasets over Representative Tasks

11 datasets over 3 tasks: Retrieval, Question Answering, and Captioning.

Leaderboard!

To track advances in Video-and-Language research.

What is VALUE?

The Video-And-Language Understanding Evaluation (VALUE) benchmark is a collection of resources for training, evaluating, and analyzing systems for understanding both video and subtitles. VALUE consists of:

  • A benchmark of 11 video-and-language tasks built on established datasets, selected to cover a diverse range of dataset sizes, video genres, degrees of difficulty, and task types
  • A public leaderboard for tracking performance on the benchmark

The format of the VALUE benchmark is model-agnostic: any system capable of processing multi-channel video (video+subtitle) and natural language sentence pairs, and producing the corresponding predictions, is eligible to participate. The ultimate goal of VALUE is to drive research toward general and robust video+language understanding systems.
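
To make the expected input/output shape concrete, here is a minimal sketch of such a model-agnostic interface in Python. Everything in it is hypothetical and is not part of the official starter code: the names (MultiChannelVideo, ToyModel, predict_retrieval) are invented, and the word-overlap score is a stand-in for a real video+language model. It only illustrates that an eligible system consumes video frames plus subtitles together with a sentence, and emits a task-specific prediction (here, a retrieval ranking).

from dataclasses import dataclass
from typing import List

@dataclass
class MultiChannelVideo:
    """One multi-channel video: frame features plus subtitle/ASR text.
    (Hypothetical structure, for illustration only.)"""
    video_id: str
    frame_features: List[List[float]]  # per-frame visual features from any encoder
    subtitles: List[str]               # subtitle/ASR lines for the same clip

class ToyModel:
    """Stand-in for a real model; any architecture that maps
    (frames, subtitles, sentence) to a prediction is eligible."""

    def score(self, query: str, video: MultiChannelVideo) -> float:
        # Toy relevance: word overlap between the query and the subtitles.
        # A real system would fuse frame_features with the subtitles instead.
        query_words = set(query.lower().split())
        subtitle_words = set(" ".join(video.subtitles).lower().split())
        return len(query_words & subtitle_words) / max(len(query_words), 1)

def predict_retrieval(model: ToyModel, query: str,
                      candidates: List[MultiChannelVideo]) -> List[str]:
    """Rank candidate videos for one text-to-video retrieval query."""
    ranked = sorted(candidates, key=lambda v: model.score(query, v), reverse=True)
    return [v.video_id for v in ranked]

if __name__ == "__main__":
    videos = [
        MultiChannelVideo("vid1", [[0.1, 0.2]], ["a man cooks pasta in a kitchen"]),
        MultiChannelVideo("vid2", [[0.3, 0.4]], ["a dog runs through a park"]),
    ]
    print(predict_retrieval(ToyModel(), "someone cooking pasta", videos))
    # -> ['vid1', 'vid2']

Question answering and captioning systems would expose analogous entry points that return an answer string or a generated caption instead of a ranking.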


Paper

Please cite the following paper if you use the VALUE benchmark or starter code.

@InProceedings{li2021value,
  title     = {VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation},
  author    = {Li, Linjie and Lei, Jie and Gan, Zhe and Yu, Licheng and Chen, Yen-Chun and Pillai, Rohit and Cheng, Yu and Zhou, Luowei and Wang, Xin Eric and Wang, William Yang and others},
  booktitle = {35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks},
  year      = {2021}
}

Contact

Have any questions or suggestions? Feel free to reach us at value-benchmark@googlegroups.com!