
ActivityNet: A Large-Scale Video - Detailed Analysis & Overview

In spite of many dataset efforts for human action recognition, current computer vision algorithms are still severely limited in terms ...

Interested in phrase localization? Captioning? Detection? Grounding? Join us and learn the latest on the ...

30s teaser for my talk on the AVA-Kinetics challenge. The full ...

ICCV 2025. Abstract: We propose a novel approach for captioning and object grounding in ...

The 18th European Conference on Computer Vision (ECCV 2024). Training-free ...

Thank you for joining! We wish you a fruitful CVPR 2020.

Come learn about the cutting edge of action recognition in ...

This challenge evaluates the ability of vision algorithms to understand complex related events in a ...

Photo Gallery

ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
Action100M: A Large-scale Video Action Dataset (Jan 2026)
ActivityNet Event Dense-Captioning
ActivityNet Dense Event Captioning Results
ActivityNet Entities Results
406 - MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection
AVA-Kinetics Challenge - ActivityNet 2020 - 30s teaser
Large-scale Pre-training for Grounded Video Caption Generation
[ECCV'24] Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Closing Remarks
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities @CVPR'22