Projects

VideoDubber tools

Video dubbing aims to translate the original speech in a film or television program into speech in a target language. This can be achieved with a cascaded system consisting of speech recognition, machine translation, and speech synthesis. To keep the translated speech well aligned with the corresponding video, its duration should be as close as possible to that of the original speech, which requires strict length control.
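To make the length-control requirement concrete, here is a minimal Python sketch. It is not the VideoDubber method: it only illustrates one simple way to respect a duration budget, by choosing, among several candidate translations, the one whose estimated spoken duration is closest to the source segment. The function names and the fixed speaking rate of 0.4 seconds per word are assumptions made purely for illustration.

```python
from typing import Callable, Sequence


def estimate_duration(text: str, seconds_per_word: float = 0.4) -> float:
    """Crude duration estimate: assumes a constant speaking rate (hypothetical)."""
    return len(text.split()) * seconds_per_word


def pick_by_duration(
    source_duration: float,
    candidates: Sequence[str],
    estimator: Callable[[str], float] = estimate_duration,
) -> str:
    """Return the candidate whose estimated duration is closest to the source."""
    if not candidates:
        raise ValueError("at least one candidate translation is required")
    return min(candidates, key=lambda c: abs(estimator(c) - source_duration))


# Hypothetical candidate translations for a 2-second source segment.
candidates = [
    "I really do not think that this is a good idea",
    "This is a bad idea",
    "No",
]
best = pick_by_duration(2.0, candidates)
```

In a real cascaded system the estimate would more plausibly come from the speech synthesis stage than from a word count, and the length constraint could be applied during translation rather than as a selection step afterwards.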

TeViS Dataset

A storyboard is a roadmap for video creation that consists of shot-by-shot images visualizing the key plots in a text synopsis. Creating video storyboards remains challenging, however: it requires not only associating high-level text with images, but also long-term reasoning to make transitions smooth across shots. We propose a new task called Text synopsis to Video Storyboard (TeViS), which aims to retrieve an ordered sequence of images to visualize the text synopsis.
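The retrieval formulation can be illustrated with a naive baseline, sketched below in Python. This is not the TeViS model: it matches each synopsis sentence to its most similar image independently, so it ignores the cross-shot reasoning the task calls for. The embedding vectors are assumed to come from some hypothetical text and image encoder that shares one vector space.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_storyboard(
    sentence_vecs: Sequence[Sequence[float]],
    image_vecs: Sequence[Sequence[float]],
) -> List[int]:
    """For each sentence, in synopsis order, return the index of the closest image."""
    if not image_vecs:
        raise ValueError("the image pool must not be empty")
    return [
        max(range(len(image_vecs)), key=lambda i: cosine(s, image_vecs[i]))
        for s in sentence_vecs
    ]


# Toy 2-D embeddings (hypothetical values).
sentences = [[1.0, 0.0], [0.0, 1.0]]
images = [[0.0, 2.0], [3.0, 0.1]]
storyboard = retrieve_storyboard(sentences, images)
```

Because each sentence is matched on its own, the same image can be selected more than once and nothing encourages smooth transitions between consecutive shots.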

TikTalk Dataset

TikTalk is a multi-modal Chinese dialogue dataset introduced in "TikTalk: A Multi-modal Dialogue Dataset for Real-world Chitchat". It contains 38,703 videos and 367,670 corresponding dialogues from Douyin.