Phenaki

A model for generating videos from text, with prompts that can change over time, and videos that can be as long as multiple minutes.

Monthly visits: 19182

Description:

Phenaki is a model capable of generating realistic videos from a sequence of textual prompts. Generating videos from text is particularly challenging because of the computational cost, the limited quantity of high-quality text-video data, and the variable length of videos.

To address these issues, Phenaki introduces a new causal model for learning video representations, which compresses a video into a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text, Phenaki uses a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video.

To address the data issues, Phenaki demonstrates how joint training on a large corpus of image-text pairs together with a smaller number of video-text examples can result in generalization beyond what is available in the video datasets.

Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e., time-variable text or a story) in an open domain. According to its authors, this is the first work to study video generation from time-variable prompts. In addition, the proposed video encoder-decoder outperforms all per-frame baselines currently used in the literature in terms of spatio-temporal quality and the number of tokens per video.
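
The idea of causal attention in time can be illustrated with a small attention mask. The sketch below is not Phenaki's implementation; it assumes tokens are laid out frame by frame and that a token may attend to every token in its own frame or in any earlier frame. The function name `causal_time_mask` and its parameters are hypothetical, chosen for illustration only.

```python
def causal_time_mask(num_frames, tokens_per_frame):
    """Build a boolean attention mask that is causal along the time axis.

    mask[i][j] is True when token i may attend to token j. Tokens are
    assumed to be ordered frame by frame (an assumption of this sketch),
    so a token sees its own frame and all earlier frames, never later ones.
    """
    total = num_frames * tokens_per_frame
    # Frame index of each token position in the flattened sequence.
    frame_of = [i // tokens_per_frame for i in range(total)]
    return [
        [frame_of[j] <= frame_of[i] for j in range(total)]
        for i in range(total)
    ]


# Print the mask for a toy video: 3 frames, 2 tokens per frame.
mask = causal_time_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because no token depends on later frames, the mask for the first few frames is the same however long the video is. This is one way to see why a causal-in-time design can handle variable-length videos.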

Features

Advantages

  • Can generate arbitrarily long videos
  • Can be conditioned on a sequence of prompts (i.e., time-variable text or a story)
  • Outperforms all per-frame baselines currently used in the literature in terms of spatio-temporal quality and the number of tokens per video
  • Can be used to generate videos for a variety of applications, such as entertainment, education, and marketing

Disadvantages

  • Can be computationally expensive to train
  • Requires a large amount of data to train well
  • Can be difficult to control the style and quality of the generated videos
