# TVTS

**Repository Path**: blueshh/TVTS

## Basic Information

- **Project Name**: TVTS
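- **Description**: Uses automatic speech recognition (ASR) to convert the audio in videos into text. The text can then be organized, searched, and analyzed to extract useful information.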
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-11-23
- **Last Updated**: 2023-11-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Turning to Video for Transcript Sorting

This repo contains the official implementations of the following two papers:

1. [Learning Transferable Spatiotemporal Representations from Natural Script Knowledge](https://arxiv.org/abs/2209.15280)
2. [TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale](https://arxiv.org/abs/2305.14173)

## News

+ **[2023.02]** 🎉 TVTS is accepted to CVPR 2023.
+ **[2023.03]** The official code of TVTS has been released.
+ **[2023.05]** 🚀 TVTSv2 is coming out! Please refer to [this link](https://arxiv.org/abs/2305.14173) for details.
+ **[2023.08]** The official code of TVTSv2 and the pre-trained models have been released. All zero-shot evaluations can be run on a single GPU. We also provide scripts for extracting your own video features. Try it now 😎!

## Introduction

### Quickstart

Folder [v1](v1) contains the official code of TVTS. See [v1-README](v1/README.md) for details.

Folder [v2](v2) contains the official code of TVTSv2, an upgraded version of TVTS that produces powerful video representations for *out-of-the-box* usage. See [v2-README](v2/README.md) for details.

### Citation

If you find our work helpful, please cite our papers.

```tex
@InProceedings{Zeng_2023_CVPR,
    author    = {Zeng, Ziyun and Ge, Yuying and Liu, Xihui and Chen, Bin and Luo, Ping and Xia, Shu-Tao and Ge, Yixiao},
    title     = {Learning Transferable Spatiotemporal Representations From Natural Script Knowledge},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {23079-23089}
}
```

```tex
@misc{zeng2023tvtsv2,
      title={TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale}, 
      author={Ziyun Zeng and Yixiao Ge and Zhan Tong and Xihui Liu and Shu-Tao Xia and Ying Shan},
      year={2023},
      eprint={2305.14173},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```