Facebook Releases the Largest StarCraft AI Dataset

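Summary: We have just released the largest StarCraft: Brood War replay dataset, with 65,646 games. The full dataset is 365 GB after compression and contains 1,535 million frames and 496 million player actions.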




We release the largest StarCraft: Brood War replay dataset yet, with 65,646 games. The full dataset is 365 GB after compression and contains 1,535 million frames and 496 million player actions. All frame data was dumped at 8 frames per second. We made a big effort to ensure the dataset is clean and consists mostly of high-quality replays. You can access it with TorchCraft in C++, Python, and Lua. The replays are in an AWS S3 bucket at s3://stardata. Read on for more details, or see our whitepaper on arXiv.

Installing TorchCraft

Note: The current set of replays is only compatible with the 1.3.0 version of TorchCraft included here.

Simply do

git submodule update --init
cd TorchCraft
pip install .

More documentation can be found at https://github.com/TorchCraft/TorchCraft. Realistically, you will only need the replayer modules, so you can ignore most of the parts about connecting to StarCraft. Check the code for documentation on how to use them:
- For Python
- For C++: replayer.h, frame.h
- For Lua: replayer and frame
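
As a rough illustration of what using only the replayer looks like from Python, here is a minimal sketch. It assumes the Python bindings expose `torchcraft.replayer.load(path)`, `len(rep)`, and `rep.getFrame(i)`; these names are assumptions to verify against the replayer code listed above for version 1.3.0. The helper names `load_replay` and `iter_frames` are hypothetical.

```python
def load_replay(path):
    """Load one dumped replay file.

    Assumption: the TorchCraft Python bindings provide
    torchcraft.replayer.load(path). Check the bindings for 1.3.0.
    """
    from torchcraft import replayer  # imported lazily; requires TorchCraft
    return replayer.load(path)


def iter_frames(rep, step=1):
    """Yield (index, frame) pairs from a loaded replay.

    Assumption: the replay object supports len(rep) and rep.getFrame(i).
    `step` lets you subsample, e.g. step=8 is roughly one frame per
    second given the 8 frames per second dump rate.
    """
    for i in range(0, len(rep), step):
        yield i, rep.getFrame(i)
```

Usage would be along the lines of `for i, frame in iter_frames(load_replay("/path/to/file.rep"), step=8): ...`, where the path is a placeholder for a file from the dumped replays.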

Downloading the Data

You can find the replays in an AWS S3 bucket at s3://stardata:
- s3://stardata/dumped_replays contains the replays in a format readable by TorchCraft.
- s3://stardata/battles contains text files, each describing one battle. Each battle is 3 lines:
  - xmin, xmax, ymin, ymax, tmin, tmax: the bounding rectangle and time span of the battle. Multiply the time values by 3 to get the real frame count, or use them as-is to index directly into the dumped replays.
  - Type and number of units on team 1
  - Type and number of units on team 2
- s3://stardata/original_replays.tar contains the original replays.
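
The three-line battle format and the factor of 3 between dumped and real frames can be sketched in Python as follows. This is an illustration under assumptions: the first line is taken to hold six numbers separated by commas and/or whitespace, and the two unit lines are kept as raw strings because their exact layout is not specified here. The names `Battle`, `parse_battle`, and `real_frame_range` are hypothetical.

```python
import re
from dataclasses import dataclass

# From the text: battle times index the dumped frames; multiply by 3
# to get real game frames.
DUMP_STRIDE = 3


@dataclass
class Battle:
    xmin: float
    xmax: float
    ymin: float
    ymax: float
    tmin: int          # start time, in dumped-frame units
    tmax: int          # end time, in dumped-frame units
    team1_units: str   # raw second line (layout not assumed)
    team2_units: str   # raw third line (layout not assumed)


def parse_battle(text):
    """Parse one battle file's contents into a Battle record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 3:
        raise ValueError("expected 3 non-empty lines, got %d" % len(lines))
    # Assumption: numbers are separated by commas and/or whitespace.
    fields = [float(v) for v in re.split(r"[,\s]+", lines[0]) if v]
    if len(fields) != 6:
        raise ValueError("expected 6 numbers on the first line")
    xmin, xmax, ymin, ymax, tmin, tmax = fields
    return Battle(xmin, xmax, ymin, ymax, int(tmin), int(tmax),
                  lines[1], lines[2])


def real_frame_range(battle):
    """Convert the battle's time span to real game frames."""
    return battle.tmin * DUMP_STRIDE, battle.tmax * DUMP_STRIDE
```

To index into a dumped replay, use `battle.tmin` and `battle.tmax` directly; `real_frame_range` is only needed when relating a battle back to frames in the original replay.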

Reproducing Results

Some of the reproduction scripts are included; other scripts will be added as
soon as we clean up the code and make it easy to install and run. Simply run
make and you're good to go. All cpp files can be run like script /path/to/replays/**/*.rep

  • extract_stats tells you some stats about the replays
  • extract_units preprocesses for battle clustering
  • get_corrupt_replays tells you what replays are considered corrupt
  • cluster.py can be run on the output of extract_units to do battle clustering.
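
The `/path/to/replays/**/*.rep` argument relies on the shell expanding `**` recursively, which not every shell does by default. As a hedged alternative, the same file list can be gathered in Python and passed to one of the programs above. The helper names `find_replays` and `build_command` are hypothetical, and the sketch assumes the programs accept the replay paths as plain positional arguments, as the invocation above suggests.

```python
import glob
import os


def find_replays(root):
    """Return all .rep files under root, at any depth, in sorted order."""
    pattern = os.path.join(root, "**", "*.rep")
    # recursive=True makes "**" match zero or more directory levels.
    return sorted(glob.glob(pattern, recursive=True))


def build_command(program, root):
    """Build an argv list: the program followed by every replay path."""
    return [program] + find_replays(root)
```

The resulting list can be handed to `subprocess.run(build_command("./extract_stats", "/path/to/replays"), check=True)`. For very large replay sets the argument list may exceed the operating system's limit, in which case the paths would need to be processed in batches.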


The white paper for the dataset is at:

Lin, Z., Gehring, J., Khalidov, V., Synnaeve, G., AIIDE 2017. STARDATA: A StarCraft AI Research Dataset (arXiv)

We attribute most of the replays to bwrep and G. Synnaeve, P. Bessiere, A Dataset for StarCraft AI & an Example of Armies Clustering, 2012.
Please see the paper for a complete list of references.


StarData is BSD-licensed. We also provide an additional patent grant.

