Datasets Download

ZeroSpeech 2021

The dataset, the baseline submissions and a random submission are provided here for download. The dataset is released under a Creative Commons 4.0 licence. The baseline checkpoints include CPC checkpoints first released in the public domain by Facebook AI Research.

File                                         Description                                   Size    MD5 sum
zerospeech2021-dataset.zip                   Data for the 2021 edition                     24 GB   d196d4c9174f1bf2ce7111a19abddaca
zerospeech2021-submission-random.zip         Purely random submission provided as example  0.2 GB  e58b62602f34fddc97a39a3ebf2b21ab
zerospeech2021-submission-baseline-bert.zip  Baseline submission (BERT)                    13 GB   8544fe3fccb6ead94a6ae1e260240ca8
zerospeech2021-submission-baseline-lstm.zip  Baseline submission (LSTM)                    17 GB   994d1323b43376e7f03e6cd06e966e60
baseline_checkpoints.tar.gz                  Baseline checkpoints                          2.4 GB  3c5cfeda5dca079f2c0c02b6cbeb08ed

The following commands will download and unzip the dataset:

wget https://download.zerospeech.com/2021/zerospeech2021-dataset.zip
unzip zerospeech2021-dataset.zip -d zerospeech2021_dataset
rm -f zerospeech2021-dataset.zip
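
Since the table above lists an MD5 sum for each archive, it can be worth checking a download before unzipping and deleting it. The following is a minimal sketch, not part of the official instructions: the helper name verify_md5 is hypothetical, and it assumes the GNU coreutils md5sum command is available (on macOS the equivalent tool is md5, with a different output format).

```shell
#!/bin/bash

# verify_md5 FILE EXPECTED_MD5   (hypothetical helper, not an official tool)
# Returns 0 when the MD5 digest of FILE equals EXPECTED_MD5, non-zero otherwise.
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    # If the file cannot be read, the digest is empty and the comparison fails.
    actual=$(md5sum "$file" 2>/dev/null | cut -d' ' -f1)
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, run between the wget and unzip steps:

verify_md5 zerospeech2021-dataset.zip d196d4c9174f1bf2ce7111a19abddaca || echo "checksum mismatch" >&2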

ZeroSpeech 2020

The datasets for the ZeroSpeech Challenge 2020 are provided here for download. Please note that the archives are password-protected; the password is communicated once you have accepted the agreement below.

File                Description                      Size     MD5 sum
zerospeech2020.z01  Data for the 2020 edition (1/3)  10.0 GB  c9906d9062744cec87f4a4048a0c551b
zerospeech2020.z02  Data for the 2020 edition (2/3)  10.0 GB  7eaa187d403c3aeef94e13f9053ce861
zerospeech2020.zip  Data for the 2020 edition (3/3)  6.0 GB   839a18a0dfe11c706428ddc27d87d5b8
baseline.zip        Baseline submission              6.8 GB   b5934920fcbb0b3af90611185696510b
2017_vads.zip       VAD for the 2017 wavs            1.3 MB   c78b21df917b7de4d952d60492327a29

The following script will download and unzip the datasets:

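#!/bin/bash

# Replace with the password received after accepting the agreement
PASSWORD=XXXX_REPLACE_WITH_THE_PASSWORD_XXXX

# Download the three parts of the split archive
for ext in zip z01 z02
do
    wget "https://download.zerospeech.com/2020/zerospeech2020.$ext" || exit 1
done

# Extract from the .zip part; the .z01 and .z02 parts must sit in the same directory
7z x -p"$PASSWORD" zerospeech2020.zip || exit 1

# Remove all archive parts once extraction has succeeded
rm -f zerospeech2020.z* || exit 1
exit 0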
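
Because the 2020 data is split across three large parts, a single corrupted part is enough to make extraction fail. As a hedged sketch, the three parts can be checked in one pass against the MD5 sums from the table above before running 7z. The function name check_2020_parts is hypothetical, and the sketch assumes GNU coreutils md5sum, whose -c option reads "digest, two spaces, filename" lines and reports each file as OK or FAILED.

```shell
#!/bin/bash

# check_2020_parts [DIR]   (hypothetical helper, not an official tool)
# Verifies the three archive parts found in DIR (default: current directory)
# against the MD5 sums listed in the download table.
# Returns non-zero if any part is missing or does not match.
check_2020_parts() {
    local dir="${1:-.}"
    # With -c and no file argument, md5sum reads the checksum list from stdin
    (cd "$dir" && md5sum -c <<'EOF'
c9906d9062744cec87f4a4048a0c551b  zerospeech2020.z01
7eaa187d403c3aeef94e13f9053ce861  zerospeech2020.z02
839a18a0dfe11c706428ddc27d87d5b8  zerospeech2020.zip
EOF
    )
}
```

In the script above, this check would go after the download loop and before the 7z call, for instance as: check_2020_parts . || exit 1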


ZeroSpeech 2019

The datasets for the ZeroSpeech Challenge 2019 are provided here for download.

Development dataset

Surprise dataset

ZeroSpeech 2017

The datasets for the ZeroSpeech Challenge 2017 are provided here for download.

Development dataset

Training datasets (Tracks 1 & 2)

Test datasets (Track 1)

Surprise dataset

Training datasets (Tracks 1 & 2)

Test datasets (Track 1)

Dataset VADS