Data
"The ATC LID/ASR evaluation dataset is going to be published at Interspeech 2021. Stay tuned!"
Abstract: Detecting English Speech in the Air Traffic Control Voice Communication
ASR dataset (V1).
Name: ATCO2-ASRdataset-v1_beta
Description: This dataset was built for the development and evaluation of automatic speech recognition techniques for English ATC data. Note: The dataset is a beta version and will be updated in the near future (some transcript fine-tuning may happen). The dataset consists of English speech from the LKTB, LKPR, LZIB, LSGS, LSZH, LSZB and YSSY airports. The total audio length is 1.10 hours. We provide audio (WAV format), an English automatic transcript generated by an ASR system, and an info file with meta information and nearby callsigns.
Link to file to download: https://www.replaywell.com/atco2/download/ATCO2-ASRdataset-v1_beta.tgz
LID dataset (V1).
Name: ATCO2-LIDdataset-v1_beta
Description: This dataset was built for the development and evaluation of techniques for classifying English versus non-English speech in ATC data. Note: The dataset is a beta version and will be updated in the future (more language pairs will be added and some cleaning/debugging may happen). The dataset consists of the following language pairs:
CZEN - devel (6.11 hours),
CZEN - eval (6.21 hours),
FREN - devel (2.68 hours),
FREN - eval (3.27 hours),
GEEN - devel English only (5.61 hours),
GEEN - eval (2.41 hours),
EN-AU (Australian English) - eval English only (0.17 hours).
Where possible, we split each pair into development and evaluation subsets. We provide audio (WAV format), an English automatic transcript generated by an ASR system, and an info file with the estimated SNR, language and length.
Link to file to download: https://www.replaywell.com/atco2/download/ATCO2-LIDdataset-v1_beta.tgz
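A minimal sketch of fetching and unpacking the two archives linked above, using only the Python standard library. The internal layout of the archives is not described on this page, so the sketch only lists member names and counts them by file extension. The helper names (`fetch`, `extract`, `count_by_suffix`) are illustrative assumptions, not part of any ATCO2 tooling.

```python
import tarfile
import urllib.request
from collections import Counter
from pathlib import Path

# Download links as given on this page.
ARCHIVES = {
    "asr": "https://www.replaywell.com/atco2/download/ATCO2-ASRdataset-v1_beta.tgz",
    "lid": "https://www.replaywell.com/atco2/download/ATCO2-LIDdataset-v1_beta.tgz",
}


def fetch(url, dest_dir):
    """Download url into dest_dir (skipped if the file already exists)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target


def extract(archive, dest_dir):
    """Unpack a gzip-compressed tar archive and return its member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names


def count_by_suffix(names):
    """Count members per file extension; names without one are skipped."""
    suffixes = (Path(name).suffix for name in names)
    return Counter(s for s in suffixes if s)
```

For example, `extract(fetch(ARCHIVES["asr"], "atco2_data"), "atco2_data")` would download the ASR archive into a hypothetical `atco2_data` directory and unpack it there; passing the returned names to `count_by_suffix` gives a quick overview of what the archive contains.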
DATASETS COLLECTED IN ATCO2. The data have been collected from the following airports (data sizes are in hours):
LKPR: Prague (Czech Republic)
LKTB: Brno (Czech Republic)
LSZH: Zurich (Switzerland)
LSZB: Bern (Switzerland)
LSGS: Sion (Switzerland)
YSSY: Sydney (Australia)
LZIB: Bratislava (Slovakia)
EETN: Tallinn (Estonia)
Total
All: 1517.171319
LKPR: 590.519025
LKTB: 341.325012
LSZH: 287.376496
LSZB: 138.460281
LSGS: 68.159527
YSSY: 65.163918
LZIB: 22.253698
EETN: 3.913363
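As a quick sanity check, the per-airport totals above can be summed and compared with the reported overall figure. This is only a sketch with the numbers copied by hand from the list; the variable names are illustrative.

```python
# Per-airport totals in hours, copied from the list above.
hours = {
    "LKPR": 590.519025,
    "LKTB": 341.325012,
    "LSZH": 287.376496,
    "LSZB": 138.460281,
    "LSGS": 68.159527,
    "YSSY": 65.163918,
    "LZIB": 22.253698,
    "EETN": 3.913363,
}
reported_all = 1517.171319  # the "All" figure from the list above

total = sum(hours.values())
difference = abs(total - reported_all)

# Print each airport's share of the collected audio, largest first.
for airport, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{airport}: {h:9.2f} h ({100 * h / total:5.1f} %)")
print(f"sum = {total:.6f} h, reported = {reported_all:.6f} h")
```

The two figures agree up to rounding in the last printed digit.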
Date | LKPR | LKTB | LSZH | LSZB | LSGS | YSSY | LZIB | EETN | All |
---|---|---|---|---|---|---|---|---|---|
10/2020 | 17.4759 | 50.6165 | | | | | | | 68.0924 |
11/2020 | 58.4633 | 36.348 | | | | | | | 94.8113 |
12/2020 | 69.2494 | 21.8845 | | | | | | | 91.1339 |
01/2021 | 37.4121 | 25.2206 | | | | | | | 62.6328 |
02/2021 | 18.059968 | 25.777703 | | | | | | | 43.837672 |
03/2021 | 25.916452 | 24.702203 | | | | | | | 50.618656 |
04/2021 | 57.559116 | 33.174521 | 48.028701 | 34.070596 | 6.478430 | 4.716394 | 3.958835 | 1.310111 | 189.296704 |
05/2021 | 35.659576 | 39.189885 | 78.005412 | 42.569556 | 15.479776 | 30.957518 | 10.953263 | 0.747391 | 253.562378 |
06/2021 | 109.098188 | 34.528389 | 88.936661 | 57.455538 | 1.121517 | 29.490006 | 7.341600 | 0.408442 | 328.380339 |
07/2021 | 163.637212 | 47.870546 | 72.405722 | 4.364591 | 45.079805 | | | 1.447419 | 334.805295 |
DATASET ANNOTATED IN ATCO2. The annotated data come from several of the airports listed above (sizes in the table are in minutes):
Total of annotated English speech: 137 minutes (roughly 2 h 17 min)
Date | EN | CZ | LSGS_EN | LSGS_FR | LSZB_EN | LSZB_GE | LSZH_EN | LSZH_GE | LZIB_EN | YSSY_EN | Total_EN |
---|---|---|---|---|---|---|---|---|---|---|---|
9.6.2021 | 14.83 | 2.71 | 2.96 | 0.78 | 9.64 | 0 | 6.52 | 0.38 | 0 | | 33.95 |
23.6.2021 | 14.83 | 2.71 | 13.22 | 0.78 | 14.67 | 0 | 9.19 | 0.46 | 1.3 | 1.85 | 55.06 |
19.7.2021 | 14.83 | 2.71 | 21.29 | 2 | 19.66 | 1.98 | 11.56 | 0.46 | 6.99 | 5.58 | 79.91 |
30.7.2021 | 18.31 | 2.71 | 26.82 | 2 | 22.81 | 1.98 | 16.74 | 0.46 | 12.05 | 8.9 | 105.63 |
19.8.2021 | 18.31 | 2.71 | 44.5 | 2 | 31.74 | 1.98 | 16.74 | 0.46 | 12.05 | 14.08 | 137.42 |
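The Total_EN column appears to be the sum of the English columns (EN plus every column ending in _EN); this is an observation from the numbers, not something the page states. A minimal sketch checking it for the latest row (19.8.2021), with values copied by hand and a hypothetical helper name `english_minutes`:

```python
import math

# Latest row of the table above (19.8.2021), values in minutes.
latest = {
    "EN": 18.31, "CZ": 2.71,
    "LSGS_EN": 44.5, "LSGS_FR": 2,
    "LSZB_EN": 31.74, "LSZB_GE": 1.98,
    "LSZH_EN": 16.74, "LSZH_GE": 0.46,
    "LZIB_EN": 12.05, "YSSY_EN": 14.08,
}
reported_total_en = 137.42


def english_minutes(row):
    """Sum the English-only columns: 'EN' and every column ending in '_EN'."""
    return sum(v for k, v in row.items() if k == "EN" or k.endswith("_EN"))


total_en = english_minutes(latest)
matches = math.isclose(total_en, reported_total_en, abs_tol=0.005)

# Convert whole minutes to hours and minutes for readability.
h, m = divmod(round(total_en), 60)
print(f"English total: {total_en:.2f} min (about {h} h {m} min), matches: {matches}")
```

The same helper can be applied to any other row to spot transcription errors in the table.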