- Français: Cette page n'est pas encore traduite en français.
Create Your First Dataset
In labelit datasets contain documents. Each document has some optional metadata, an optional text file, and an optional audio recording.
A document may contain both text and audio, and must contain at least one audio file or one text file.
Importing the provided example dataset
To help you get started, we've created an example dataset. It's a zip file located at
backend/src/labelit/tests/data/example_dataset.zip relative to the project root folder.
If you inspect this dataset, you will have an idea of the required format for import.
example_dataset/
├── documents
│ ├── file1.meta.json
│ ├── file1.txt
│ ├── file1.wav
│ ├── file2.meta.json
│ ├── file2.txt
│ └── file2.wav
└── meta.json
Notice the meta.json file at the root of the dataset. This file must contain a name and may contain any other
metadata you'd like to associate with this dataset.
To import this dataset into labelit, login to labelit as a QA user and navigate to the "Datasets" page.
There, click on "Import dataset" and follow the steps to upload example_dataset.zip. In your list of datasets, "Dummy dataset" should now appear.
Importing your own dataset
To import your own dataset, follow closely the format of example_dataset :
-
At the root, there must be a file named
meta.json, containing at the mimimum the name of the dataset{ "name": "the name of my own dataset" } -
At the root, there must be a
documentsfolder :-
For each document, this folder may contain a metadata file named
<document_identifier>.meta.json -
For each document, there may be an audio (wave) file named
<document_identifier>.wav -
For each document, there may be an text file named
<document_identifier>.txtcontaining the associated text
-