Sakha-TB is a deidentified image dataset of frontal chest X-rays, collected in collaboration with several medical institutions in the Republic of Sakha (Yakutia, Russia). The set contains 400 normal X-rays and 400 X-rays with manifestations of tuberculosis, approximately balanced by age and gender, provided as 16-bit and 8-bit PNG files. Please inform us if you find errors or inconsistencies in the data.


We provide two versions of the dataset: the full version (DICOM converted to PNG with the bit depth preserved) and a compressed version (cropped, downsampled, normalized, and converted to 8-bit).
Full version (8.73 GB): OneDrive | Yandex Disk
Compressed version (0.27 GB): OneDrive | Yandex Disk
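
If you work with the full version and need 8-bit images, the bit-depth reduction can be sketched as below. This is only an illustration: the exact cropping, downsampling, and normalization used to build the compressed version are not described here, so the min-max scaling, the function name `to_8bit`, and the synthetic `demo` array are all assumptions and will not necessarily reproduce the compressed files.

```python
import numpy as np


def to_8bit(img16: np.ndarray) -> np.ndarray:
    """Rescale a 16-bit grayscale image to 8-bit with min-max normalization.

    Assumption: min-max scaling is used purely for illustration; the
    normalization applied to the compressed Sakha-TB images may differ.
    """
    img = img16.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: avoid division by zero and return all zeros.
        return np.zeros(img16.shape, dtype=np.uint8)
    # Map [lo, hi] linearly onto [0, 255], then round to the nearest integer.
    scaled = (img - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


# Synthetic stand-in for a 16-bit X-ray (hypothetical data, not from the dataset).
demo = np.array([[0, 1024], [2048, 4095]], dtype=np.uint16)
demo8 = to_8bit(demo)
```

In practice the 16-bit PNG would first be read into a NumPy array with an image library that preserves the bit depth; that loading step is left out so the sketch has no dependency beyond NumPy.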


We would appreciate it if you cite our paper when using the dataset:

@article{pchelintsev2022robustness,
  author={Pchelintsev, Ya and Khvostikov, A and Buchatskaia, O and Nikiforova, N and Shepeleva, L and Prokopev, E and Parolina, L and Krylov, A},
  title={Robustness Analysis of Chest {X}-Ray Computer Tuberculosis Diagnosis},
  journal={Computational Mathematics and Modeling},
}

Terms of use

The dataset belongs to the E.N. Andreev "Phthisiatry" Research-Practice Center and the Laboratory of Mathematical Methods of Image Processing at the Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, and is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.