This dataset consists of histological images of colorectal cancer, one of the most common cancers. An appropriate tissue classification mechanism is important for this pathology. Having several tissue classes can help improve patient treatment, since most datasets only distinguish between disease and non-disease.
The dataset is composed of 5000 RGB samples of 150x150 pixels (height x width). The main objective is to correctly classify the tissue type shown in each colorectal cancer image. The 8 classes are:
- Tumor;
- Stroma;
- Complex;
- Lympho;
- Debris;
- Mucosa;
- Adipose;
- Empty;
The major limitation of this dataset is the low number of samples available. The dataset is balanced, so no sampling technique is required or used.
- A Jupyter notebook with a preliminary analysis of the problem is provided;
- Data augmentation is used to increase the number of training samples available for model learning;
- Four convolutional architectures are implemented and used to solve the problem: AlexNet, VGGNet, ResNet and DenseNet;
- The PSO (Particle Swarm Optimization) algorithm is used to optimize the structure and other hyperparameters of the different convolutional architectures;
- An ensemble technique is applied to improve on the performance obtained individually by the architectures (the probability distributions predicted by the different architectures are averaged);
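The averaging ensemble described in the last item can be sketched with NumPy as follows. This is a minimal illustration, not the repository's implementation: the function name `ensemble_average`, the toy probability arrays and the mapping of class indices to the class list above are all assumptions. In practice each array would hold the probabilities predicted by one trained model for the same samples.

```python
import numpy as np

# Assumed index order, taken from the class list in this README.
CLASSES = ["Tumor", "Stroma", "Complex", "Lympho",
           "Debris", "Mucosa", "Adipose", "Empty"]


def ensemble_average(prob_list):
    """Average per-model class probabilities.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    Returns the averaged probabilities and the predicted class indices.
    """
    stacked = np.stack(prob_list, axis=0)   # (n_models, n_samples, n_classes)
    avg_probs = stacked.mean(axis=0)        # (n_samples, n_classes)
    return avg_probs, avg_probs.argmax(axis=1)


# Toy probabilities for two samples from two hypothetical models.
resnet_probs = np.array([
    [0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.5, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0],
])
densenet_probs = np.array([
    [0.2, 0.7, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0],
])

avg_probs, predictions = ensemble_average([resnet_probs, densenet_probs])
predicted_names = [CLASSES[i] for i in predictions]
print(predicted_names)  # ['Stroma', 'Complex']
```

In this toy case the averaged prediction differs from what the first model alone would choose for both samples, which is the effect the ensemble relies on.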
Model | Memory | Macro-Average F1 Score | Accuracy | File |
---|---|---|---|---|
AlexNet | 19.0 MB | 94.2% | 94.3% | AlexNet h5 File |
VGGNet | 15.5 MB | 94.5% | 94.6% | VGGNet h5 File |
ResNet | 11.4 MB | 95.5% | 95.7% | ResNet h5 File |
DenseNet | 17.9 MB | 96.0% | 96.1% | DenseNet h5 File |
Ensemble Average All Models | 21.4 MB | 95.5% | 95.6% | Ensemble All Models h5 File |
Ensemble Average ResNet + DenseNet | 9.9 MB | 96.6% | 96.6% | Ensemble Best Combination h5 File |
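For reference, the two metrics in the table can be computed as in the sketch below. It is written in plain Python for illustration and assumes integer class labels; the function name `macro_f1_and_accuracy` is hypothetical, and the repository may use a library routine instead. Classes with no true or predicted samples are given an F1 of 0.0 here, which is one possible convention.

```python
def macro_f1_and_accuracy(y_true, y_pred, n_classes):
    """Return (macro-averaged F1, accuracy) for integer class labels."""
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); 0.0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / n_classes   # unweighted mean over classes
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return macro_f1, accuracy


# Toy example with 3 classes: one sample of class 1 is predicted as class 2.
macro_f1, accuracy = macro_f1_and_accuracy(
    y_true=[0, 0, 1, 1, 2, 2],
    y_pred=[0, 0, 1, 2, 2, 2],
    n_classes=3,
)
print(f"macro F1 = {macro_f1:.3f}, accuracy = {accuracy:.3f}")
# macro F1 = 0.822, accuracy = 0.833
```

Because the macro average weights every class equally, it stays close to accuracy on a balanced dataset such as this one, which matches the small gaps between the two columns.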
- Clone the project: `git clone https://github.com/bundasmanu/Colorectal_Histopathology.git`;
- Install the requirements: `pip install -r requirements.txt`;
- Check the config.py file and adjust the configuration variables used to read, obtain and split the data, as well as the variables used to build, train and optimize the architectures:
  - Samples are read from ../input/images/LESION_NAME/*.tif, e.g. ../input/images/STROMA/image1.tif. This path is only an example: check it and adjust it to your setup before using the project;
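The directory layout described above, with one folder per tissue class, can be traversed as in the following sketch. It only illustrates the layout and is not the project's loading code: the helper `list_samples` is hypothetical, and the variables that config.py actually exposes may be named and used differently.

```python
from pathlib import Path


def list_samples(images_root):
    """Return sorted (file_path, label) pairs for <images_root>/<LABEL>/*.tif.

    The label of each sample is the name of its parent folder.
    """
    root = Path(images_root)
    return sorted((str(p), p.parent.name) for p in root.glob("*/*.tif"))
```

Calling `list_samples("../input/images")` on the layout above would yield pairs such as `("../input/images/STROMA/image1.tif", "STROMA")`, which can then be passed to whatever image-loading routine is in use.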
https://www.kaggle.com/kmader/colorectal-histology-mnist
GPL-3.0 License
I am open to new ideas and improvements to the current repository. However, until the defense of my master's thesis, I will not accept pull requests.