# Skin-Cancer-Detection

**Repository Path**: tianlangcz/Skin-Cancer-Detection

## Basic Information

- **Project Name**: Skin-Cancer-Detection
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-15
- **Last Updated**: 2026-02-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 🩺 Skin Cancer Detection (HAM10000 Dataset)

The HAM10000 dataset is a large collection of dermatoscopic images for skin lesion classification, widely used in medical imaging and machine learning. It covers a diverse range of skin lesions and is aimed at advancing research in dermatology, particularly the diagnosis of skin cancers.

The dataset consists of 10,015 high-resolution images of skin lesions, sourced from various individuals. This diversity helps in training robust machine learning models that generalize well to unseen data. The primary challenge with the dataset is its significant class **_imbalance_**.
The collected images are annotated and categorized into 7 classes:

- Melanoma: 3
- Melanocytic Nevi: 1
- Basal Cell Carcinoma: 5
- Squamous Cell Carcinoma: 2
- Actinic Keratosis: 6
- Vascular Lesions: 4
- Benign Keratosis (Seborrheic Keratosis, etc.): 0

Source: Download the HAM10000 dataset from [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T)

# 💡 Class distribution

In any machine learning project, it is highly recommended to perform exploratory analysis before starting the modeling phase. This step yields insights about the data that can guide and improve the modeling effort. For example, examining the distribution of categories in the HAM10000 dataset shows that it is imbalanced, so the modeling strategy has to account for the under-represented classes.
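
As a rough illustration of this kind of check, the sketch below counts how often each class label occurs and derives inverse-frequency class weights, which are one common way to compensate for imbalance. It is not taken from this repository: the function names `class_distribution` and `balanced_class_weights` are hypothetical, the toy label list is made up for the example, and the weighting formula `total / (n_classes * count)` is just one widely used heuristic.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, fraction)} ordered from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.most_common()}


def balanced_class_weights(labels):
    """Inverse-frequency weights: total / (n_classes * count) for each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Toy labels standing in for the real annotations (assumption: the counts
# below are invented for illustration and are NOT the HAM10000 counts).
# 1 = Melanocytic Nevi, 3 = Melanoma, 5 = Basal Cell Carcinoma
toy_labels = [1] * 6 + [3] * 3 + [5] * 1

dist = class_distribution(toy_labels)
weights = balanced_class_weights(toy_labels)
for label, (n, frac) in dist.items():
    print(f"class {label}: {n} images ({frac:.0%}), weight {weights[label]:.2f}")
```

With real data, `toy_labels` would be replaced by the label column loaded from the dataset's metadata file. Rare classes receive larger weights, which can then be passed to a weighted loss function or used to drive oversampling.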