Exploring multimodal deep learning for skin-cancer detection by combining image and clinical data through fusion-based models.
CNN Model
A ResNet-18 CNN trained on dermoscopic images to identify visual lesion patterns such as texture and color. Leveraging transfer learning from ImageNet, the model learns distinctive spatial cues from skin imagery.
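As a rough illustration (not the project's exact code), the PyTorch sketch below builds this baseline: an ImageNet-pretrained ResNet-18 whose final layer is swapped for a lesion classifier. `build_image_model` and `NUM_CLASSES` are hypothetical names, and the class count is an assumption.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # assumed class count; set to the dataset's label set

def build_image_model(num_classes: int = NUM_CLASSES) -> nn.Module:
    # Start from ImageNet-pretrained weights (transfer learning).
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    # Replace the 1000-way ImageNet head with a lesion classifier.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```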
MLP Model
A Multi-Layer Perceptron (MLP) trained on patient metadata (age, lesion site, and color irregularity). This baseline highlights how structured clinical data alone can effectively predict lesion types.
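A minimal sketch of such a metadata baseline, assuming categorical fields (e.g., lesion site) are one-hot encoded upstream; `MetadataMLP`, the hidden width, and the dropout rate are illustrative choices rather than the project's exact configuration.

```python
import torch
import torch.nn as nn

class MetadataMLP(nn.Module):
    """Small MLP over tabular clinical features (age, lesion site, color irregularity)."""
    def __init__(self, in_features: int, num_classes: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),  # assumed regularization, not from the source
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```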
Fusion Model
The fusion-based approach combines image and metadata features using both early fusion and late fusion techniques; illustrative sketches of each follow the descriptions below:
Early Fusion: Rather than training separate branches, the two modalities are merged into a single joint input representation, so one network processes image and metadata together from the start.
Late Fusion: The late-fusion architecture pairs a Multilayer Perceptron (MLP) for metadata with a ResNet-18 backbone for image feature extraction. Each modality is processed independently, and the resulting embeddings are merged for final classification.
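One common way to realize early fusion, sketched below under assumptions the source does not spell out, is to tile the metadata vector into extra input channels so a single CNN sees both modalities from its first layer. `EarlyFusionNet` and the zero-initialized extra filters are illustrative choices, not the project's confirmed design.

```python
import torch
import torch.nn as nn
from torchvision import models

class EarlyFusionNet(nn.Module):
    """Early fusion: metadata tiled into extra image channels so one CNN
    processes both modalities jointly from the stem (illustrative design)."""
    def __init__(self, meta_features: int, num_classes: int):
        super().__init__()
        self.backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        old = self.backbone.conv1
        # Widen the stem from 3 RGB channels to 3 + meta_features channels.
        self.backbone.conv1 = nn.Conv2d(
            3 + meta_features, old.out_channels,
            kernel_size=old.kernel_size, stride=old.stride,
            padding=old.padding, bias=False,
        )
        with torch.no_grad():
            # Keep the pretrained RGB filters; zero the new metadata filters
            # so initialization matches the pretrained network exactly.
            self.backbone.conv1.weight[:, :3] = old.weight
            self.backbone.conv1.weight[:, 3:] = 0.0
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def forward(self, image: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        b, _, h, w = image.shape
        # Broadcast each metadata vector across the spatial grid: (B, M) -> (B, M, H, W).
        meta_planes = meta[:, :, None, None].expand(b, meta.size(1), h, w)
        return self.backbone(torch.cat([image, meta_planes], dim=1))
```

The late-fusion description above maps naturally to two branches whose embeddings are concatenated before a shared classifier. This sketch (imports as above) follows that description, with `meta_dim` as an assumed embedding size.

```python
class LateFusionNet(nn.Module):
    """Late fusion: separate image (ResNet-18) and metadata (MLP) branches,
    merged by concatenating their embeddings before the final classifier."""
    def __init__(self, meta_features: int, num_classes: int, meta_dim: int = 32):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        feat_dim = backbone.fc.in_features   # 512 for ResNet-18
        backbone.fc = nn.Identity()          # expose the 512-d image embedding
        self.image_branch = backbone
        self.meta_branch = nn.Sequential(nn.Linear(meta_features, meta_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim + meta_dim, num_classes)

    def forward(self, image: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        img_emb = self.image_branch(image)   # (B, 512)
        meta_emb = self.meta_branch(meta)    # (B, meta_dim)
        return self.classifier(torch.cat([img_emb, meta_emb], dim=1))
```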
Conclusion
The Late Fusion model outperformed all others, showing that combining independent feature representations leads to better generalization. Interestingly, metadata alone proved highly predictive, emphasizing its diagnostic value.