Journal article

A Multimodal Ensemble-Based Framework for Detecting Fake News Using Visual and Textual Features

Muhammad Abdullah, Hongying Zan, Arifa Javed, Muhammad Sohail, Orken Mamyrbayev, Zhanibek Turysbek, Hassan Eshkiki, Fabio Caraffini

Mathematics, Volume: 14, Issue: 2, Start page: 360

Swansea University Authors: Hassan Eshkiki, Fabio Caraffini

  • 71351.VOR.pdf

    PDF | Version of Record

    © 2026 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

DOI (Published version): 10.3390/math14020360

Published in: Mathematics
ISSN: 2227-7390
Published: MDPI AG 2026

URI: https://cronfa.swan.ac.uk/Record/cronfa71351
Abstract: Detecting fake news is essential in natural language processing to verify news authenticity and prevent misinformation-driven social, political, and economic disruptions targeting specific groups. A major challenge in multimodal fake news detection is effectively integrating textual and visual modalities, as semantic gaps and contextual variations between images and text complicate alignment, interpretation, and the detection of subtle or blatant inconsistencies. To enhance accuracy in fake news detection, this article introduces an ensemble-based framework that integrates textual and visual data using ViLBERT’s two-stream architecture, incorporates VADER sentiment analysis to detect emotional language, and uses Image–Text Contextual Similarity to identify mismatches between visual and textual elements. These features are processed through the Bi-GRU classifier, Transformer-XL, DistilBERT, and XLNet, combined via a stacked ensemble method with soft voting, culminating in a T5 metaclassifier that predicts the outcome for robustness. Results on the Fakeddit and Weibo benchmarking datasets show that our method outperforms state-of-the-art models, achieving up to 96% and 94% accuracy in fake news detection, respectively. This study highlights the necessity for advanced multimodal fake news detection systems to address the increasing complexity of misinformation and offers a promising solution.
Keywords: fake news detection; NLP; sentiment analysis; transformers; deep learning
College: Faculty of Science and Engineering
Issue: 2
Start Page: 360
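
The abstract mentions combining several base classifiers with soft voting. As a rough illustration only, and not the authors' implementation, soft voting can be sketched as averaging each model's class probabilities and picking the class with the highest average. The function names `soft_vote` and `predict`, the class ordering, and the probability values below are hypothetical and are not taken from the article.

```python
from typing import Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> list:
    """Average per-class probabilities across models (weighted if weights are given)."""
    if not model_probs:
        raise ValueError("need probabilities from at least one model")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same set of classes")
    if weights is None:
        weights = [1.0] * len(model_probs)  # equal weight per model
    if len(weights) != len(model_probs):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict(avg_probs: Sequence[float]) -> int:
    """Return the index of the class with the highest averaged probability."""
    return max(range(len(avg_probs)), key=lambda c: avg_probs[c])


if __name__ == "__main__":
    # Hypothetical outputs of four base classifiers for one post,
    # ordered as [P(fake), P(real)]. Illustrative numbers, not results from the paper.
    base_outputs = [
        [0.9, 0.1],
        [0.6, 0.4],
        [0.2, 0.8],
        [0.7, 0.3],
    ]
    averaged = soft_vote(base_outputs)
    print(averaged, predict(averaged))
```

In a stacked setup such as the one the abstract describes, the averaged probabilities (or the individual model outputs) would be passed on to a meta-classifier rather than used as the final decision. That stage is not shown here.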