Journal article

Combining weighted SMOTE with ensemble learning for the class-imbalanced prediction of small business credit risk

Mohammad Abedin, Chi Guotai, Petr Hajek, Tong Zhang

Complex and Intelligent Systems, Volume: 9, Issue: 4, Pages: 3559 - 3579

Swansea University Author: Mohammad Abedin

  • 64260.VOR.pdf

    PDF | Version of Record

    © The Author(s) 2021. Distributed under the terms of a Creative Commons Attribution 4.0 License (CC BY 4.0).

Abstract

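In small business credit risk assessment, the default and nondefault classes are highly imbalanced. To overcome this problem, this study proposes an extended ensemble approach rooted in the weighted synthetic minority oversampling technique (WSMOTE), which is called WSMOTE-ensemble. The proposed ensemble classifier hybridizes WSMOTE and Bagging with sampling composite mixtures to guarantee the robustness and variability of the generated synthetic instances and, thus, minimize the small business class-skewed constraints linked to default and nondefault instances. The original small business dataset used in this study was taken from 3111 records from a Chinese commercial bank. By implementing a thorough experimental study of extensively skewed data-modeling scenarios, a multilevel experimental setting was established for a rare event domain. Based on the proper evaluation measures, this study proposes that the random forest classifier used in the WSMOTE-ensemble model provides a good trade-off between the performance on default class and that of nondefault class. The ensemble solution improved the accuracy of the minority class by 15.16% in comparison with its competitors. This study also shows that sampling methods outperform nonsampling algorithms. With these contributions, this study fills a noteworthy knowledge gap and adds several unique insights regarding the prediction of small business credit risk.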

Published in: Complex and Intelligent Systems
ISSN: 2199-4536 2198-6053
Published: Springer Science and Business Media LLC 2023

URI: https://cronfa.swan.ac.uk/Record/cronfa64260
first_indexed 2023-09-18T16:27:22Z
last_indexed 2024-11-25T14:13:44Z
id cronfa64260
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2023-09-20T10:27:13.2855708</datestamp><bib-version>v2</bib-version><id>64260</id><entry>2023-08-31</entry><title>Combining weighted SMOTE with ensemble learning for the class-imbalanced prediction of small business credit risk</title><swanseaauthors><author><sid>4ed8c020eae0c9bec4f5d9495d86d415</sid><ORCID>0000-0002-4688-0619</ORCID><firstname>Mohammad</firstname><surname>Abedin</surname><name>Mohammad Abedin</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-08-31</date><deptcode>CBAE</deptcode><abstract>In small business credit risk assessment, the default and nondefault classes are highly imbalanced. To overcome this problem, this study proposes an extended ensemble approach rooted in the weighted synthetic minority oversampling technique (WSMOTE), which is called WSMOTE-ensemble. The proposed ensemble classifier hybridizes WSMOTE and Bagging with sampling composite mixtures to guarantee the robustness and variability of the generated synthetic instances and, thus, minimize the small business class-skewed constraints linked to default and nondefault instances. The original small business dataset used in this study was taken from 3111 records from a Chinese commercial bank. By implementing a thorough experimental study of extensively skewed data-modeling scenarios, a multilevel experimental setting was established for a rare event domain. Based on the proper evaluation measures, this study proposes that the random forest classifier used in the WSMOTE-ensemble model provides a good trade-off between the performance on default class and that of nondefault class. The ensemble solution improved the accuracy of the minority class by 15.16% in comparison with its competitors. This study also shows that sampling methods outperform nonsampling algorithms. 
With these contributions, this study fills a noteworthy knowledge gap and adds several unique insights regarding the prediction of small business credit risk.</abstract><type>Journal Article</type><journal>Complex and Intelligent Systems</journal><volume>9</volume><journalNumber>4</journalNumber><paginationStart>3559</paginationStart><paginationEnd>3579</paginationEnd><publisher>Springer Science and Business Media LLC</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>2199-4536</issnPrint><issnElectronic>2198-6053</issnElectronic><keywords>Small business, Credit risk, Imbalanced data, Oversampling, Weighted SMOTE, Ensemble learning</keywords><publishedDay>31</publishedDay><publishedMonth>8</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-08-31</publishedDate><doi>10.1007/s40747-021-00614-4</doi><url>http://dx.doi.org/10.1007/s40747-021-00614-4</url><notes/><college>COLLEGE NANME</college><department>Management School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>CBAE</DepartmentCode><institution>Swansea University</institution><apcterm/><funders>This work has been supported by the Key Projects of National Natural Science Foundation of China (71731003 and 71431002), the General Projects of National Natural Science Foundation of China (71471027 and 71873103), the National Social Science Foundation of China (16BTJ017), the Youth Project of National Natural Science Foundation of China (71601041), the scientific research project of the Czech Sciences Foundation Grant (19-15498S), the Aderi Intelligent Technology (Xiamen) Co and Bank of Dalian as well as Postal Savings Bank of China.</funders><projectreference/><lastEdited>2023-09-20T10:27:13.2855708</lastEdited><Created>2023-08-31T17:58:52.9308496</Created><path><level id="1">Faculty of Humanities and Social Sciences</level><level id="2">School of Management - Accounting and 
Finance</level></path><authors><author><firstname>Mohammad</firstname><surname>Abedin</surname><orcid>0000-0002-4688-0619</orcid><order>1</order></author><author><firstname>Chi</firstname><surname>Guotai</surname><order>2</order></author><author><firstname>Petr</firstname><surname>Hajek</surname><order>3</order></author><author><firstname>Tong</firstname><surname>Zhang</surname><order>4</order></author></authors><documents><document><filename>64260__28567__561dc4f2b9e64680b0ef89b878321b14.pdf</filename><originalFilename>64260.VOR.pdf</originalFilename><uploaded>2023-09-18T17:26:49.7376340</uploaded><type>Output</type><contentLength>1291323</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>&#xA9; The Author(s) 2021. Distributed under the terms of a Creative Commons Attribution 4.0 License (CC BY 4.0).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title Combining weighted SMOTE with ensemble learning for the class-imbalanced prediction of small business credit risk
author_id_str_mv 4ed8c020eae0c9bec4f5d9495d86d415
author_id_fullname_str_mv 4ed8c020eae0c9bec4f5d9495d86d415_***_Mohammad Abedin
author Mohammad Abedin
author2 Mohammad Abedin
Chi Guotai
Petr Hajek
Tong Zhang
format Journal article
container_title Complex and Intelligent Systems
container_volume 9
container_issue 4
container_start_page 3559
publishDate 2023
institution Swansea University
issn 2199-4536
2198-6053
doi_str_mv 10.1007/s40747-021-00614-4
publisher Springer Science and Business Media LLC
college_str Faculty of Humanities and Social Sciences
hierarchytype
hierarchy_top_id facultyofhumanitiesandsocialsciences
hierarchy_top_title Faculty of Humanities and Social Sciences
hierarchy_parent_id facultyofhumanitiesandsocialsciences
hierarchy_parent_title Faculty of Humanities and Social Sciences
department_str School of Management - Accounting and Finance
url http://dx.doi.org/10.1007/s40747-021-00614-4
document_store_str 1
active_str 0
published_date 2023-08-31T05:28:30Z