A Credit Scoring Model Based on Integrated Mixed Sampling and Ensemble Feature Selection: RBR_XGB, ERICDATA Higher Education Knowledge Base
Title
A Credit Scoring Model Based on Integrated Mixed Sampling and Ensemble Feature Selection: RBR_XGB
Authors Xiaobing Lin, Zhe Wu, Jianfa Chen, Lianfen Huang, Zhiyuan Shi
English Abstract

As the economy develops rapidly, financial institutions are paying increasing attention to credit risk. The XGBoost algorithm is widely used in credit scoring, but it has three disadvantages on small, high-dimensional, imbalanced datasets: (1) when trained on imbalanced data, its classification results are biased towards the majority class, which reduces model accuracy; (2) it is prone to overfitting on high-dimensional data, because the higher the data dimension, the sparser the samples; (3) on small datasets it is prone to data fragmentation, which also reduces model accuracy. To address these issues, this paper proposes RBR_XGB, a credit scoring model based on integrated mixed sampling and ensemble feature selection. First, to counter the model failure and overfitting of XGBoost on highly imbalanced small samples, the model uses an improved hybrid sampling algorithm that combines RUS and BSMOTE1 to balance and expand the dataset. Second, to handle feature redundancy, the RFECV_XGB algorithm selects features and filters out interfering ones. Third, because different models differ in discriminative ability, the validation set is used to assign a weight to each model, and a weighted ensemble further improves performance. The experimental results show that RBR_XGB achieves higher classification performance than the traditional XGBoost algorithm on small, high-dimensional, imbalanced data, and that it is suitable for commercial use.
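The abstract names the sampling and ensemble steps but gives no details, so the following is only a minimal, standard-library sketch of the general idea, not the authors' implementation. It assumes that RUS means random undersampling and BSMOTE1 means Borderline-SMOTE1, that undersampling runs before oversampling, and that model weights are proportional to validation scores; none of these choices is confirmed by the abstract. All function names are hypothetical, the "improved" part of the paper's hybrid sampler is not reproduced, and the RFECV_XGB feature selection step and the XGBoost models themselves are omitted (model outputs are stood in for by plain probabilities).

```python
import math
import random


def random_undersample(majority, n_keep, rng):
    """RUS: keep a random subset of the majority class."""
    return rng.sample(majority, min(n_keep, len(majority)))


def borderline_smote1(minority, majority, n_new, k, rng):
    """Simplified Borderline-SMOTE1; needs at least two minority points."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for i, point in enumerate(minority):
        others = [pl for j, pl in enumerate(labelled) if j != i]
        nearest = sorted(others, key=lambda pl: math.dist(point, pl[0]))[:k]
        n_major = sum(1 for _, label in nearest if label == 0)
        # "Danger" points: at least half, but not all, neighbours are majority.
        if k / 2 <= n_major < k:
            danger.append(i)
    if not danger:
        # Assumption: with no borderline points, seed from every minority point.
        danger = list(range(len(minority)))
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        seed = minority[i]
        mates = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(seed, m),
        )[:k]
        mate = rng.choice(mates)
        gap = rng.random()
        # Interpolate between the seed and one of its minority neighbours.
        synthetic.append(tuple(s + gap * (t - s) for s, t in zip(seed, mate)))
    return synthetic


def hybrid_sample(minority, majority, n_keep, n_new, k=3, seed=0):
    """Assumed order: undersample the majority, then oversample the minority."""
    rng = random.Random(seed)
    kept = random_undersample(majority, n_keep, rng)
    synthetic = borderline_smote1(minority, kept, n_new, k, rng)
    X = minority + synthetic + kept
    y = [1] * (len(minority) + len(synthetic)) + [0] * len(kept)
    return X, y


def validation_weights(val_scores):
    """Assumption: weights proportional to each model's validation score."""
    total = sum(val_scores)
    return [s / total for s in val_scores]


def weighted_vote(probabilities, weights, threshold=0.5):
    """Weighted average of per-model probabilities, then a hard threshold."""
    score = sum(w * p for w, p in zip(weights, probabilities))
    return score, int(score >= threshold)
```

In use, one would call hybrid_sample on the training split only, fit each base model on the balanced data, score the models on a held-out validation set, and pass those scores to validation_weights before combining predictions with weighted_vote. Points are represented here as tuples of floats purely for illustration.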

 

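Pages 1061-1068
Keywords Credit scoring, XGBoost, Imbalance data, High-dimensional data
Journal title Journal of Internet Technology (網際網路技術學刊)
Issue September 2022 (Vol. 23, No. 5)
Publisher Taiwan Academic Network Management Committee (台灣學術網路管理委員會)
DOI 10.53106/160792642022092305014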
