End-to-end Speaker Recognition Based on MTFC-FullRes2Net

Li-Hong Deng; Fei Deng; Ge-Xiang Chiou; Qiang Yang

Title	End-to-end Speaker Recognition Based on MTFC-FullRes2Net
Authors	Li-Hong Deng, Fei Deng, Ge-Xiang Chiou, Qiang Yang
Abstract	Lightweight convolutional neural networks in speaker recognition systems have weak feature extraction ability, and their recognition accuracy is poor. Many methods improve feature extraction by using deeper, wider, and more complex network structures, but this makes the parameter count and inference time increase exponentially. In this paper, we bring Res2Net from the object detection task to the speaker recognition task and verify its effectiveness and robustness there. We then improve on it and propose FullRes2Net, which has better multi-scale feature extraction ability without increasing the number of parameters. Next, we propose mixed time-frequency channel (MTFC) attention, which addresses the problems of existing attention methods, compensates for the shortcomings of convolution itself, and further enhances the feature extraction ability of convolutional neural networks. Experiments were conducted on the VoxCeleb dataset. The results show that the proposed MTFC-FullRes2Net end-to-end speaker recognition system effectively improves on the feature extraction and generalization ability of Res2Net. Compared to Res2Net, MTFC-FullRes2Net improves performance by 31.5%. Compared to ThinResNet-50, RawNet, CNN+Transformer, and Y-vector, it improves performance by 56.5%, 14.1%, 16.7%, and 23.4%, respectively. It also outperforms state-of-the-art speaker recognition systems that use complex structures. It is a lightweight, more efficient end-to-end architecture and is better suited to practical application.
Pages	075-091
Keywords	speaker recognition, Res2Net, attention mechanisms
Journal	電腦學刊 (Journal of Computers)
Issue	June 2023 (Vol. 34, No. 3)
DOI	10.53106/199115992023063403006
Previous article in this issue	Traffic Sign Detection Based on Improved YOLOv5
Next article in this issue	A Dynamic Task Assignment Optimization Method for Multi-AGV System Based on Genetic Algorithm