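Influence of Interlocutor Proficiency and Rater Severity in Indonesian Language Oral Assessment, ERICDATA高等教育知識庫 (Higher Education Knowledge Base)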
Title
對話者之語言能力與評分嚴苛度對印尼語口語評量成績之影響
Parallel Title
Influence of Interlocutor Proficiency and Rater Severity in Indonesian Language Oral Assessment
Authors: 何德華, 張惠環, 許婉儀
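Chinese Abstract (English translation)
In foreign language classrooms that aim at communicative teaching, a common mode of oral assessment is the paired oral exam, in which two students converse as partners and raters score the performance with a rubric. However, a student's choice of partner may affect oral exam performance, and raters using the same rubric may differ in personal severity and therefore award different scores. Instructors thus need to consider whether to set criteria for choosing oral exam partners and how to train the teaching assistant team to use the rubric so that the oral exam becomes fairer and more objective. This study took the Indonesian language course offered by the general education center of a national university in Taiwan as its research site and used the Rasch model to examine two questions: (1) Can raters who differ in severity reach consistent oral exam scoring after training? (2) Does a student's choice of partner, either someone of similar language background (a beginner paired with a beginner) or someone of a different language background (a beginner paired with a Chinese Indonesian), affect the student's oral exam score? The results show that raters could not reach full scoring consistency even after training; having several raters score jointly, removing outliers, and averaging the remaining scores may therefore be an expedient for now. Nevertheless, examining rater severity with many-facet Rasch analysis helps detect problems early. In addition, pairing with a partner of a different language proficiency background did not affect students' oral exam scores, so students should be free to choose their partners, with incentives that encourage overseas Chinese students from Indonesia to pair with beginners, so that both sides benefit.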
English Abstract
The use of pair work in speaking assessment has frequently been adopted as an authentic manner of testing oral proficiency in second-language communicative language classrooms; however, the findings of studies regarding whether interlocutor proficiency influences the outcomes of oral assessment and whether rater training enables long-term interrater reliability have been inconclusive or contradictory. Studies have indicated that if one of a pair of interlocutors exhibits higher proficiency than the other or if the individuals know each other well, they may collaborate to produce more speech and achieve higher performance in oral assessments (Iwashita, 1996; Norton, 2005; Storch, 2001). However, a higher volume of speech is not always associated with higher overall performance scores (Davis, 2009). Other studies (Galaczi, 2008, 2014) have found that weaker language users might be more reluctant to contribute in oral interactions when paired with more proficient interlocutors. Son (2016) reported that Korean students of English as a foreign language spoke less when paired with more proficient interlocutors, although their overall oral performance did not necessarily decrease. The outcomes of oral assessments may also be influenced by the reliability of the ratings of assessors. Rater severity can be identified by applying the many-facet Rasch model (MFRM; Eckes, 2009, 2015). Although rater training can theoretically increase the confidence and consistency of raters (Davis, 2012, 2016; Huang et al., 2016; McNamara, 1996), differences in rater severity often persist after training (Eckes, 2005, 2009, 2015; Knoch, 2011; Sundqvist et al., 2020; Weigle, 1998), and the results of training are not necessarily long-lasting (Bonk & Ockey, 2003; Chang et al., 2011; Kim, 2011; Lan, 2012; Liao, 2016; Lumley & McNamara, 1995).
Because second language assessment generally involves more than one assessor, providing on-the-job rater training is necessary to increase interrater reliability in oral assessments.
Therefore, the following must be explored: (1) whether training raters in the use of assessment rubrics increases interrater reliability, and (2) whether test takers perform differently when paired with interlocutors of different proficiency levels. This study investigated oral assessments conducted in two General Education Indonesian language classes at a national university in Taiwan in the fall semesters of 2020 and 2021. The study used Rasch analysis to measure to what extent interlocutor proficiency (Indonesian language learning beginners vs. speakers of Indonesian as a first language) influenced the students’ oral performance and to what extent the severity of the Indonesian teaching assistants (TAs) could be identified and controlled for. The 2020 class comprised 44 students (Taiwanese individuals = 26, Chinese Indonesian individuals = 10, individuals of other nationalities = 8; men = 10, women = 34) and 7 Indonesian TAs (TAs from North Sumatra = 4, TAs from Java = 2, TA from Sulawesi = 1; men = 2, women = 5), and the 2021 class comprised 38 students (Taiwanese individuals = 17, Chinese Indonesian individuals = 14, Chinese Malaysian individuals = 4, individuals of other nationalities = 3; men = 18, women = 20) and 8 Indonesian TAs (TAs from North Sumatra = 4, TAs from Java = 4; men = 4, women = 4). The data comprised six oral assessments performed throughout the semester for each class that were scored by the trained TAs according to a rubric containing five categories: content, accuracy, fluency, pronunciation, and interaction. The participants self-assessed their Indonesian language proficiency at the beginning of the semester. Generally, the Chinese Indonesian and Chinese Malaysian students rated themselves as native speakers of Indonesian and Malay, respectively, whereas the Taiwanese students and those of other nationalities identified themselves as true beginners.
The participants selected their partners for the oral exams from among their classmates. The data were analyzed using Facets (Linacre, 2022a) to investigate the oral performance of each student pair, the severity of their assessor, and the difficulty of the criteria in the scoring rubric. The scores were transformed into a logit scale for comparison. Analysis based on the MFRM was used to obtain the following information for interpretation: logit measurements, the information-weighted mean-square fit statistic (infit), the outlier-sensitive mean-square fit statistic (outfit), the separation index, the reliability of the separation index, and chi-square tests for homogeneity. The results were represented using a variable map for each semester, divided into sections for each of the aforementioned three facets. A higher logit value in the three facets indicated, respectively, higher student pair performance in the oral exams, more severe rating, and a criterion on which high scores were harder to obtain.
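The three-facet setup described above can be illustrated with a minimal sketch. It assumes the common rating-scale form of the MFRM, in which the log-odds of scoring in category k rather than k-1 equal pair ability minus rater severity minus criterion difficulty minus a category threshold. The abstract does not state the exact model specification used in Facets, so the function names, the threshold values, and the 0-based score categories below are hypothetical.

```python
import math


def category_probabilities(ability, severity, difficulty, thresholds):
    """Score-category probabilities (categories 0..K) under a rating-scale MFRM.

    thresholds[k-1] is the step parameter for moving from category k-1 to k.
    All arguments are in logits. This is an illustrative form, not the
    study's confirmed specification.
    """
    eta = ability - severity - difficulty
    # Log-numerator of category k is the running sum of (eta - tau_j), j <= k.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (eta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability, severity, difficulty, thresholds):
    """Model-expected score for one pair, one rater, and one criterion."""
    probs = category_probabilities(ability, severity, difficulty, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical thresholds for a four-category rubric scale (0 to 3).
example_thresholds = [-1.0, 0.0, 1.0]
lenient = expected_score(0.0, -2.42, 0.0, example_thresholds)
severe = expected_score(0.0, 1.03, 0.0, example_thresholds)
```

With everything else held equal, a rater with a higher severity logit yields a lower expected score, which is the direction of the scale described in the abstract.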
The results indicate that even after training, rater consistency was low. In the 2020 class, Chinese Indonesian students had the highest scores, as expected. Performance ranged widely among the Taiwanese students and those of other nationalities. Among the seven TAs, five provided similar ratings for the midterm oral assessment, one provided excessively high ratings (logit = -2.42), and one provided excessively low ratings (logit = 1.03). After further training was provided before the final exam, two different TAs provided ratings that were comparatively high (-0.45 logits) or comparatively low (0.97 logits); however, rater severity for all seven TAs in the final exam fell between -1 and 1 logits, the acceptable range. The rater variable interacted with the rating criteria. One TA rated accuracy favorably (t = 2.76) but rated interaction severely (t = -2.11). Another rated fluency favorably (t = 2.55) but rated pronunciation severely (t = -4.25). In the 2021 class, although the eight TAs were fully trained to use the rubric consistently, variables beyond our control continued to influence rating consistency, especially the interaction between rater and criteria. Therefore, using average scores after outliers are removed may be a viable alternative method of grading until a superior solution is identified. Nonetheless, identifying rater severity variability was helpful as a basis for further rater training.
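The abstract recommends averaging scores after outliers are removed but does not say how outliers are defined. The sketch below assumes one simple rule, dropping the single highest and single lowest rating before averaging (a trimmed mean). The function name, the `trim` parameter, and the example ratings are hypothetical and are not taken from the study.

```python
from statistics import mean


def trimmed_average(scores, trim=1):
    """Average the ratings after dropping the `trim` lowest and highest values.

    Assumed outlier rule for illustration only; the study does not specify one.
    """
    if len(scores) <= 2 * trim:
        raise ValueError("need more scores than the number of values trimmed")
    kept = sorted(scores)[trim:len(scores) - trim]
    return mean(kept)


# Hypothetical ratings of one student pair by seven TAs.
ratings = [78, 80, 81, 82, 83, 84, 95]
final_score = trimmed_average(ratings)
```

A rule of this kind limits the influence of a single very lenient or very severe rater, but it does not correct for the rater-by-criterion interactions reported above.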
Different Indonesian proficiency levels between assessment partners did not influence individual student scores in the oral assessments. The students from the 2020 and 2021 classes were categorized into four groups, LL, LH, HL, and HH (L = true beginner, H = proficient Indonesian/Malaysian speaker). Their mean scores were analyzed using Kruskal–Wallis tests. We first investigated whether beginners paired with proficient speakers (LH) scored higher than did those paired with other beginners (LL). However, the scores of these groups did not differ significantly. Next, we determined whether proficient speakers paired with beginners (HL) would score lower than did those paired with other proficient speakers (HH). The scores of these groups did not differ significantly either. Our results support the findings of Davis (2009) and Son (2016). We found no evidence that interlocutor proficiency positively or negatively affected the students’ oral performance. However, based on the comprehensive analysis of students’ feedback on the oral examination method, the students seemed to prefer to select partners and remain in their partnerships throughout the semester. Because they were allowed to prepare their scripts and practice before the oral exams, the students developed a sense of solidarity and camaraderie with their partners. The amount of speech they produced did not appear to be influenced by differences in interlocutor proficiency. The students were also tolerant of mistakes made by their partners and exhibited patience. Thus, allowing students to choose their own partners and encouraging local students to pair with Chinese Indonesian students would increase their intercultural experiences.
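The group comparisons described above (for example, LL vs. LH) can be sketched with a hand-written Kruskal–Wallis test. The H statistic below uses the standard rank-sum formula with tie correction. The p-value helper covers only the two-group case (1 degree of freedom), which matches the pairwise comparisons in the abstract. The function names and the scores are hypothetical, since the study's data are not reproduced here.

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups.

    Assumes the pooled scores are not all identical (the tie correction
    would otherwise be zero).
    """
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    counts = Counter(pooled)
    # Tied values share the average of the rank positions they occupy.
    rank_of = {}
    position = 1
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = position + (ties - 1) / 2
        position += ties
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    tie_correction = 1 - sum(t**3 - t for t in counts.values()) / (n_total**3 - n_total)
    return h / tie_correction


def p_value_two_groups(h):
    """Upper-tail chi-square p-value with 1 df; valid only for two groups."""
    return math.erfc(math.sqrt(max(h, 0.0) / 2))


# Hypothetical mean oral scores for beginners paired with beginners (LL)
# and beginners paired with proficient speakers (LH).
ll_scores = [71, 74, 78, 80, 83]
lh_scores = [72, 75, 77, 81, 84]
h_stat = kruskal_wallis_h(ll_scores, lh_scores)
p_val = p_value_two_groups(h_stat)
```

A large p-value in such a comparison would be consistent with the reported finding that partner proficiency did not significantly change scores. With small groups, the chi-square approximation is rough, and an exact or permutation test may be preferable.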
The research site had two unique features that may not be present in other second language classrooms. One was team instruction conducted by a linguist and 7–8 TAs. The other was the presence of a considerable number of proficient speakers of Indonesian/Malay as students attending class with true beginners. Nonetheless, these unique features provide valuable information in this case study with multiyear data.
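Pages: 25-46
Keywords: many-facet Rasch model, oral assessment, Indonesian, rater severity, interlocutor proficiency
Journal: 教育心理學報 (Bulletin of Educational Psychology)
Issue: 2023-09 (Vol. 55, No. 1)
Publisher: Department of Educational Psychology and Counseling, National Taiwan Normal University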