Scoring Reading Parameters: An Inter-Rater Reliability Study Using The MNREAD Chart

authors

  • Baskaran Karthikeyan
  • Macedo Antonio Filipe
  • He Yingchen
  • Hernandez-Moreno Laura
  • Queirós Tatiana
  • Mansfield J Stephen
  • Calabrèse Aurélie

keywords

  • Low vision
  • Reading performance
  • Reading test
  • MNREAD acuity chart
  • Inter-rater reliability
  • Computer-based scoring algorithms

abstract

Purpose: First, to evaluate inter-rater reliability when human raters estimate the reading performance of visually impaired individuals using the MNREAD acuity chart. Second, to evaluate the agreement between computer-based scoring algorithms and compare them with human rating.

Methods: Reading performance was measured for 101 individuals with low vision, using the Portuguese version of MNREAD. Seven raters estimated the maximum reading speed (MRS) and critical print size (CPS) of each individual MNREAD curve. MRS and CPS were also calculated automatically for each MNREAD curve using two different algorithms: the original standard deviation method (SDev) and non-linear mixed-effects (NLME) modeling. Intra-class correlation coefficients (ICC) were used to estimate absolute agreement between raters and/or algorithms.

Results: Absolute agreement between raters was excellent for MRS (ICC = 0.97; 95%CI [0.96, 0.98]) and good for CPS (ICC = 0.77; 95%CI [0.69, 0.83]). For CPS, inter-rater reliability was poorer among less experienced raters (ICC = 0.70; 95%CI [0.57, 0.80]) than among experienced ones (ICC = 0.82; 95%CI [0.57, 0.80]). Absolute agreement between the two algorithms was excellent for MRS (ICC = 0.96; 95%CI [0.91, 0.98]). For CPS, the best possible agreement was good and was obtained with CPS defined as the print size sustaining 80% of MRS (ICC = 0.77; 95%CI [0.68, 0.84]).

Conclusion: For MRS, inter-rater reliability is excellent, even considering the possibility of noisy and/or incomplete data collected in low-vision individuals. For CPS, inter-rater reliability is lower, which may be problematic, for instance in the context of multicenter studies or follow-up examinations. Setting up consensual guidelines to deal with ambiguous datasets may help improve reliability. While the exact definition of CPS should be chosen on a case-by-case basis depending on the clinician's or researcher's motivations, evidence suggests that estimating CPS as the smallest print size sustaining about 80% of MRS would increase inter-rater reliability.
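As an illustration of the two quantities discussed above, the following minimal Python sketch computes (1) a single-rater, two-way, absolute-agreement ICC and (2) a CPS defined as the smallest print size sustaining a given fraction of MRS. This is not the authors' implementation: the abstract does not state which ICC form was used, so the two-way random-effects, single-measure form is an assumption, and the function names, argument layout, and the choice to pass MRS in as a pre-computed value are hypothetical.

```python
def icc_absolute_agreement(ratings):
    """Single-rater, two-way, absolute-agreement ICC (assumed form).

    ratings: list of rows, one row per subject, one column per rater.
    Assumes at least 2 subjects, at least 2 raters, and no missing values.
    """
    n = len(ratings)          # number of subjects (e.g. MNREAD curves)
    k = len(ratings[0])       # number of raters (or algorithms)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    # Systematic rater offsets (msc) lower absolute agreement.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def cps_at_fraction(print_sizes, speeds, mrs, fraction=0.8):
    """Smallest print size whose reading speed reaches fraction * mrs.

    print_sizes and speeds are parallel lists (e.g. logMAR and words/min).
    mrs is supplied by the caller; how it is estimated is outside this sketch.
    Returns None if no print size reaches the threshold.
    """
    threshold = fraction * mrs
    sustaining = [s for s, v in zip(print_sizes, speeds) if v >= threshold]
    return min(sustaining) if sustaining else None


# Two raters who differ by a constant offset of 1 on three subjects:
# they rank subjects identically, yet absolute agreement is below 1.
print(round(icc_absolute_agreement([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667

# Made-up reading curve: speed drops below 80% of MRS under 0.4 logMAR.
sizes = [1.0, 0.8, 0.6, 0.4, 0.2]
speeds = [100, 102, 98, 85, 40]
print(cps_at_fraction(sizes, speeds, mrs=100))  # 0.4
```

The example values are invented for illustration only and are unrelated to the study data. A fixed-fraction rule such as the one in `cps_at_fraction` leaves no room for rater judgment once MRS is set, which is one plausible reason such a definition could improve agreement.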
