Evaluation of artificial language-adjusted readability on cataract surgery queries

Michael Lin; Azam Husain; Libby Wei; Isa Mohammed; Sara Francomacaro; Wuqaas M. Munir

doi:10.5693.djo.01.2026.03.001

Before the AI models were prompted to re-answer at a 6th-grade reading level, more answers were judged readable at a 6th-grade level by all three graders than were rated at that level by the readability grade-level formulas.

Published: Jun 26, 2026

DOI: https://doi.org/10.5693.djo.01.2026.03.001

Keywords:

artificial intelligence, cataract surgery, patient education

Michael X. Lin, BA

University of Maryland Department of Ophthalmology and Visual Sciences, Baltimore, Maryland

Azam S. Husain, MD

University of Maryland Department of Ophthalmology and Visual Sciences, Baltimore, Maryland

Libby Wei, MD

University of Maryland Department of Ophthalmology and Visual Sciences, Baltimore, Maryland

Isa S. K. Mohammed, MD

Wilmer Eye Institute, Baltimore, Maryland

Sara Francomacaro, MD

Rutzen Eye Specialists & Laser Center, Severna Park, Maryland

Wuqaas M. Munir, MD

University of Maryland Department of Ophthalmology and Visual Sciences, Baltimore, Maryland

Abstract

Purpose
To analyze accuracy and readability of answers to cataract surgery queries produced by artificial intelligence (AI) models and determine whether AI models can significantly improve readability.
Methods
Google Gemini Advanced, ChatGPT 4.0, and Microsoft Copilot Pro were prompted to answer 25 questions about cataract surgery and were then asked to re-answer each question at a 6th-grade level. Objective readability of the answers was measured with five validated reading formulas and word count. The accuracy and readability of each answer were further graded by three ophthalmologists. Comparisons were performed between the original and 6th-grade versions and among the three AI models.
Results
After being prompted to answer at a 6th-grade reading level, Google Gemini Advanced and Microsoft Copilot Pro had a lower average reading level than ChatGPT 4.0 (8.04 vs 8.19 vs 9.43 [P < 0.001]). Among the AI models, Microsoft Copilot answers had the highest Flesch reading ease score (75.40 vs 71.24 vs 69.46 [P < 0.007]) and the lowest word count (130.28 vs 180.24 vs 166.08 [P < 0.001]). Microsoft Copilot Pro and ChatGPT 4.0 answers had a greater change in reading level (−6.13 vs −5.75 vs −3.31 [P < 0.001]) and in Flesch reading ease score (39.67 vs 35.98 vs 23.67 [P < 0.001]) than Google Gemini Advanced answers. Graders found no change in accuracy between the original answers and the answers given at a 6th-grade reading level.
Conclusions
AI models can lower the reading level of responses to common cataract surgery queries while maintaining accuracy.
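For readers unfamiliar with formula-based readability scores, the following is a minimal sketch of how a Flesch reading ease score and a grade-level estimate can be computed for a piece of text. It is an illustration only and not the authors' tooling. The abstract names the Flesch reading ease score but does not list all five formulas, so the use of the Flesch-Kincaid grade level here is an assumption. The syllable counter is a rough heuristic, and the two sample sentences are invented for illustration and are not study data.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups and
    discount a trailing silent 'e'. Not a dictionary-based count."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentence count, word count, syllable count)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch reading ease formula; higher means easier."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade level formula (US school grade)."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


# Hypothetical sample answers, written for this sketch only.
original = (
    "Phacoemulsification utilizes ultrasonic energy to emulsify the "
    "opacified crystalline lens prior to intraocular lens implantation."
)
simplified = (
    "The doctor uses sound waves to break up the cloudy lens. "
    "Then a new clear lens is put in the eye."
)

for label, text in (("original", original), ("simplified", simplified)):
    words = text_stats(text)[1]
    print(
        f"{label}: words={words}, "
        f"reading ease={flesch_reading_ease(text):.1f}, "
        f"grade level={flesch_kincaid_grade(text):.1f}"
    )
```

Both formulas depend only on average sentence length and average syllables per word, which is why shorter sentences and simpler vocabulary raise the reading ease score and lower the grade level.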

How to Cite

Lin M, Husain A, Wei L, Mohammed I, Francomacaro S, Munir WM. Evaluation of artificial language-adjusted readability on cataract surgery queries. Digit J Ophthalmol. 2026;32(2). doi:10.5693.djo.01.2026.03.001

Issue

Vol. 32 No. 2 (2026)

Section

Original Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
