Accuracy Improvement of Khmer Text Recognition by Correcting Post-recognized Characters

Sovila SRUN (1), Tak KEAN (2), Leap BUN (3)
(1) Faculty of Engineering, Royal University of Phnom Penh (RUPP), Russian Federation Boulevard, Khan Toul Kork, Phnom Penh, Cambodia
(2) Rector’s Office, Royal University of Phnom Penh (RUPP), Russian Federation Boulevard, Khan Toul Kork, Phnom Penh, Cambodia
(3) Faculty of Engineering, Royal University of Phnom Penh (RUPP), Russian Federation Boulevard, Khan Toul Kork, Phnom Penh, Cambodia

Abstract

Key Messages



  • The Constitution of Cambodia establishes Khmer as the official language, making accurate digital processing of Khmer documents crucial for national development. Limitations in Khmer text recognition technology currently hinder digital transformation efforts in both the public and private sectors.

  • Existing Optical Character Recognition (OCR) tools show significant limitations with Khmer script, and common character recognition errors reduce document processing efficiency. Our post-processing correction method improves Khmer OCR accuracy from 93.4% to 96.4%, a gain of 3.0 percentage points that represents a meaningful advance in Khmer text digitization.

  • The proposed solution can be integrated into existing document management systems without requiring extensive infrastructure changes. Government agencies and private organizations can achieve higher efficiency in document digitization while maintaining Khmer language integrity.

  • Government institutions should prioritize the adoption of improved Khmer OCR systems to enhance public service delivery. Investment in Khmer language digital tools will support Cambodia’s digital transformation goals while preserving its linguistic heritage.
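
The accuracy figures in the second key message can be made concrete with a small sketch. If the reported values are read as accuracy percentages, the error rate falls from 6.6% to 3.6%, a relative error reduction of roughly 45%. The Python below is only an illustration under stated assumptions: it measures character accuracy as one minus the edit distance divided by the reference length (the article summary does not state which metric the authors used), and it applies a hypothetical lookup table of corrections after recognition. The function names, the sample strings, and the confusion entry are all invented for illustration and are not taken from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(reference, hypothesis):
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - edit_distance(reference, hypothesis) / len(reference))


def correct(text, fixes):
    """Apply a table of post-recognition replacements (hypothetical entries)."""
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text


# Hypothetical example: the final consonant is misrecognized by the OCR engine.
reference = "ភាសាខ្មែរ"
ocr_output = "ភាសាខ្មែវ"
fixes = {"ខ្មែវ": "ខ្មែរ"}  # invented confusion entry, not from the paper

print("before:", char_accuracy(reference, ocr_output))
print("after: ", char_accuracy(reference, correct(ocr_output, fixes)))
```

Counting in code points is a simplification for Khmer, where a visual cluster often spans several code points (base consonant, subscript sign, vowel); a real evaluation might count clusters instead.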


Authors

Sovila SRUN
srun.sovila@rupp.edu.kh (Primary Contact)
Tak KEAN
Leap BUN
SRUN, S., KEAN, T., & BUN, L. Accuracy Improvement of Khmer Text Recognition by Correcting Post-recognized Characters. Insight: Cambodia Journal of Basic and Applied Research, 6(2), -. https://doi.org/10.61945/cjbar.2024.6.2.05
