Refining the Unrestricted Character Encoding for Japanese

9 pages • Published: March 13, 2019

Abstract

In previous work, we proposed an unrestricted character encoding for Japanese (UCEJ). In contrast to, for instance, Unicode, this encoding features an advanced structure relying on three dimensions in order to enhance code usability, easier character lookup being one application. In this paper, we propose several important refinements to the UCEJ encoding: first, the addition of the Latin and kana character sets, which are ubiquitous in Japanese, and second, the inclusion of character stroke order and stroke types in the code and the corresponding binary representation. We estimate the average and worst-case memory complexity of the proposed encoding and conduct an experiment to measure the memory size required in practice, each time comparing the proposal to conventional encodings.

Keyphrases: chinese, code, glyph, information representation, kanji, logogram

In: Gordon Lee and Ying Jin (editors). Proceedings of 34th International Conference on Computers and Their Applications, vol 58, pages 292-300.

Links:	https://easychair.org/publications/paper/Rxzs
	https://doi.org/10.29007/wskt

BibTeX entry

@inproceedings{CATA2019:Refining_Unrestricted_Character_Encoding,
  author    = {Antoine Bossard and Keiichi Kaneko},
  title     = {Refining the Unrestricted Character Encoding for Japanese},
  booktitle = {Proceedings of 34th International Conference on Computers and Their Applications},
  editor    = {Gordon Lee and Ying Jin},
  series    = {EPiC Series in Computing},
  volume    = {58},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {https://easychair.org/publications/paper/Rxzs},
  doi       = {10.29007/wskt},
  pages     = {292--300},
  year      = {2019}}
