module UTF8

unicode.org/mail-arch/unicode-ml/y2003-m02/att-0467/01-The_Algorithm_to_Valide_an_UTF-8_String

* state START

      * Input = 0x00-0x7F: change state to START
      * Input = 0xC2-0xDF: change state to A
      * Input = 0xE1-0xEC, 0xEE-0xEF: change state to B
      * Input = 0xE0: change state to C
      * Input = 0xED: change state to D
      * Input = 0xF1-0xF3: change state to E
      * Input = 0xF0: change state to F
      * Input = 0xF4: change state to G
      * Input = Others (0x80-0xBF, 0xC0-0xC1, 0xF5-0xFF): ERROR

* state A
      o Input = 0x80-0xBF: change state to START
      o Others: ERROR
* state B
      o Input = 0x80-0xBF: change state to A
      o Others: ERROR
* state C
      o Input = 0xA0-0xBF: change state to A
      o Others: ERROR
* state D
      o Input = 0x80-0x9F: change state to A
      o Others: ERROR
* state E
      o Input = 0x80-0xBF: change state to B
      o Others: ERROR
* state F
      o Input = 0x90-0xBF: change state to B
      o Others: ERROR
* state G
      o Input = 0x80-0x8F: change state to B
      o Others: ERROR
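
The transitions listed above can be written as a lookup table plus a short loop. The following is only a minimal sketch under stated assumptions: the module name `UTF8Sketch`, the constant `TRANSITIONS`, and the method `valid?` are hypothetical names chosen for illustration and are not this module's actual API. It also assumes that input ending in any state other than START is treated as an error (a truncated sequence), which the table implies but does not state.

```ruby
# Hypothetical illustration of the state table; names are not the real API.
module UTF8Sketch
  # For each state: a list of [byte range, next state] pairs.
  # A byte not covered by any range of the current state is an ERROR.
  TRANSITIONS = {
    start: [
      [0x00..0x7F, :start],
      [0xC2..0xDF, :a],
      [0xE1..0xEC, :b],
      [0xEE..0xEF, :b],
      [0xE0..0xE0, :c],
      [0xED..0xED, :d],
      [0xF1..0xF3, :e],
      [0xF0..0xF0, :f],
      [0xF4..0xF4, :g]
    ],
    a: [[0x80..0xBF, :start]],
    b: [[0x80..0xBF, :a]],
    c: [[0xA0..0xBF, :a]],
    d: [[0x80..0x9F, :a]],
    e: [[0x80..0xBF, :b]],
    f: [[0x90..0xBF, :b]],
    g: [[0x80..0x8F, :b]]
  }.freeze

  # Returns true if the bytes of str pass the state machine.
  def self.valid?(str)
    state = :start
    str.each_byte do |byte|
      match = TRANSITIONS[state].find { |range, _next| range.cover?(byte) }
      return false if match.nil? # the "Others: ERROR" rows
      state = match[1]
    end
    # Assumption: stopping mid-sequence (any state but START) is invalid.
    state == :start
  end
end

UTF8Sketch.valid?("h\u00E9llo")        # two-byte sequence C3 A9, accepted
UTF8Sketch.valid?("\xC0\x80".b)        # 0xC0 is never a valid lead byte, rejected
UTF8Sketch.valid?("\xED\xA0\x80".b)    # state D only allows 0x80-0x9F, rejected
```

States C, D, F and G exist only to narrow the range of the second byte, which is how the machine rejects overlong forms, UTF-16 surrogates, and values above U+10FFFF without decoding any code points.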

This state machine is easiest to understand by:

a) examining the machine behavior as documented above
b) referring to the excellent UTF-8 article on Wikipedia, with its accompanying table, here:

en.wikipedia.org/wiki/UTF-8

== Purpose

Container for UTF-8 validator.