module UTF8
unicode.org/mail-arch/unicode-ml/y2003-m02/att-0467/01-The_Algorithm_to_Valide_an_UTF-8_String
* state START
  * Input = 0x00-0x7F: change state to START
  * Input = 0xC2-0xDF: change state to A
  * Input = 0xE1-0xEC, 0xEE-0xEF: change state to B
  * Input = 0xE0: change state to C
  * Input = 0xED: change state to D
  * Input = 0xF1-0xF3: change state to E
  * Input = 0xF0: change state to F
  * Input = 0xF4: change state to G
  * Input = Others (0x80-0xBF, 0xC0-0xC1, 0xF5-0xFF): ERROR
* state A
  o Input = 0x80-0xBF: change state to START
  o Others: ERROR
* state B
  o Input = 0x80-0xBF: change state to A
  o Others: ERROR
* state C
  o Input = 0xA0-0xBF: change state to A
  o Others: ERROR
* state D
  o Input = 0x80-0x9F: change state to A
  o Others: ERROR
* state E
  o Input = 0x80-0xBF: change state to B
  o Others: ERROR
* state F
  o Input = 0x90-0xBF: change state to B
  o Others: ERROR
* state G
  o Input = 0x80-0x8F: change state to B
  o Others: ERROR
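The transition table above can be sketched as a small table-driven validator. This is a minimal illustration, not the module's actual implementation: the names `UTF8Sketch`, `TRANSITIONS`, and `valid_utf8?` are hypothetical. The table does not say what happens at end of input, so the sketch assumes that input is valid only if the machine finishes in state START, meaning no multi-byte sequence is left incomplete.

```ruby
# Hypothetical sketch of the state machine; names are illustrative only.
module UTF8Sketch
  # Each state maps to a list of [byte_range, next_state] pairs.
  # Any byte not covered by the current state's ranges is an ERROR.
  TRANSITIONS = {
    :start => [
      [0x00..0x7F, :start],  # ASCII
      [0xC2..0xDF, :a],      # lead byte of a 2-byte sequence
      [0xE1..0xEC, :b],      # lead byte of a 3-byte sequence
      [0xEE..0xEF, :b],
      [0xE0..0xE0, :c],      # 3-byte lead, restricted second byte
      [0xED..0xED, :d],      # 3-byte lead, restricted second byte
      [0xF1..0xF3, :e],      # lead byte of a 4-byte sequence
      [0xF0..0xF0, :f],      # 4-byte lead, restricted second byte
      [0xF4..0xF4, :g],      # 4-byte lead, restricted second byte
    ],
    :a => [[0x80..0xBF, :start]],
    :b => [[0x80..0xBF, :a]],
    :c => [[0xA0..0xBF, :a]],
    :d => [[0x80..0x9F, :a]],
    :e => [[0x80..0xBF, :b]],
    :f => [[0x90..0xBF, :b]],
    :g => [[0x80..0x8F, :b]],
  }.freeze

  # Returns true if the bytes of +string+ drive the machine from START
  # back to START without hitting an ERROR transition.
  def self.valid_utf8?(string)
    state = :start
    string.each_byte do |byte|
      match = TRANSITIONS[state].find { |range, _next| range.include?(byte) }
      return false if match.nil?  # "Others: ERROR"
      state = match[1]
    end
    # Assumption: ending mid-sequence (any state but START) is invalid.
    state == :start
  end
end

UTF8Sketch.valid_utf8?("hello")         # => true
UTF8Sketch.valid_utf8?("\xE2\x82\xAC")  # => true  (three-byte sequence)
UTF8Sketch.valid_utf8?("\xC0\x80")      # => false (0xC0 is rejected in START)
UTF8Sketch.valid_utf8?("\xED\xA0\x80")  # => false (state D accepts only 0x80-0x9F)
```

Keeping the transitions in a data table rather than nested conditionals lets each row be checked directly against the state listing.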
This state machine is easiest to understand by:
a) examining the machine behavior as documented above
b) referring to an excellent UTF-8 article with an accompanying table here:
== Purpose

Container for UTF-8 validator.