A UTF-8 Validator State Machine

This gem provides an implementation of a state machine that validates UTF-8 encoded strings. Clients may request that encoding errors be reported in several ways:
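For readers unfamiliar with the technique, here is a minimal sketch of a byte-driven state machine that accepts only well-formed UTF-8. It is an illustration only and is not this gem's code or API: the module name Utf8Sketch, the state names, and the methods lead_state and valid? are all hypothetical. The byte ranges follow the well-formed byte sequence table in the Unicode Standard, and the sketch reports errors only through a true/false return value.

```ruby
# Illustrative sketch only. Utf8Sketch and its methods are hypothetical
# names and are NOT part of the utf8_validator gem.
module Utf8Sketch
  # For every state other than :start, the byte range that is allowed
  # next and the state entered when a byte in that range is seen.
  CONTINUE = {
    :tail1 => [0x80..0xBF, :start],  # one continuation byte left
    :tail2 => [0x80..0xBF, :tail1],  # two continuation bytes left
    :tail3 => [0x80..0xBF, :tail2],  # three continuation bytes left
    :e0    => [0xA0..0xBF, :tail1],  # rejects overlong 3-byte forms
    :ed    => [0x80..0x9F, :tail1],  # rejects UTF-16 surrogates
    :f0    => [0x90..0xBF, :tail2],  # rejects overlong 4-byte forms
    :f4    => [0x80..0x8F, :tail2]   # rejects values above U+10FFFF
  }

  # State entered after a lead byte, or nil if the byte cannot
  # begin a well-formed sequence (e.g. 0x80..0xC1, 0xF5..0xFF).
  def self.lead_state(byte)
    case byte
    when 0x00..0x7F             then :start
    when 0xC2..0xDF             then :tail1
    when 0xE0                   then :e0
    when 0xE1..0xEC, 0xEE..0xEF then :tail2
    when 0xED                   then :ed
    when 0xF0                   then :f0
    when 0xF1..0xF3             then :tail3
    when 0xF4                   then :f4
    end
  end

  # Returns true if the bytes of +string+ form well-formed UTF-8.
  def self.valid?(string)
    state = :start
    string.each_byte do |byte|
      if state == :start
        state = lead_state(byte)
        return false if state.nil?
      else
        range, next_state = CONTINUE[state]
        return false unless range.include?(byte)
        state = next_state
      end
    end
    # Ending anywhere but :start means a sequence was truncated.
    state == :start
  end
end

overlong = [0xC0, 0x80].pack("C*")        # overlong encoding of U+0000
puts Utf8Sketch.valid?("plain ASCII")     # true
puts Utf8Sketch.valid?(overlong)          # false
```

Working on raw bytes with each_byte keeps the sketch independent of whatever string encoding support the running Ruby has.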

What This Gem Does Not Provide

That functionality is left as an exercise for the reader.

Thanks To

The Unicode Consortium

At unicode.org/ for all the information published there.

Frank Yung-Fong Tang

For the state machine algorithm. See: unicode.org/mail-arch/unicode-ml/y2003-m02/att-0467/01-The_Algorithm_to_Valide_an_UTF-8_String

Markus Kuhn

For the invalid UTF-8 test data. See: www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

Useful Information

Other interesting and/or useful information can be found:

A Word On Ruby Versions

This validator is expected to be used mainly in Ruby environments prior to 1.9. However, nothing prohibits its use with Ruby 1.9 or 2.0. The tests detect these environments and adjust their behavior accordingly.

Reporting Issues

Please report issues on the tracker at GitHub:

Web Based Documentation

Human readable documentation can be found at:

Contributing to the utf8_validator gem

Copyright © 2011-2014 Guy Allard. See LICENSE.txt for further details.