class Moab::FileSignature

The fixity properties of a file, used to determine file content equivalence regardless of filename. Placing this data in a class by itself facilitates using file size together with the MD5 and SHA1 checksums as a single key when doing comparisons against other file instances. The Moab design assumes that this file signature is sufficiently unique to act as a comparator for determining file equality and eliminating file redundancy.
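As a rough illustration of treating size, MD5 and SHA1 as one composite key, the Ruby sketch below defines a hypothetical `SketchSignature` class. It is not Moab's actual `FileSignature` API. Equality and hashing in this class depend only on the file's bytes and never on its name. It uses only Ruby's standard `Digest` library.

```ruby
require 'digest/md5'
require 'digest/sha1'

# Hypothetical sketch, not Moab's real FileSignature API: the file size plus
# MD5 and SHA1 digests together act as a single content-based key.
class SketchSignature
  attr_reader :size, :md5, :sha1

  def initialize(size, md5, sha1)
    @size = size
    @md5 = md5
    @sha1 = sha1
  end

  # Build a signature from the file's bytes; the filename plays no part.
  def self.from_file(path)
    data = File.binread(path)
    new(data.bytesize, Digest::MD5.hexdigest(data), Digest::SHA1.hexdigest(data))
  end

  # The composite comparison key.
  def key
    [size, md5, sha1]
  end

  # Two signatures are equal only if size, MD5 and SHA1 all match.
  def eql?(other)
    other.is_a?(SketchSignature) && key == other.key
  end
  alias_method :==, :eql?

  # Consistent with eql?, so signatures can be used as Hash keys.
  def hash
    key.hash
  end
end
```

Because `eql?` and `hash` are both defined over the same key, two files with different names but identical content produce equal signatures. Such signatures collapse to a single entry when used as Hash keys or passed through `Array#uniq`.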

The use of signatures for a compare-by-hash mechanism introduces a minuscule (but non-zero) risk that two non-identical files will have the same checksum. This risk is only about 1 in 10^48 when the SHA1 checksum is used alone, and it can be reduced even further (to about 1 in 10^86) by using the MD5 and SHA1 checksums together. Comparing file sizes as well adds a further margin of safety.
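The quoted odds appear to follow from the digest lengths: SHA1 produces 160 bits and MD5 produces 128 bits. The sketch below assumes ideal, uniformly distributed and mutually independent hashes, and considers a single pair of distinct files. Under those assumptions the arithmetic works out roughly as follows. The estimate covers accidental collisions only. It does not account for deliberately constructed ones, nor for the larger number of pairs in a big collection.

```latex
\begin{aligned}
P_{\text{SHA1}} &\approx 2^{-160} \approx \frac{1}{1.5 \times 10^{48}} \\
P_{\text{MD5+SHA1}} &\approx 2^{-128} \cdot 2^{-160} = 2^{-288} \approx \frac{1}{5.0 \times 10^{86}}
\end{aligned}
```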

Finally, the “collision” risk is further reduced by isolating each digital object’s file pool within its own object folder, rather than in a common storage area shared by the whole repository.
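Here is a minimal sketch of the isolation idea, assuming a simple in-memory model. The hypothetical `ObjectFilePools` class keeps a separate map from signature to filenames for each object identifier. As a result, a signature match (including an unlikely false one) can only merge files that belong to the same object. The class name, its methods and the in-memory layout are illustrative assumptions. They do not describe Moab's storage implementation.

```ruby
require 'digest/md5'
require 'digest/sha1'

# Hypothetical illustration: each digital object gets its own pool of
# content, keyed by [size, md5, sha1]. Deduplication never crosses objects.
class ObjectFilePools
  def initialize
    # obj_id => { signature_key => [filenames sharing that content] }
    @pools = Hash.new { |pools, obj_id| pools[obj_id] = {} }
  end

  # Records a file for an object. Returns true if its content is new to that
  # object's pool, false if it matched content already stored there.
  def add(obj_id, filename, data)
    pool = @pools[obj_id]
    key = signature_key(data)
    if pool.key?(key)
      pool[key] << filename
      false
    else
      pool[key] = [filename]
      true
    end
  end

  # Number of distinct content items stored for one object.
  def stored_count(obj_id)
    @pools.fetch(obj_id, {}).size
  end

  private

  # Composite key: byte size plus MD5 and SHA1 hex digests.
  def signature_key(data)
    [data.bytesize, Digest::MD5.hexdigest(data), Digest::SHA1.hexdigest(data)]
  end
end
```

Scoping the lookup to a single object keeps each pool small, so fewer file pairs are ever compared. It also confines the effect of any false match to one object. The trade-off is that identical content is stored once per object rather than once per repository.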

Data Model

@see searchstorage.techtarget.com/feature/The-skinny-on-data-deduplication
@see www.ibm.com/developerworks/wikis/download/attachments/106987789/TSMDataDeduplication.pdf
@see www.redlegg.com/pdf_file/3_1320410927_HowDataDedupeWorks_WP_100809.pdf
@see www.library.yale.edu/iac/DPC/AN_DPC_FixityChecksFinal11.pdf

@note Copyright © 2012 by The Board of Trustees of the Leland Stanford Junior University.

All rights reserved.  See {file:LICENSE.rdoc} for details.