train {handwriterRF} | R Documentation |
A Training Set of Cluster Fill Rates
Description
Writers from the CSAFE Handwriting Database and the CVL Handwriting Database were randomly assigned to train, validation, and test sets.
Usage
train
Format
A dataframe with 800 rows and 44 variables:
- docname
The file name of the handwriting sample.
- writer
Writer ID. There are 200 distinct writer IDs. Each writer has 4 documents in the dataframe.
- doc
The name of the handwriting prompt.
- total_graphs
The total number of graphs in the document.
- cluster1
The proportion of graphs in cluster 1
- cluster2
The proportion of graphs in cluster 2
- cluster3
The proportion of graphs in cluster 3
- cluster4
The proportion of graphs in cluster 4
- cluster5
The proportion of graphs in cluster 5
- cluster6
The proportion of graphs in cluster 6
- cluster7
The proportion of graphs in cluster 7
- cluster8
The proportion of graphs in cluster 8
- cluster9
The proportion of graphs in cluster 9
- cluster10
The proportion of graphs in cluster 10
- cluster11
The proportion of graphs in cluster 11
- cluster12
The proportion of graphs in cluster 12
- cluster13
The proportion of graphs in cluster 13
- cluster14
The proportion of graphs in cluster 14
- cluster15
The proportion of graphs in cluster 15
- cluster16
The proportion of graphs in cluster 16
- cluster17
The proportion of graphs in cluster 17
- cluster18
The proportion of graphs in cluster 18
- cluster19
The proportion of graphs in cluster 19
- cluster20
The proportion of graphs in cluster 20
- cluster21
The proportion of graphs in cluster 21
- cluster22
The proportion of graphs in cluster 22
- cluster23
The proportion of graphs in cluster 23
- cluster24
The proportion of graphs in cluster 24
- cluster25
The proportion of graphs in cluster 25
- cluster26
The proportion of graphs in cluster 26
- cluster27
The proportion of graphs in cluster 27
- cluster28
The proportion of graphs in cluster 28
- cluster29
The proportion of graphs in cluster 29
- cluster30
The proportion of graphs in cluster 30
- cluster31
The proportion of graphs in cluster 31
- cluster32
The proportion of graphs in cluster 32
- cluster33
The proportion of graphs in cluster 33
- cluster34
The proportion of graphs in cluster 34
- cluster35
The proportion of graphs in cluster 35
- cluster36
The proportion of graphs in cluster 36
- cluster37
The proportion of graphs in cluster 37
- cluster38
The proportion of graphs in cluster 38
- cluster39
The proportion of graphs in cluster 39
- cluster40
The proportion of graphs in cluster 40
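The layout described above can be checked with a few lines of base R. The sketch below is illustrative only: summarise_layout is a hypothetical helper (not part of handwriterRF), and toy is a made-up data frame with two cluster columns instead of 40.

```r
# Hypothetical helper: summarise the layout of a data frame shaped like train.
summarise_layout <- function(df) {
  # Columns named cluster1, cluster2, ... hold the fill rates
  cluster_cols <- grep("^cluster[0-9]+$", names(df), value = TRUE)
  rates <- df[, cluster_cols, drop = FALSE]
  list(
    n_docs = nrow(df),
    n_writers = length(unique(df$writer)),
    docs_per_writer = table(df$writer),
    n_clusters = length(cluster_cols),
    # Proportions should lie between 0 and 1
    rates_in_range = all(rates >= 0 & rates <= 1)
  )
}

# Made-up toy data: 2 writers, 4 documents each, 2 clusters
toy <- data.frame(
  docname = paste0("doc", 1:8),
  writer = rep(c("w0001", "w0002"), each = 4),
  doc = rep(c("promptA", "promptB"), times = 4),
  total_graphs = c(100, 120, 90, 110, 105, 95, 130, 115),
  cluster1 = c(0.4, 0.5, 0.3, 0.6, 0.2, 0.7, 0.5, 0.1),
  cluster2 = c(0.6, 0.5, 0.7, 0.4, 0.8, 0.3, 0.5, 0.9)
)

layout <- summarise_layout(toy)
```

If handwriterRF is installed, the same helper could be applied to train. Assuming the data match this description, it would be expected to report 800 documents, 200 writers with 4 documents each, and 40 cluster columns.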
Details
The train dataframe contains cluster fill rates for 800 handwritten documents from the CSAFE Handwriting Database and the CVL Handwriting Database. The documents are from 200 writers. The CSAFE Handwriting Database has nine repetitions of each prompt, so two London Letter documents and two Wizard of Oz documents were randomly selected for each CSAFE writer. The CVL Handwriting Database does not contain multiple repetitions of prompts, so four English-language prompts were randomly selected for each CVL writer.
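The per-writer selection for CSAFE writers can be sketched in base R as follows. This is not the code used to build train: the catalog docs, the writer ID, and the seed are all made up, and the actual sampling procedure is not documented here.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical catalog: one writer, two prompts, nine repetitions of each
docs <- expand.grid(
  writer = "w0001",
  prompt = c("London Letter", "Wizard of Oz"),
  repetition = 1:9,
  stringsAsFactors = FALSE
)

# Draw two repetitions at random within each writer/prompt group
groups <- split(docs, list(docs$writer, docs$prompt), drop = TRUE)
selected <- do.call(
  rbind,
  lapply(groups, function(g) g[sample(nrow(g), 2), ])
)
rownames(selected) <- NULL
```

Here selected holds four rows for the writer, two per prompt, which mirrors the four documents per writer in train.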
The documents were split into graphs with process_batch_dir. The graphs were grouped into clusters with get_clusters_batch. The cluster fill counts were calculated with get_cluster_fill_counts. Finally, get_cluster_fill_rates calculated the cluster fill rates.
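The last step, turning fill counts into fill rates, can be illustrated in base R. This is a minimal sketch, assuming a fill rate is the count of graphs in a cluster divided by the total number of graphs in the document. It does not call the functions named above, and the document names and counts are made up.

```r
# Toy cluster fill counts for two hypothetical documents and three clusters
# (the real data uses 40 clusters). All values are made up for illustration.
counts <- data.frame(
  docname = c("w0001_doc1", "w0001_doc2"),
  cluster1 = c(10, 0),
  cluster2 = c(30, 25),
  cluster3 = c(60, 75)
)

cluster_cols <- grep("^cluster[0-9]+$", names(counts), value = TRUE)

# Total number of graphs per document
total_graphs <- rowSums(counts[, cluster_cols])

# Fill rate = graphs in the cluster / total graphs in the document
rates <- counts
rates[, cluster_cols] <- counts[, cluster_cols] / total_graphs
rates$total_graphs <- total_graphs
```

Under this assumption, the cluster columns of each row in rates sum to 1, because every graph is counted in exactly one cluster.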
Source
https://forensicstats.org/handwritingdatabase/, https://cvl.tuwien.ac.at/research/cvl-databases/an-off-line-database-for-writer-retrieval-writer-identification-and-word-spotting/