gendata {scR} | R Documentation |
Simulate data with appropriate structure to be used in estimating sample complexity bounds
Description
Simulate data with appropriate structure to be used in estimating sample complexity bounds
Usage
gendata(model, dim, maxn, predictfn = NULL, varnames = NULL, ...)
Arguments
model |
A binary classification model supplied by the user. Must take arguments |
dim |
The horizontal dimension of the data to be generated, i.e. the number of predictor variables. |
maxn |
The vertical dimension of the data to be generated, i.e. the number of observations. |
predictfn |
An optional user-defined function giving a custom predict method. If also using a user-defined model, the |
varnames |
An optional character vector giving the names of the variables to be used for the generated data. |
... |
Additional arguments that need to be passed to |
Value
A data.frame containing the simulated data.
See Also
estimate_accuracy(), to estimate sample complexity bounds given the generated data
Examples
mylogit <- function(formula, data){
  m <- structure(
    glm(formula = formula, data = data, family = binomial(link = "logit")),
    class = c("svrclass", "glm") # IMPORTANT - must use the class svrclass to work correctly
  )
  return(m)
}
mypred <- function(m, newdata){
  out <- predict.glm(m, newdata, type = "response")
  out <- factor(ifelse(out > 0.5, 1, 0), levels = c("0", "1"))
  # Important - must specify levels to account for possibility of all
  # observations being classified into the same class in smaller samples
  return(out)
}
formula <- two_year_recid ~
  race + sex + age + juv_fel_count +
  juv_misd_count + priors_count + charge_degree..misd.fel.
dat <- gendata(mylogit, 7, 7214, mypred, all.vars(formula))
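# Optional sanity check, added here as a sketch: it assumes only that
# gendata() returned a data.frame (as documented under Value) and relies
# on base R helpers, not on any scR function.
str(dat)    # column names and types of the simulated data
dim(dat)    # number of rows and columns actually generated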
library(parallel)
results <- estimate_accuracy(formula, mylogit, dat, predictfn = mypred,
                             nsample = 10,
                             steps = 10,
                             coreoffset = (detectCores() - 2))