Sample size for classifier development — samsize

Determine the sample size needed so that the probability of correct classification (PCC) of a classifier developed from the sample is within a specified tolerance of the optimal (Bayes) PCC.

Usage

samsize_pcc(effect, tolerance, p = 0.5, nfeat = 1, dfeat = 1)

Arguments

effect: Effect size for each differential feature (difference in class means divided by the SD)
tolerance: Sample size is found such that \(PCC(\infty) - PCC(n)\) is less than tolerance.
p: Proportion of the less common class (default 0.5)
nfeat: Total number of features
dfeat: Number of differential features. Note that Dobbin & Simon recommend using dfeat=1.
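For reference, the target \(PCC(\infty)\) (printed as pcc_inf) can be computed in closed form under the model described in Details. With dfeat differential features, each shifted by effect standard deviations, the distance between the class means is \(\Delta = \text{effect}\sqrt{\text{dfeat}}\), and the Bayes rule with prior p for the less common class gives \(PCC(\infty) = p\,\Phi(\Delta/2 - c) + (1-p)\,\Phi(\Delta/2 + c)\) with \(c = \log\{(1-p)/p\}/\Delta\). The sketch below implements this in a hypothetical helper, bayes_pcc(), which is not part of the package. Note that nfeat does not enter \(PCC(\infty)\).

```r
# Sketch: optimal (Bayes) PCC for two classes that are multivariate normal
# with spherical covariance. bayes_pcc() is a hypothetical helper, assuming
# `dfeat` features each differ by `effect` SDs and `p` (0 < p < 1) is the
# prior probability of the less common class.
bayes_pcc <- function(effect, p = 0.5, dfeat = 1) {
  delta <- effect * sqrt(dfeat)          # Mahalanobis distance between class means
  shift <- log((1 - p) / p) / delta      # threshold shift toward the rarer class
  # correct-classification probability for each class, weighted by its prior
  p * pnorm(delta / 2 - shift) + (1 - p) * pnorm(delta / 2 + shift)
}

bayes_pcc(0.5)                           # compare pcc_inf in the first example
bayes_pcc(1)                             # compare pcc_inf in the second example
bayes_pcc(0.8, p = 1/3, dfeat = 20)      # compare pcc_inf in the third example
```

With p = 0.5 the shift is zero and the formula reduces to \(\Phi(\Delta/2)\), e.g. pnorm(0.25) for the first example.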

Value

Object of class `power.htest`: a list of the arguments and computed values (pcc_inf, pcc_n, n1, n2), augmented with method and note elements.

Details

Assumes a multivariate normal distribution with spherical covariance. Loosely based on the function MKmisc::ssize.pcc(), but with two primary differences:

Does not solve for the worst-case scenario over 1:dfeat differential features; it uses dfeat as given.
Uses tpower() to compute the power of the t-test, rather than approximating it.
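The implementation of tpower() is not shown here. As an illustration only, the following sketch contrasts an exact power calculation for a two-sided, two-sample t-test (via the noncentral t distribution, the same quantity stats::power.t.test(strict = TRUE) returns for balanced groups) with a common normal approximation. The helpers t_power_exact() and z_power_approx() and the default alpha = 0.05 are hypothetical, and this is not necessarily how tpower() works.

```r
# Sketch: exact vs. approximate power of a two-sided, two-sample t-test
# (equal variances). Hypothetical helpers, not the package's tpower().
t_power_exact <- function(effect, n1, n2 = n1, alpha = 0.05) {
  df <- n1 + n2 - 2                        # pooled-variance degrees of freedom
  ncp <- effect / sqrt(1 / n1 + 1 / n2)    # noncentrality parameter
  crit <- qt(1 - alpha / 2, df)            # two-sided critical value
  # P(|T| > crit) when T follows a noncentral t(df, ncp)
  pt(crit, df, ncp = ncp, lower.tail = FALSE) + pt(-crit, df, ncp = ncp)
}

z_power_approx <- function(effect, n1, n2 = n1, alpha = 0.05) {
  ncp <- effect / sqrt(1 / n1 + 1 / n2)
  # normal approximation; ignores the wrong-sign rejection tail
  pnorm(ncp - qnorm(1 - alpha / 2))
}

t_power_exact(0.5, 34)
z_power_approx(0.5, 34)
t_power_exact(0.8, 36, 72)                 # unbalanced groups
```

For balanced groups, t_power_exact(0.5, 34) should agree with power.t.test(n = 34, delta = 0.5, strict = TRUE)$power; for small groups the normal approximation can differ noticeably from the exact value.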

Examples

samsize_pcc(0.5, 0.001)
#> 
#>      Sample Size Planning for Developing Classifiers Using High Dimensional Data 
#> 
#>       tolerance = 0.001
#>               p = 0.5
#>          effect = 0.5
#>           nfeat = 1
#>           dfeat = 1
#>         pcc_inf = 0.5987063
#>           pcc_n = 0.5977528
#>              n1 = 34
#>              n2 = 34
#> 
samsize_pcc(1, 0.1, nfeat=22000)
#> 
#>      Sample Size Planning for Developing Classifiers Using High Dimensional Data 
#> 
#>       tolerance = 0.1
#>               p = 0.5
#>          effect = 1
#>           nfeat = 22000
#>           dfeat = 1
#>         pcc_inf = 0.6914625
#>           pcc_n = 0.5946208
#>              n1 = 38
#>              n2 = 38
#> 
samsize_pcc(0.8, 0.1, p=1/3, nfeat=22000, dfeat=20)
#> 
#>      Sample Size Planning for Developing Classifiers Using High Dimensional Data 
#> 
#>       tolerance = 0.1
#>               p = 0.3333333
#>          effect = 0.8
#>           nfeat = 22000
#>           dfeat = 20
#>         pcc_inf = 0.965748
#>           pcc_n = 0.86905
#>              n1 = 36
#>              n2 = 72
#>