Sample size for classifier development
samsize_pcc.Rd
Determine the sample size needed so that the probability of correct classification (PCC) of the developed classifier is within a given tolerance of the optimal (Bayes) PCC.
Arguments
- effect
Effect size (difference in means divided by SD)
- tolerance
Sample size is found such that \(PCC(\infty) - PCC(n)\) is less than tolerance.
- p
Proportion of less common class (default 0.5)
- nfeat
Number of features
- dfeat
Number of differential features. Note that Dobbin & Simon recommend using dfeat=1.
Value
Object of class `power.htest`, a list of the arguments augmented with `method` and `note` elements.
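Because the return value is a plain list, the printed components can be pulled out by name. A minimal sketch, assuming the components are stored under the same names shown in the printout (n1, n2, pcc_n, pcc_inf):

res <- samsize_pcc(1, 0.1, nfeat = 22000)  # same call as the second example below
res$n1   # samples needed in the first class
#> [1] 38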
Details
Assumes a multivariate normal distribution with spherical variance.
Loosely based on the function MKmisc::ssize.pcc(), but with two primary differences:
- Doesn't solve for the worst-case scenario over 1:dfeat; it just uses dfeat.
- Uses tpower() rather than approximating the power of the t-test.
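In the simplest setting (equal class proportions, p = 0.5, and a single differential feature), the optimal PCC is the chance that a normal draw falls on the correct side of the midpoint between the two class means, i.e. pnorm(effect / 2). A minimal sanity check using only base R's pnorm(), which reproduces the pcc_inf values reported in the first two examples below:

pnorm(0.5 / 2)  # optimal PCC for effect = 0.5, p = 0.5, dfeat = 1
#> [1] 0.5987063
pnorm(1 / 2)    # optimal PCC for effect = 1
#> [1] 0.6914625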
Examples
samsize_pcc(0.5, 0.001)
#>
#> Sample Size Planning for Developing Classifiers Using High Dimensional Data
#>
#> tolerance = 0.001
#> p = 0.5
#> effect = 0.5
#> nfeat = 1
#> dfeat = 1
#> pcc_inf = 0.5987063
#> pcc_n = 0.5977528
#> n1 = 34
#> n2 = 34
#>
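# A hedged check using the values printed above (not an exported helper):
# the achieved gap pcc_inf - pcc_n stays below the requested tolerance of 0.001.
0.5987063 - 0.5977528
#> [1] 0.0009535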
samsize_pcc(1, 0.1, nfeat=22000)
#>
#> Sample Size Planning for Developing Classifiers Using High Dimensional Data
#>
#> tolerance = 0.1
#> p = 0.5
#> effect = 1
#> nfeat = 22000
#> dfeat = 1
#> pcc_inf = 0.6914625
#> pcc_n = 0.5946208
#> n1 = 38
#> n2 = 38
#>
samsize_pcc(0.8, 0.1, p=1/3, nfeat=22000, dfeat=20)
#>
#> Sample Size Planning for Developing Classifiers Using High Dimensional Data
#>
#> tolerance = 0.1
#> p = 0.3333333
#> effect = 0.8
#> nfeat = 22000
#> dfeat = 20
#> pcc_inf = 0.965748
#> pcc_n = 0.86905
#> n1 = 36
#> n2 = 72
#>