A function to read in large data files as a filebacked big.matrix
Source: R/process_delim.R
Usage
process_delim(
data_dir,
data_file,
feature_id,
rds_dir = data_dir,
rds_prefix,
logfile = NULL,
overwrite = FALSE,
quiet = FALSE,
...
)
Arguments
- data_dir
The directory to the file.
- data_file
The file to be read in, without the filepath. This should be a file of numeric values. Example: use data_file = "myfile.txt", not data_file = "~/mydirectory/myfile.txt". Note: if your file has headers/column names, set header = TRUE; this will be passed into bigmemory::read.big.matrix().
- feature_id
A string specifying the column in the data X (the feature data) that holds the row IDs (e.g., identifiers for each row/sample/participant, etc.). No duplicates allowed.
- rds_dir
The directory where the user wants to create the .rds and .bk files. Defaults to data_dir.
- rds_prefix
String specifying the user's preferred filename for the to-be-created .rds file (will be created inside the rds_dir folder). Note: rds_prefix cannot be the same as data_prefix.
- logfile
Optional: the name (character string) of the prefix of the logfile to be written in rds_dir. Defaults to NULL (no log file written). Note: do not append a .log to the filename; this is done automatically.
- overwrite
Logical: if .bk/.rds files already exist for the specified directory/prefix, should these be overwritten? Defaults to FALSE. Set to TRUE if you want to change the imputation method you're using, etc.
- quiet
Logical: should console messages be silenced? Defaults to FALSE.
- ...
Optional: other arguments to be passed to bigmemory::read.big.matrix(). Note: sep is an option to pass here, as is header.
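Because data_file must not include the filepath, a full path has to be split into its directory and file name before calling process_delim(). The sketch below shows one way to do that with base R; the path and the rds_prefix value are hypothetical placeholders, and the process_delim() call is left commented out since it depends on your own data.

```r
# Hypothetical full path to a delimited file; replace with your own.
full_path <- "/path/to/mydirectory/myfile.txt"

# Split the path into the two pieces that process_delim() expects.
data_dir  <- dirname(full_path)   # directory part only
data_file <- basename(full_path)  # file name only, no directory

# The pieces could then be passed along like this (not run here):
# process_delim(data_dir = data_dir, data_file = data_file,
#               rds_prefix = "processed_myfile", header = TRUE)
```

Note that dirname() and basename() perform tilde expansion, so a path beginning with "~" comes back with the home directory spelled out.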
Examples
temp_dir <- tempdir()
colon_dat <- process_delim(data_file = "colon2.txt",
data_dir = find_example_data(parent = TRUE), overwrite = TRUE,
rds_dir = temp_dir, rds_prefix = "processed_colon2", sep = "\t", header = TRUE)
#> Preprocessing colon2 data...
#> Overwriting existing files: processed_colon2.bk/.rds/.desc
#> There are 62 observations and 2001 features in the specified data files.
#> At this time, plmmr::process_delim() does not not handle missing values in delimited data.
#> Please make sure you have addressed missingness before you proceed.
#> process_plink() completed.
#> Processed files now saved as /tmp/RtmpQF5zaF/processed_colon2.rds
colon2 <- readRDS(colon_dat)
str(colon2)
#> List of 3
#> $ X:Formal class 'big.matrix.descriptor' [package "bigmemory"] with 1 slot
#> .. ..@ description:List of 13
#> .. .. ..$ sharedType: chr "FileBacked"
#> .. .. ..$ filename : chr "processed_colon2.bk"
#> .. .. ..$ dirname : chr "/tmp/RtmpQF5zaF/"
#> .. .. ..$ totalRows : int 62
#> .. .. ..$ totalCols : int 2001
#> .. .. ..$ rowOffset : num [1:2] 0 62
#> .. .. ..$ colOffset : num [1:2] 0 2001
#> .. .. ..$ nrow : num 62
#> .. .. ..$ ncol : num 2001
#> .. .. ..$ rowNames : NULL
#> .. .. ..$ colNames : chr [1:2001] "sex" "Hsa.3004" "Hsa.13491" "Hsa.13491.1" ...
#> .. .. ..$ type : chr "double"
#> .. .. ..$ separated : logi FALSE
#> $ n: num 62
#> $ p: num 2001
#> - attr(*, "class")= chr "processed_delim"
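The X element shown above is a big.matrix descriptor, not the matrix itself. To work with the values, the descriptor has to be attached first. The sketch below illustrates that step on a small stand-in matrix rather than on the colon2 output; the file names demo_attach.bk and demo_attach.desc are made up for illustration, and it assumes the bigmemory package is installed.

```r
library(bigmemory)

# Build a tiny filebacked big.matrix in a fresh temporary folder.
# This stands in for the .bk file that process_delim() creates.
tmp <- file.path(tempdir(), "attach_demo")
dir.create(tmp, showWarnings = FALSE)

X <- filebacked.big.matrix(nrow = 3, ncol = 2, type = "double", init = 0,
                           backingfile = "demo_attach.bk",
                           backingpath = tmp,
                           descriptorfile = "demo_attach.desc")
X[, 1] <- c(1, 2, 3)

# describe() returns a big.matrix.descriptor, the same kind of object
# stored in the X element of the processed list.
desc <- describe(X)

# attach.big.matrix() turns the descriptor back into a usable big.matrix.
X2 <- attach.big.matrix(desc)
dim(X2)
X2[, 1]
```

With the real output, the analogous call would be bigmemory::attach.big.matrix(colon2$X), provided the .bk file is still present in the directory recorded in the descriptor.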