Identify top or low fitting observations based on a specified diagnostic metric and filtering method.
This function identifies top or low fitting observations based on a specified metric and filtering method.
Usage
identifyTopFit(
list_tmb,
metric = "AIC",
filter_method = "mad",
keep = "top",
sort = F,
decreasing = T,
mad_tolerance = 3
)
Arguments
- list_tmb
List of glmmTMB objects.
- metric
The diagnostic metric to use (e.g., "AIC", "BIC", "logLik", "deviance", "dispersion").
- filter_method
The filtering method to use (e.g., "mad"). Feel free to implement your own filtering method.
- keep
Whether to keep "top" or "low" fitting observations.
- sort
Logical indicating whether to sort the results.
- decreasing
Logical indicating whether to sort in decreasing order.
- mad_tolerance
Tolerance applied to the median absolute deviation (MAD) when computing the filtering threshold.
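As a rough illustration of what MAD-based filtering can look like, the sketch below computes a threshold from toy metric values and splits the observations around it. The rule `median - mad_tolerance * MAD` is an assumption chosen for illustration, not the confirmed implementation of identifyTopFit(); the values and the names `aic_values`, `threshold`, `top_fit`, and `low_fit` are hypothetical.

```r
# Toy metric values keyed by gene (hypothetical; real values would come
# from the fitted models, e.g. via stats::AIC()).
aic_values <- c(gene1 = 102, gene2 = 100, gene3 = 98, gene4 = 40, gene5 = 101)

# Assumed rule, for illustration only: threshold = median - tolerance * MAD.
# stats::mad() returns the scaled MAD (constant = 1.4826 by default).
mad_tolerance <- 3
threshold <- unname(stats::median(aic_values) - mad_tolerance * stats::mad(aic_values))

# keep = "top": values above the threshold; keep = "low": values below it.
top_fit <- names(aic_values)[aic_values > threshold]
low_fit <- names(aic_values)[aic_values < threshold]

top_fit  # "gene1" "gene2" "gene3" "gene5"
low_fit  # "gene4"
```

With these values the median is 100 and the scaled MAD is about 2.97, so the threshold is about 91.1 and only gene4 falls below it. A smaller `mad_tolerance` moves the threshold closer to the median and makes the filter stricter.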
Examples
input_var_list <- init_variable()
#> Variable name should not contain digits, spaces, or special characters.
#> If any of these are present, they will be removed from the variable name.
## -- simulate RNAseq data
mock_data <- mock_rnaseq(input_var_list,
n_genes = 5,
min_replicates = 3,
max_replicates = 3,
basal_expression = 2,
sequencing_depth = 1e5)
#> Building mu_ij matrix
#> INFO: The length of the sequencing_depth vector is shorter than the number of samples. Values will be recycled.
#> Scaling count table according to sequencing depth: Done
#> INFO: Scaling counts by sequencing depth may exhibit some randomness due to certain parameter combinations, resulting in erratic behavior. This can be minimized by simulating more genes. We advise verifying the simulated sequencing depth to avoid drawing incorrect conclusions.
#> k_ij ~ Nbinom(mu_ij, dispersion)
#> Counts simulation: Done
## -- prepare data & fit a model for each gene
data2fit <- prepareData2fit(countMatrix = mock_data$counts,
metadata = mock_data$metadata)
l_tmb <- fitModelParallel(formula = kij ~ myVariable, data = data2fit,
group_by = "geneID", family = glmmTMB::nbinom2(link = "log"),
n.cores = 1)
#> Log file location: /tmp/RtmpS86cq0/htrfit.log
#> CPU(s) number : 1
#> Cluster type : PSOCK
# Identify top fitting observations based on AIC with MAD filtering
identifyTopFit(l_tmb, metric = "AIC", filter_method = "mad", keep = "top",
sort = TRUE, decreasing = TRUE, mad_tolerance = 3)
#> Based on the specified metric (AIC) and the MAD filtering method, the following selection criteria were applied:
#> 1. The MAD-based threshold for considering outliers was calculated.
#> 2. Values above the threshold were identified, threshold: 87.2150684926346
#> 3. Summary of selection:
#> - 4 out of 5 observations had values above the threshold for the AIC metric.
#> [1] "gene5" "gene2" "gene3" "gene1"
# Identify low fitting observations based on BIC without sorting
identifyTopFit(l_tmb, metric = "BIC", filter_method = "mad", keep = "low", sort = FALSE)
#> Based on the specified metric (BIC) and the MAD filtering method, the following selection criteria were applied:
#> 1. The MAD-based threshold for considering outliers was calculated.
#> 2. Values bellow the threshold were identified, threshold: 86.5903469003188
#> 3. Summary of selection:
#> - 1 out of 5 observations had values bellow the threshold for the BIC metric.
#> [1] "gene4"
# Identify top fitting observations based on log-likelihood with MAD filtering and custom tolerance
identifyTopFit(l_tmb, metric = "logLik", filter_method = "mad", keep = "top", mad_tolerance = 2)
#> Based on the specified metric (logLik) and the MAD filtering method, the following selection criteria were applied:
#> 1. The MAD-based threshold for considering outliers was calculated.
#> 2. Values above the threshold were identified, threshold: -49.4817291872032
#> 3. Summary of selection:
#> - 5 out of 5 observations had values above the threshold for the logLik metric.
#> [1] "gene1" "gene2" "gene3" "gene4" "gene5"
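The character vector returned above can be used to subset the list of fits. The sketch below uses a plain named list as a stand-in for `l_tmb`; that the list returned by fitModelParallel() is named by the `group_by` values is an assumption, and `l_fit`, `selected`, and `l_fit_top` are hypothetical names.

```r
# Stand-in for a list of fitted models named by gene (assumption: the real
# list of glmmTMB objects carries the same kind of names).
l_fit <- list(gene1 = "fit1", gene2 = "fit2", gene3 = "fit3",
              gene4 = "fit4", gene5 = "fit5")

# Names as returned by a call such as identifyTopFit(..., keep = "top").
selected <- c("gene5", "gene2", "gene3", "gene1")

# Subsetting by name keeps only the selected fits, in the order given.
l_fit_top <- l_fit[selected]
names(l_fit_top)  # "gene5" "gene2" "gene3" "gene1"
```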