Not only RNAseq
07-rnaseq_notOnly.Rmd
In the realm of RNAseq analysis, it’s widely acknowledged that counts
follow a Negative Binomial distribution. Hence, we recommend employing
family = glmmTMB::nbinom2(link = "log")
in the parameters
of fitModelParallel()
to align with this distribution. In
RNAseq it is well known that counts have a Negative binomial
distribution that’s why we recommand using
family = glmmTMB::nbinom2(link = "log")
in
fitModelParallel()
parameters.
pub_data_fn <- system.file("extdata", "pub_data_rna.tsv", package = "HTRfit")
pub_data <- read.table(file = pub_data_fn, header = TRUE)
plot(density(unlist(round(pub_data[,-1]))), log='x',
main = "Density of RNAseq counts")
While HTRfit is optimized for RNA-seq data, it’s worth noting that the model family can be customized to accommodate other data types. In a different context, the model family can be adapted according to the nature of your response data. To illustrate, consider an attempt to model Iris sepal length based on sepal width using the iris dataframe. Evaluating the distribution shape of the variable to be modeled (Sepal.Width), we observe a normal distribution.
Given this observation, we opt to fit a model for each species using a Gaussian family model (default). Details about model family here.
l_tmb_iris <- fitModelParallel(
formula = Sepal.Width ~ Sepal.Length ,
data = iris,
group_by = "Species",
family = stats::gaussian(),
n.cores = 1)