Theory behind HTRfit
RNA-sequencing is often used to determine how the gene expression levels of biological samples depend on various parameters (e.g. genotype, environment, age). To detect expression changes between conditions, statistical analysis is used to assess quantitatively whether gene expression differs significantly in one condition compared to another. Experimental parameters such as sequencing depth and the number of replicates are expected to affect the statistical power of such analyses. To guide the choice of values for these experimental parameters, we introduce HTRfit, a comprehensive RNA-seq simulation tool. Within the simulations, biological parameters (number of genes, basal expression level, effect sizes of explanatory variables on gene expression) can be precisely controlled to mimic the transcriptome of your organism of interest.
In addition, HTRfit provides flexible model-fitting functions that allow fixed effects, mixed effects, and interactions to be included in your RNA-seq data analysis. To facilitate the evaluation of RNA-seq analysis tools, HTRfit is also compatible with DESeq2 outputs.
HTRfit simulation workflow
In this modeling framework, the counts \(K_{ij}\) for gene i and sample j are generated from a negative binomial distribution with a fitted mean \(\mu_{ij}\) and a gene-specific dispersion parameter \(\alpha_i\). The fitted mean \(\mu_{ij}\) is determined by a parameter, \(q_{ij}\), which is proportionally related to the sum of all effects specified using init_variable() or add_interaction(). If basal gene expressions are provided, the \(q_{ij}\) values are scaled accordingly using the gene-specific basal expression value (\(bexpr_i\)). Furthermore, the coefficients \(\beta_i\) represent the natural-logarithm fold changes for gene i, one for each column of the model matrix X. The dispersion parameter \(\alpha_i\) defines the relationship between the variance of the observed counts and their mean value. In simpler terms, it quantifies how far we expect the observed counts of each gene to deviate from the mean value. In addition, HTRfit allows sequencing depth to be controlled through a scalar value specific to each sample (\(s_j\)), applied to the \(\mu_{ij}\) value.
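The generative model described above can be illustrated with a minimal sketch for a single gene. It is written in plain Python with the standard library only and does not use or reproduce HTRfit's R functions. Several points are assumptions made for illustration: a log link (\(q_{ij}\) is the basal expression multiplied by the exponential of the summed effects), the multiplicative use of \(s_j\), the dispersion convention in which the variance equals \(\mu_{ij} + \alpha_i \mu_{ij}^2\), and every function and variable name in the block.

```python
import math
import random


def expected_counts(design, beta, size_factors, basal=1.0):
    """Return mu_j = s_j * q_j for one gene, with q_j = basal * exp(x_j . beta).

    The log link and the multiplicative role of `basal` and `size_factors`
    are assumptions of this sketch, not HTRfit's documented internals.
    """
    mus = []
    for x_row, s_j in zip(design, size_factors):
        eta = sum(x * b for x, b in zip(x_row, beta))  # summed effects, log scale
        mus.append(s_j * basal * math.exp(eta))
    return mus


def nb_variance(mu, alpha):
    """Variance of a negative binomial with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_negative_binomial(mu, alpha, rng):
    """Draw one count as a gamma-Poisson mixture (mean mu, dispersion alpha)."""
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)  # shape, scale: mean is mu
    return sample_poisson(lam, rng)


# One gene, four samples: two controls and two treated (hypothetical design).
design = [[1, 0], [1, 0], [1, 1], [1, 1]]  # columns: intercept, treatment
beta = [0.0, math.log(2)]                  # natural-log fold change: treatment doubles expression
size_factors = [1.0, 0.5, 1.0, 0.5]        # s_j: relative sequencing depth per sample
alpha = 0.1                                # gene-specific dispersion

rng = random.Random(42)
mu = expected_counts(design, beta, size_factors, basal=50.0)
counts = [sample_negative_binomial(m, alpha, rng) for m in mu]
print("mu:    ", [round(m, 1) for m in mu])
print("counts:", counts)
```

Under these assumptions, halving \(s_j\) halves the expected count of a sample, while increasing the dispersion widens the spread of the counts around an unchanged mean.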