Theory behind HTRfit
In RNA-seq analysis, experimental parameters such as sequencing depth and the number of replicates affect the statistical power to detect expression changes. To guide the choice of suitable values for these parameters, we introduce HTRfit, a statistical framework underpinned by computational simulation. HTRfit is also compatible with DESeq2 outputs, which facilitates a comprehensive evaluation of RNA-seq analysis.
HTRfit simulation workflow
In this modeling framework, counts denoted as \(K_{ij}\) for gene i and sample j are
generated using a negative binomial distribution. The negative binomial
distribution is parameterized by a fitted mean \(\mu_{ij}\) and a gene-specific dispersion
parameter \(dispersion_i\). The fitted mean \(\mu_{ij}\) is determined by a
parameter, \(q_{ij}\), which is proportionally related to the sum of all effects
specified using init_variable() or add_interaction(). If basal
gene expressions are provided, the \(q_{ij}\) values are scaled accordingly
using the gene-specific basal expression value (\(bexpr_i\)). Furthermore, the coefficients
\(\beta_i\) represent the natural-log fold changes
for gene i, one per column of the model matrix
X. The dispersion parameter \(dispersion_i\) plays a crucial role in
defining the relationship between the variance of observed counts and
their mean value. In simpler terms, it quantifies how far we expect
observed counts to deviate from the mean value for each gene. In
addition, HTRfit allows sequencing depth to be controlled through a
sample-specific scalar (\(s_j\))
applied to the \(\mu_{ij}\) value.
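The generative model described above can be illustrated with a short base R sketch. It is not HTRfit's internal code. It assumes a log link between \(q_{ij}\) and the summed effects (as in DESeq2), a multiplicative basal expression factor, and \(\mu_{ij} = s_j q_{ij}\); the text above does not spell out any of these three details. All object names (n_genes, samples, beta, bexpr, s) are hypothetical, and the variance relation \(\mu_{ij} + dispersion_i \, \mu_{ij}^2\) is the one implied by calling rnbinom() with size = 1 / dispersion.

```r
# Minimal sketch of the count-generating model (base R only).
# Assumptions, not confirmed by the HTRfit documentation quoted above:
#   - log link: q_ij = bexpr_i * exp(x_j . beta_i)
#   - basal expression acts as a multiplicative factor
#   - sequencing depth: mu_ij = s_j * q_ij
set.seed(42)

n_genes <- 3
samples <- data.frame(condition = factor(rep(c("ctrl", "treated"), each = 4)))
X <- model.matrix(~ condition, data = samples)  # model matrix X (8 samples x 2 columns)

# beta[i, ]: natural-log fold changes of gene i, one per column of X
beta <- rbind(c(0,  0.0),
              c(0,  1.0),
              c(0, -0.5))

bexpr      <- c(10, 100, 1000)    # hypothetical gene-specific basal expression
dispersion <- c(0.05, 0.2, 0.5)   # gene-specific dispersion
s          <- rep(c(1, 2), 4)     # sample-specific sequencing depth scalar s_j

# q_ij: summed effects on the log scale, scaled by basal expression.
# bexpr has one value per gene and is recycled down the rows.
q <- bexpr * exp(beta %*% t(X))   # genes x samples

# mu_ij: apply the sequencing depth scalar to each sample (column)
mu <- sweep(q, 2, s, `*`)

# K_ij ~ NB(mean = mu_ij, size = 1 / dispersion_i),
# so Var(K_ij) = mu_ij + dispersion_i * mu_ij^2
K <- matrix(rnbinom(length(mu), mu = mu, size = 1 / dispersion),
            nrow = n_genes)
```

Raising an entry of dispersion widens the spread of that gene's counts around mu without moving the mean, while doubling an entry of s doubles the expected counts of that sample for every gene.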