Theory behind HTRfit

RNA-sequencing is often used to determine how the gene expression levels of biological samples depend on various parameters (e.g., genotype, environment, age). To detect expression changes between conditions, statistical analysis is used to assess quantitatively whether the expression of a gene differs significantly between one condition and another. Experimental parameters such as sequencing depth and the number of replicates are expected to affect the statistical power of such analyses. To guide the choice of values for these experimental parameters, we introduce HTRfit, a comprehensive RNA-seq simulation tool. Within the simulations, biological parameters (number of genes, basal expression level, effect sizes of explanatory variables on gene expression) can be precisely controlled to mimic the transcriptome of your organism of interest.

In addition, HTRfit provides flexible model-fitting functions that let you include fixed effects, mixed effects, and interactions in your RNA-seq data analysis. To facilitate the evaluation of RNA-seq analysis tools, HTRfit is also compatible with DESeq2 outputs.

HTRfit simulation workflow

In this modeling framework, the counts \(K_{ij}\) for gene i and sample j are generated from a negative binomial distribution with a fitted mean \(\mu_{ij}\) and a gene-specific dispersion parameter \(\alpha_i\). The fitted mean \(\mu_{ij}\) is determined by a parameter \(q_{ij}\), which is proportionally related to the sum of all effects specified using init_variable() or add_interaction(). If basal gene expression values are provided, the \(q_{ij}\) values are scaled accordingly using the gene-specific basal expression value (\(bexpr_i\)). The coefficients \(\beta_i\) represent the natural-log fold changes for gene i, one for each column of the model matrix X. The dispersion parameter \(\alpha_i\) defines the relationship between the variance of the observed counts and their mean. In simpler terms, it quantifies how far we expect the observed counts to deviate from the mean value for each gene. In addition, HTRfit allows the sequencing depth to be controlled through a sample-specific scalar (\(s_j\)) applied to the \(\mu_{ij}\) value.
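To make the generative model above concrete, here is a minimal Python sketch of one way such counts could be drawn. It is an illustration under stated assumptions, not HTRfit's implementation or its R API. The assumptions are: a log link, so that \(\log q_{ij} = \log(bexpr_i) + x_j \beta_i\); the sequencing-depth scalar acting multiplicatively, \(\mu_{ij} = s_j q_{ij}\); and the parameterisation in which the variance is \(\mu_{ij} + \alpha_i \mu_{ij}^2\) (some tools use the inverse of this dispersion instead). All function names (sample_poisson, sample_negative_binomial, fitted_mean, simulate_counts) and the toy values are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method.

    lam is processed in chunks of at most 500 (a sum of independent
    Poisson draws is Poisson) so that exp(-chunk) never underflows.
    """
    count = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        product = rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
    return count


def sample_negative_binomial(mu, alpha, rng):
    """Gamma-Poisson mixture with mean mu and variance mu + alpha * mu**2.

    ASSUMPTION: alpha is the dispersion in this parameterisation;
    alpha = 0 reduces to a plain Poisson draw. Requires mu > 0.
    """
    if alpha <= 0:
        return sample_poisson(mu, rng)
    # gammavariate(shape, scale): the mean is shape * scale = mu.
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    return sample_poisson(lam, rng)


def fitted_mean(x_row, beta, bexpr, size_factor):
    """ASSUMED form: mu_ij = s_j * exp(log(bexpr_i) + x_j . beta_i)."""
    log_q = math.log(bexpr) + sum(x * b for x, b in zip(x_row, beta))
    return size_factor * math.exp(log_q)


def simulate_counts(design, betas, bexprs, alphas, size_factors, seed=0):
    """Return a genes x samples table of simulated counts K_ij."""
    rng = random.Random(seed)
    counts = []
    for beta, bexpr, alpha in zip(betas, bexprs, alphas):
        row = [
            sample_negative_binomial(
                fitted_mean(x_row, beta, bexpr, s_j), alpha, rng
            )
            for x_row, s_j in zip(design, size_factors)
        ]
        counts.append(row)
    return counts


# Toy setup (hypothetical values): 2 genes, 4 samples, one binary covariate.
design = [[0], [0], [1], [1]]        # model matrix X without intercept
betas = [[math.log(2)], [0.0]]       # gene 1 doubles under treatment
bexprs = [50.0, 200.0]               # basal expression per gene
alphas = [0.1, 0.05]                 # gene-specific dispersions
size_factors = [1.0, 1.0, 1.0, 1.5]  # last sample sequenced 1.5x deeper
counts = simulate_counts(design, betas, bexprs, alphas, size_factors, seed=42)
```

In this sketch, increasing alphas widens the spread of counts around the same mean, while increasing a value in size_factors raises the expected counts of every gene in that sample. These are the two kinds of knob the text describes as dispersion and sequencing depth.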