presented at ABiC 2014
31 October 2014, from Nesoni version 0.128:
fitnoise.R is distributed as part of Nesoni, but also works as a standalone R+ module.
A unique feature of Fitnoise is its ability to detect differential poly(A) tail length in PAT-Seq data. See also Tail Tools.
Say you have a dgelist, mydgelist, and a design matrix mydesign, and a set of coefficients, testcoefs, to test:
H1 is all columns in the design matrix. H0 is those columns not named in testcoefs.
Run R+ with:
source("fitnoise.R") # voom from limma to calculate weights myelist <- voom(mydgelist, mydesign) myfit <- fit.elist(myelist, mydesign, model=model.t.standard, cores=8) #examine noise fit myfit #produce a toptable-like result data-frame result <- test.fit(myfit, coefs=testcoefs)
When examining myfit, if the "noise combined p-value" is low this indicates a poor fit by the noise model. A more accurate noise model may be required.
The novel feature of Fitnoise is the availability of different noise models. Above we used model.t.standard, which performs moderated t tests or F tests similarly to limma. Available are:
- model.t.standard, attempts to closely emulate limma. The weights matrix is used if present in the EList.
- model.t.independent, which simply performs independent t tests or F tests.
- model.normal.standard, which performs z tests or chi-square tests. The weights matrix is used if present in the EList.
- model.t.patseq and model.normal.patseq, for PAT-Seq poly(A) tail length data.
- Experiemental: model.t.per.sample.var and model.normal.per.sample.var fit per-sample variances, similar to arrayWeights in limma. The weights matrix is used if present in the EList. Would suggest only using this with an absolute minimum of three samples per group, or with negative controls (see below).
Writing new noise models is straightforward.
Fitnoise handles missing data gracefully.
Use without replicatesSay mydesign represents a null hypothesis H0. We seek genes that are outliers given a noise model. t-distribution based models allow for outliers, so we instead use a normal-distribution based model. This is effectively asserting that genes have fixed biological variability. This is a poor assertion to make.
myfit <- fit.elist(myelist, mydesign, model=model.normal.standard, cores=16) #produce a toptable-like result data-frame based on surprising deviation #from the noise model in individual genes result <- test.fit(myfit)
Note how we never specified an alternative hypothesis H1.
Equivalently, you can call fit.elist with H1 as the design and an extra parameter "noise.design" giving H0, then call test.fit normally. This will give identical p-values, and the output will include coefficient estimates.
This produces a ranking of genes by interest, taking into account uncertainty in their expression levels. p-values / FDR-values are for ranking purposes only. This method can not distinguish highly variable genes from truly differentially expressed genes. Now go and tell your collaborator to produce replicates next time.
Experimental: negative control features
A negative control feature is one in which the noise distribution is typical, and which is believed to not have been changed by the experiment. Constitutively expressed housekeeping genes would make good negative controls. Spiked in RNA would not make good negative controls as they will not have typical biological noise. Empirically chosen negative controls might underestimate or misestimate the noise distribution, use with caution (eg features chosen from non-significant features from a differential test without negative controls).
A logical vector of negative control features can be passed to fit.elist as parameter "controls". You can also pass your null hypothesis design as "control.design" (this defaults to a single column of ones).
Negative controls, if known, allow noise to be fitted more accurately.
For example they may be useful for model.t.per.sample.var if you have few samples.
If the noise model is covariant between samples
with covariance spanning experimental groups,
negative controls are absolutely necessary
(none of the current noise models use this.)