Word enrichment histograms
--------------------------
The histogram files have the following fields:
- MidValueBin: the middle value of the SDnor (clustering level) bin
- HigherRatioWords: the number of DNA words in this bin that are at least twice as frequent within the reference as expected (ratio >= 2)
- Percentage: the percentage of words in this bin that count as 'HigherRatioWords'
- meanRatio: the mean of all ratios in this bin (each word has its own ratio, which indicates its degree of association with the reference)
- stdRatio: the standard deviation of the ratios
- min: the minimum ratio observed in this bin
- max: the maximum ratio observed in this bin
- p5: the ratio corresponding to percentile 5 of the ratio distribution
- p95: the ratio corresponding to percentile 95 of the ratio distribution
- totRatio: the fraction of word copies found within the reference divided by the fraction of word copies located outside the reference. Unlike the per-word fields above, this measure is calculated by summing over all words rather than treating each word separately.
- nrWordsInEnt: the total number of word copies within the entity. Please note that this number will be the same for all bins except the last one, in which the overlap is included.
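
As a rough illustration of how the per-word fields relate to each other, the following Python sketch derives them from a list of ratios for a single bin. It is not the tool's own code: the function names `summarize_bin` and `percentile` are hypothetical, the percentile uses linear interpolation, the standard deviation is the population form, and Percentage is taken relative to the number of words in the bin. All of these are assumptions, and the actual program may differ. totRatio and nrWordsInEnt are left out because they depend on word copy counts that are not described here.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Percentile by linear interpolation on a sorted list (assumed method)."""
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty list")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    frac = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def summarize_bin(ratios, threshold=2.0):
    """Summarize the per-word ratios of one SDnor bin (hypothetical helper)."""
    ordered = sorted(ratios)
    # Words at least `threshold` times as frequent in the reference as expected.
    higher = sum(1 for r in ordered if r >= threshold)
    return {
        "HigherRatioWords": higher,
        # Assumption: percentage relative to all words in this bin.
        "Percentage": 100.0 * higher / len(ordered),
        "meanRatio": statistics.fmean(ordered),
        # Assumption: population standard deviation.
        "stdRatio": statistics.pstdev(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p5": percentile(ordered, 5),
        "p95": percentile(ordered, 95),
    }


# Made-up ratios for four words falling into the same bin.
example = summarize_bin([0.5, 1.0, 2.0, 4.0])
for field, value in example.items():
    print(field, value)
```

With the made-up ratios above, two of the four words reach the threshold of 2, so HigherRatioWords is 2 and Percentage is 50.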