Traditional p-values can be tiny even when an association is entirely typical of the domain’s background dependence. Broad, power-law spectra spread shared variance across many dimensions, so generic adjustment doesn’t erase crud, and larger \(N\) just measures it more precisely. The implication is a hard limit for association-only causal claims: when signals are comparable to the crud scale, no statistical threshold can reliably separate signal from background.
| | Classical | Crud-aware |
|---|---|---|
| False positive vs crud background | — | — |
| False negative rate (miss \(\mu\)) | — | — |
| Power for target effect | — | — |
Paul Meehl called it the crud factor: in observational data, everything is weakly linked to everything else. In neuro data, shared physiology and motion correlate voxels; in psych, broad traits tie questionnaire items together; in medical panels, systemic physiology links labs; in omics, pathways and batch effects connect genes.
This isn't just measurement error — it's real, background structure. The scale varies by domain and preprocessing. The core point is not a specific number, but that the background is broad and nonzero.
Generic adjustment removes broad shared variation. The paper defines the crud scale after that adjustment as:
$$\sigma_{\text{crud}} = \frac{\sqrt{\sum \lambda_{\text{tail}}^2}}{\sum \lambda_{\text{tail}}}$$

When eigenvalues follow a power law (\(\lambda_k \propto k^{-\alpha}\)), \(\sigma_{\text{crud}}\) shrinks slowly. Removing broad components does not collapse background dependence.
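A minimal numerical sketch of this point, assuming a hypothetical power-law spectrum \(\lambda_k \propto k^{-\alpha}\); the function name `crud_scale` is illustrative, not from the paper:

```python
import numpy as np

def crud_scale(tail_eigenvalues):
    """sigma_crud = sqrt(sum(lambda^2)) / sum(lambda), computed over
    the eigenvalue tail left after generic adjustment."""
    lam = np.asarray(tail_eigenvalues, dtype=float)
    return np.sqrt(np.sum(lam**2)) / np.sum(lam)

# With a power-law spectrum, removing 10x more broad components
# (larger cutoff index) shrinks sigma_crud only modestly.
for alpha in (1.0, 1.5):
    lam = np.arange(1, 10_001, dtype=float) ** -alpha
    print(f"alpha={alpha}: cut 10 -> {crud_scale(lam[10:]):.4f}, "
          f"cut 100 -> {crud_scale(lam[100:]):.4f}")
```

The slow decay is the whole story: aggressive adjustment buys only a small reduction in the background scale.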
Instead of testing \(H_0\!: \rho = 0\), define a background null. In the parametric version the null is:
$$H_0:\ z_{\text{true}} \sim \mathcal{N}(0,\sigma_{\text{crud}}^2), \quad z=\operatorname{atanh}(r)$$

That is, the true association for a random pair is drawn from the domain’s background distribution. With large data you can estimate this empirically; here we show the parametric version on the Fisher \(z\) scale:
$$z_{\text{crud}} = \frac{|\!\operatorname{atanh}(r)|}{\sqrt{\sigma_{\text{crud}}^2 + 1/(n-3)}}$$

The resulting crud-aware p-value is a two-sided test against the domain's background distribution. It controls Type-I error under the null that the target pair's true association is drawn from the crud distribution.
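The parametric version above can be sketched in a few lines, assuming \(\sigma_{\text{crud}}\) has already been estimated for the domain; `crud_aware_p` is an illustrative name:

```python
import math

def crud_aware_p(r, n, sigma_crud):
    """Two-sided test of atanh(r) against N(0, sigma_crud^2 + 1/(n-3)).
    sigma_crud = 0 recovers the classical Fisher z test."""
    z = abs(math.atanh(r)) / math.sqrt(sigma_crud**2 + 1.0 / (n - 3))
    p = math.erfc(z / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Same r = 0.1 at n = 100,000: the classical p-value is essentially
# zero, while the crud-aware p-value stays large when sigma_crud = 0.1.
print(crud_aware_p(0.1, 100_000, 0.0))
print(crud_aware_p(0.1, 100_000, 0.1))
```

The design choice is that the background variance sits inside the denominator alongside the sampling variance, so no amount of data can shrink the test statistic past the crud floor.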
The sampling term \(1/(n-3)\) shrinks with more data, but \(\sigma_{\text{crud}}\) does not. A rough crossover point is:
$$n^* \approx \frac{1}{\sigma_{\text{crud}}^2} + 3$$

(here: —)
Below \(n^*\), sampling noise dominates. Above \(n^*\), you mostly measure background dependence more precisely.
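The crossover is a one-line computation; the value below uses an illustrative \(\sigma_{\text{crud}} = 0.05\), not a number from the paper:

```python
def crossover_n(sigma_crud):
    """n* ~ 1/sigma_crud^2 + 3: the sample size where sampling
    variance 1/(n-3) drops to the background variance sigma_crud^2."""
    return 1.0 / sigma_crud**2 + 3

print(crossover_n(0.05))  # → 403.0
```

Past a few hundred samples in this example, extra data mostly sharpens the estimate of the background, not the signal.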
The paper proves a decision-theoretic bound: if the causal signal size \(\mu\) is comparable to \(\sigma_{\text{crud}}\), no rule that uses only the association statistic can reliably separate direct causal links from background dependence.
$$\text{error}^* = \Phi\!\left(-\frac{|\mu|}{2\sigma_{\text{crud}}}\right)$$

Here \(\Phi\) is the standard normal cumulative distribution function (the area under the normal curve to the left of a value).
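The bound is easy to evaluate with only the standard library, using the identity \(\Phi(x) = \tfrac{1}{2}\operatorname{erfc}(-x/\sqrt{2})\); `min_error` is an illustrative name for the quantity the bound defines:

```python
import math

def min_error(mu, sigma_crud):
    """error* = Phi(-|mu| / (2 * sigma_crud)): the best achievable
    misclassification rate for any rule that sees only the
    association statistic."""
    x = -abs(mu) / (2.0 * sigma_crud)
    return 0.5 * math.erfc(-x / math.sqrt(2))  # standard normal CDF at x

# When the causal signal equals the crud scale (mu = sigma_crud),
# the best possible error rate is about 31% -- far from reliable.
print(min_error(1.0, 1.0))
```

Only when \(|\mu|\) is several times \(\sigma_{\text{crud}}\) does the bound fall to conventionally acceptable error rates, which is the sense in which the limit is hard.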
Design leverage (randomization, instruments, discontinuities, negative controls) can overcome this because it changes the question, not just the threshold.