About FilingDrift

What this is

FilingDrift is a language-change scoring tool for SEC 10-K annual filings. It answers one question: does this year's filing say something meaningfully different from last year's — and if so, is that unusual compared to peers?

We built it because corporate distress has a pre-crisis signature in language. CFOs don't suddenly say "we're in trouble" — they gradually introduce hedging language, new risk factor categories, and liquidity disclosures that weren't there before. The change is subtle. The accumulation is not.

SVB's 2022 10-K scored 58.4 on our scale. The control ceiling, the 95th percentile of healthy companies' scores in our corpus, is 49.9. The FDIC took control of Silicon Valley Bank 14 days after the filing.

What this is not

  • Not investment advice. We score language. We don't predict price movements, recommend positions, or guarantee outcomes.
  • Not a crystal ball. 30 of 41 tracked crisis companies with sufficient filing history exceeded the control ceiling before their collapse. The other 11 did not; misses include PCG, CFC, and SI. FilingDrift is a screening tool, not a prediction tool.
  • Not a financial advisory service. Latent Systems SAS is a software company. We are not registered investment advisors, broker-dealers, or credit rating agencies.
  • Not affiliated with the SEC. We use publicly available data from EDGAR. We have no relationship with the SEC or any regulatory body.
  • Not proprietary or insider information. Every score is derived entirely from public SEC filings available on EDGAR. We have no access to non-public information about any company, and no score reflects anything that was not already in the public record at the time of filing.

How the score works

The distress score measures two things independently:

  • What's new or escalating. Phrases that appear for the first time, or increase sharply, weighted by how rarely other companies use them. A phrase that only SVB used in 2022, and had not used before, scores higher than boilerplate that every bank uses.
  • Where the language is heading. We measure how the current year's language has shifted from last year, in the sections most predictive of distress — risk factors, liquidity disclosures, MD&A. This catches the drift that keyword lists miss.
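The two components could be computed along the lines below. This is a minimal Python sketch under stated assumptions, not FilingDrift's implementation. Phrases are assumed to be pre-extracted and counted per filing. Rarity is an IDF-style weight based on how many peer filings use each phrase. "Escalating" is assumed to mean a count that at least doubles. Drift is the cosine distance between last year's and this year's embedding of each section, with the embeddings (from some sentence-embedding model) passed in as plain float lists. All function names, section names, and weights are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch of the two score components; not FilingDrift's code.


def rarity_weight(phrase: str, peer_doc_freq: dict[str, int], n_peers: int) -> float:
    """IDF-style weight: the fewer peer filings use a phrase, the larger the weight."""
    df = peer_doc_freq.get(phrase, 0)
    return math.log((1 + n_peers) / (1 + df)) + 1.0


def novelty_score(prev: Counter, curr: Counter,
                  peer_doc_freq: dict[str, int], n_peers: int,
                  escalation_ratio: float = 2.0) -> float:
    """Sum rarity weights of phrases that are new this year, or whose count grew
    by at least `escalation_ratio` (an assumed cutoff for 'escalating')."""
    score = 0.0
    for phrase, count in curr.items():
        before = prev.get(phrase, 0)
        is_new = before == 0
        is_escalating = before > 0 and count >= escalation_ratio * before
        if is_new or is_escalating:
            score += rarity_weight(phrase, peer_doc_freq, n_peers)
    return score


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def section_drift(prev_emb: dict[str, list[float]],
                  curr_emb: dict[str, list[float]],
                  section_weights: dict[str, float]) -> float:
    """Weighted mean cosine distance over sections present in both filings."""
    shared = [s for s in section_weights if s in prev_emb and s in curr_emb]
    total = sum(section_weights[s] for s in shared)
    if total == 0:
        return 0.0
    return sum(section_weights[s] * cosine_distance(prev_emb[s], curr_emb[s])
               for s in shared) / total


if __name__ == "__main__":
    prev = Counter({"interest rate risk": 3})
    curr = Counter({"interest rate risk": 3, "unrealized losses": 4})
    peer_df = {"interest rate risk": 900, "unrealized losses": 12}
    # Only the new, rarely used phrase contributes; the unchanged one does not.
    print(novelty_score(prev, curr, peer_df, n_peers=1000))

    weights = {"risk_factors": 2.0, "liquidity": 1.0, "mdna": 1.0}
    prev_emb = {"risk_factors": [1.0, 0.0], "liquidity": [0.6, 0.8], "mdna": [1.0, 0.0]}
    curr_emb = {"risk_factors": [0.8, 0.6], "liquidity": [0.6, 0.8], "mdna": [1.0, 0.0]}
    print(section_drift(prev_emb, curr_emb, weights))
```

Keeping the two functions separate mirrors the description above: a filing can score high on novelty (one rare new phrase) with little overall drift, or drift broadly without introducing any single rare phrase.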

Both are calibrated against the healthy companies in our corpus, which currently tracks 4906+ companies. The 95th percentile of the healthy companies' filing-pair scores is the control ceiling (49.9). Scores above it are flagged.
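A minimal sketch of the calibration step follows. It assumes each filing pair has already been reduced to a single score and that membership in the healthy group is known. The linear-interpolation percentile below is one common convention (NumPy's default). The convention FilingDrift actually uses is not stated here, and the function names are hypothetical.

```python
def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100] (NumPy's default convention)."""
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be between 0 and 100")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)                    # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


def control_ceiling(healthy_pair_scores: list[float], q: float = 95.0) -> float:
    """Ceiling = q-th percentile of healthy companies' filing-pair scores."""
    return percentile(healthy_pair_scores, q)


def is_flagged(score: float, ceiling: float) -> bool:
    """A filing pair is flagged only if it scores strictly above the ceiling."""
    return score > ceiling


if __name__ == "__main__":
    # With the published ceiling, SVB's 58.4 is flagged.
    print(is_flagged(58.4, 49.9))
```

The strict `>` follows the wording "scores above it are flagged"; a score exactly at the ceiling is not flagged.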

The algorithm is deterministic: no AI generation, no prompting, no summarization. The same filing always produces the same score.

See the full FAQ →

Forward-return backtest — full corpus: 4923 companies, 7069 flag events

For every flag event in the corpus, we measured stock returns over the 12, 24, and 36 months after the flag relative to SPY, excluding the 2007–2011 macro crisis period. Median underperformance widens as the horizon lengthens.

Horizon after flag | Flag events (N) | Median alpha vs. SPY | Alpha IQR (25th–75th percentile) | % underperforming SPY
1 year             | 7069            | −8.6%                | −32.5% to +15.1%                 | 58%
2 years            | 6597            | −14.8%               | −50.6% to +23.6%                 | 61%
3 years            | 6059            | −22.4%               | −63.8% to +26.3%                 | 63%

Alpha = company return minus SPY return over the same period. At the 1-year horizon, 58% of flagged events have negative alpha, compared with ~50% expected by chance. The IQR shows the middle 50% of outcomes (25th to 75th percentile). The distribution is wide, as expected for a prioritization signal rather than a trading rule. N decreases at longer horizons because recent flag events do not yet have complete 24- or 36-month forward windows.

Caveats: delisted tickers use last available price as terminal value (understates losses for bankruptcies); 2007–2011 crisis era excluded to avoid macro distortion; no adjustment for market-period clustering.
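The backtest statistics can be computed from per-event returns with a sketch like the one below. The following are assumptions, not documented behavior. Each event carries the company's and SPY's total return over the horizon being summarized. The 2007–2011 exclusion is applied by flag year. A forward window counts as complete only once the full horizon has elapsed. Quartiles use linear interpolation (`method="inclusive"` in Python's `statistics.quantiles`). `FlagEvent` and the function names are hypothetical. Delisting treatment would happen upstream, when `company_return` is computed.

```python
import statistics
from dataclasses import dataclass
from datetime import date


@dataclass
class FlagEvent:
    flag_date: date
    company_return: float  # total return over the horizon, e.g. -0.25 for -25%
    spy_return: float      # SPY total return over the same window


def window_complete(flag_date: date, horizon_years: int, as_of: date) -> bool:
    """True once the full forward window has elapsed by `as_of`."""
    try:
        end = flag_date.replace(year=flag_date.year + horizon_years)
    except ValueError:  # flagged on Feb 29, target year is not a leap year
        end = flag_date.replace(year=flag_date.year + horizon_years, day=28)
    return end <= as_of


def eligible(e: FlagEvent, horizon_years: int, as_of: date) -> bool:
    # Assumption: the 2007-2011 crisis-era exclusion is applied by flag year.
    if 2007 <= e.flag_date.year <= 2011:
        return False
    return window_complete(e.flag_date, horizon_years, as_of)


def summarize_alpha(events: list[FlagEvent], horizon_years: int, as_of: date) -> dict:
    """N, median alpha, IQR bounds and % underperforming among eligible events."""
    alphas = [e.company_return - e.spy_return
              for e in events if eligible(e, horizon_years, as_of)]
    if len(alphas) < 2:
        raise ValueError("need at least two eligible events")
    q1, median, q3 = statistics.quantiles(alphas, n=4, method="inclusive")
    return {
        "n": len(alphas),
        "median_alpha": median,
        "iqr_low": q1,
        "iqr_high": q3,
        "pct_underperforming": 100.0 * sum(a < 0 for a in alphas) / len(alphas),
    }
```

Incomplete windows drop out inside `eligible`. This is why N shrinks at the 2- and 3-year horizons in a sketch like this.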

Crisis & distress detection — labeled set: ~43 hand-selected crisis companies

The table below shows how the score performed on selected labeled crisis companies in our corpus. Events include bankruptcies, bank failures, FDIC seizures, and Chapter 11 filings (some companies subsequently emerged from Chapter 11).

Recall: 73% (30 of 41 crisis companies detected; companies with ≥2 pre-event filing pairs)
Precision: 81% (30 crisis companies among 37 flagged; labeled set: 29 control + crisis companies)
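Both headline figures follow directly from the counts. Note that precision is computed within the labeled set, not across the full corpus. The helper names are illustrative.

```python
def recall(detected: int, total_crisis: int) -> float:
    """Share of labeled crisis companies that exceeded the ceiling."""
    return detected / total_crisis


def precision(true_flags: int, all_flags: int) -> float:
    """Share of flagged labeled-set companies that were crisis companies."""
    return true_flags / all_flags


print(f"recall    = {recall(30, 41):.1%}")     # 73.2%
print(f"precision = {precision(30, 37):.1%}")  # 81.1%
```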
Company                  | Event              | Peak score | Lead time | Result
PRTY (Party City)        | Bankruptcy 2023    | 105.9      | 3.1 years | Detected
NKLA (Nikola)            | Bankruptcy 2025    | 85.8       | 3.7 years | Detected
BBBY (Bed Bath & Beyond) | Bankruptcy 2023    | 150.7      | 2.0 years | Detected
RITEAID (Rite Aid)       | Bankruptcy 2023    | 81.4       | 167 days  | Detected
SVB Financial            | Bank collapse 2023 | 58.4       | 14 days   | Detected
SI (Silvergate)          | Liquidation 2023   | 15.5       | n/a       | Missed
CFC, REVLON, PCG, CHKAQ  | Various            | <6         | n/a       | No data †

† CFC (Countrywide, 2008), REVLON, PCG (PG&E, 2019), and CHKAQ (Chesapeake, 2020) have 0–1 filing pairs in our corpus — insufficient history to compute a meaningful drift score. We count them as misses to avoid cherry-picking. The score requires at least two consecutive filings to measure change.
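The filing-history requirement can be sketched as follows. The sketch assumes one 10-K per fiscal year and forms a pair only from adjacent fiscal years, so a missing year breaks the chain. The default `min_pairs=2` mirrors the "≥2 pre-event filing pairs" criterion stated with the recall figure. The caller is assumed to pass only pre-event filings. `Filing`, `consecutive_pairs`, and `scorable` are hypothetical names, and `accession` is a placeholder field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    fiscal_year: int
    accession: str  # placeholder ID for the EDGAR filing (hypothetical field)


def consecutive_pairs(filings: list[Filing]) -> list[tuple[Filing, Filing]]:
    """Pair each 10-K with the prior fiscal year's 10-K.

    A missing year breaks the chain: the filing after the gap gets no pair,
    so no drift score can be computed for it.
    """
    by_year = {f.fiscal_year: f for f in filings}  # last filing wins per year
    return [(by_year[y - 1], by_year[y])
            for y in sorted(by_year) if y - 1 in by_year]


def scorable(pre_event_filings: list[Filing], min_pairs: int = 2) -> bool:
    """True if the company has enough consecutive pairs to be evaluated."""
    return len(consecutive_pairs(pre_event_filings)) >= min_pairs
```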

False positives: 6 of 30 stable reference companies generated above-ceiling scores at some point. Four of the six flags occurred during the COVID disruption years (2020–2021), when market-wide language shifts reduced the specificity of peer normalization. One (RTX) followed a major corporate merger that changed its language for structural rather than distress-related reasons.

Corpus is actively expanding
Currently tracking 4906+ companies. We ingest new filings continuously as they appear on EDGAR. Coverage, recall, and statistical power improve with each new filing cycle. Control ceiling: 49.9.

Known limitations

  • M&A distortions. When a company acquires another and consolidates filings, the combined entity may show language change that reflects the target's pre-existing disclosure style, not a genuine deterioration.
  • Sector shocks. In years with industry-wide stress (2008–2009, 2020), nearly all peers spike together. Relative scoring is less informative when the entire peer group is distressed.
  • Regulatory language. Banks under formal regulatory agreements (cease-and-desist, memoranda of understanding) are required to use specific disclosure language. This language reads as distressed but may not indicate approaching collapse.
  • Binomial false-flag rate. By construction, about 5% of healthy filing pairs score above the 95th-percentile ceiling. With 4906+ companies tracked over multiple years, some will therefore exceed it through random variation alone. Companies with 10+ years of history have more opportunities to spike. We are exploring mitigations such as adaptive thresholds.
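The effect of history length can be illustrated with a simple binomial model. Assume, unrealistically, that each healthy filing pair independently exceeds the ceiling with probability p = 0.05. The chance of at least one false flag across n pairs is then 1 - (1 - p)^n. Real scores are correlated across years and across peers (the COVID-era cluster among the false positives is an example), so treat this as an illustration of the multiple-comparisons effect, not a calibrated estimate. The function name is hypothetical.

```python
def p_any_flag(n_pairs: int, per_pair_rate: float = 0.05) -> float:
    """P(at least one above-ceiling score across n independent filing pairs)."""
    return 1.0 - (1.0 - per_pair_rate) ** n_pairs


for n in (1, 5, 10, 15):
    print(f"{n:>2} filing pairs: {p_any_flag(n):.1%}")  # 10 pairs -> 40.1%
```

Under these assumptions, a healthy company with ten filing pairs has roughly a 40% chance of at least one above-ceiling score. This is why a fixed ceiling penalizes long histories.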

Who we are

FilingDrift is built by Latent Systems, a small team of ML researchers based in Paris. We all have PhDs in machine learning. Our research focuses on training embedding models and studying the geometry of the spaces they produce: how meaning is encoded in high-dimensional representations, and what structural properties of those spaces can be exploited for detection, classification, and anomaly scoring.

FilingDrift grew out of that work. The question was whether financial distress leaves a detectable signature in the geometry of how a company writes about itself over time, and whether that signature appears before prices move. The signal here — cross-sectional peer normalization applied to sentence-embedding drift — is a direct application of our research.

We are not a hedge fund, a financial services firm, or a consultancy. FilingDrift is a research product of an independent research company.

Questions, feedback, and enterprise inquiries: hello@filingdrift.com


This site uses a session cookie for authentication. We also use Plausible Analytics, a privacy-friendly, cookieless tool that collects no personal data and requires no consent under GDPR. See our Privacy Policy.