How SeniorIndex collects, processes, and publishes data on the U.S. senior care landscape — covering skilled nursing facilities and home health agencies — sources, composite scores, data vintage, and known limitations.
SeniorIndex is an independent research platform. Every figure we publish is derived from public federal or state data sources, processed through a documented pipeline, and made available as free research and paid reports. This page describes how that work is done.
Our operating principle is that the data is public but the interpretation is where the work happens. We don't generate the underlying data — we aggregate it, reconcile inconsistencies across datasets, compute benchmarks and composite scores from documented formulas, and publish the results. Every step in that chain is described below.
This methodology page describes the platform as a whole. Individual reports carry their own methodology sections that name the specific data layers they draw on, the refresh date of each layer, and any report-specific calculations. We do not hide our math.
Data is ingested directly from primary sources on documented schedules: PBJ staffing data refreshes quarterly, regulatory and enforcement data monthly, and HCRIS financials and Census demographics annually. State regulatory data is refreshed on change, whenever new statutes, enforcement actions, or penalty structures are published.
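As an illustration, the cadence above can be expressed as a simple schedule table. The source names and cadences come from this page; the data structure itself is a hypothetical sketch, not our production configuration.

```python
# Illustrative sketch of the refresh cadence described above.
# Source names and cadences come from the methodology text; the
# structure itself is hypothetical, not production config.
REFRESH_SCHEDULE = {
    "pbj_staffing": "quarterly",
    "regulatory_enforcement": "monthly",
    "hcris_financials": "annual",
    "census_demographics": "annual",
    "state_regulatory": "on_change",  # refreshed when new statutes or penalties publish
}

def due_sources(cadence: str) -> list[str]:
    """Return the sources refreshed on a given cadence."""
    return [name for name, c in REFRESH_SCHEDULE.items() if c == cadence]
```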
Identifiers are normalized across datasets. The CMS Certification Number (CCN) is the primary key for facility-level joins. Where other identifiers appear in source data — provider numbers, state license IDs, ownership entity names — they are cross-referenced against the CCN-keyed facility master to produce a single unified record per facility.
Reporting inconsistencies are reconciled, not discarded. Source datasets occasionally disagree on facility status, bed counts, or ownership attribution between refresh cycles. When discrepancies are detected, we retain the most recent authoritative value and flag the conflict in our internal records. Publication-facing figures always use the most recent reconciled values.
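The reconciliation rule above (keep the most recent value, flag the disagreement rather than discard it) can be sketched as follows. The function and record shape are assumptions for illustration, not the production pipeline.

```python
# Sketch of the reconciliation rule: retain the most recent value
# and flag conflicts instead of discarding them. Illustrative only.
from datetime import date

def reconcile(records: list[tuple[date, object]]) -> tuple[object, bool]:
    """`records` is a list of (as_of_date, value) pairs from different
    source refreshes. Returns (chosen_value, conflict_flag)."""
    ordered = sorted(records, key=lambda r: r[0])
    chosen = ordered[-1][1]                       # most recent authoritative value
    conflict = len({v for _, v in records}) > 1   # flag disagreement for internal records
    return chosen, conflict

# Two refreshes disagree on a bed count; keep the newer one and flag it.
value, flagged = reconcile([(date(2024, 1, 1), 120), (date(2024, 4, 1), 118)])
```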
Composite scores are computed from documented formulas. Scores are derived in a deterministic processing step after ingestion. The formulas are published in the methodology section of every report that uses them.
Stability Score. A 0–100 composite measuring staffing reliability across six sub-metrics: staffing volatility, weekend drop-off, zero-RN days, coverage gap against industry benchmarks, contract dependency, and turnover velocity. It assesses whether a facility's staffing pattern is stable over time, regardless of whether total hours meet benchmark levels.
A 0–100 composite grading all 12,251 certified home health agencies on three weighted dimensions: Quality of Care (50% — quality star rating, hospitalization performance, Medicare spending efficiency), Patient Experience (30% — HHCAHPS summary star blended with recommend percentage, confidence-adjusted for low survey volume), and Value-Based Purchasing (20% — CMS Total Performance Score, with national median assigned to non-participants). Grade scale runs A+ through F.
Opportunity Score. A 0–100 composite designed for staffing-agency use, measuring the combination of gap size, revenue quality, regulatory pressure, facility size, and violation signal. A facility can score stable (Stability Score) while simultaneously scoring as a high-opportunity sales target (Opportunity Score). Both scores are published together to avoid misreading one as the other.
Every report we publish includes a data vintage box identifying the refresh date of each data layer used in that report. The vintage box is a required element of our report templates. Readers never have to guess whether they are looking at current figures or stale ones.
The vintage reflects when our pipeline last ingested each source, not when the source itself was published. In most cases these are close, but some sources (annual HCRIS filings in particular) are published on lagged schedules that we name explicitly when relevant.
CMS privacy rules suppress RN and total staff turnover rates for facilities where the sample size falls below CMS's disclosure threshold. Currently 2,031 facilities (roughly 14% of the universe) have suppressed turnover data. Our reports flag these facilities as "turnover not available" rather than excluding them from analysis or imputing values.
The Payroll-Based Journal publishes one row per facility per day, aggregating all shifts within a day. Within-shift patterns — night coverage relative to day, meal-time coverage, hour-by-hour patterns — are not captured in the source data and therefore cannot be analyzed at that granularity. Our zero-RN-day and weekend-drop-off metrics operate at daily resolution, which is the finest resolution the source permits.
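Because the source is one row per facility per day, the two metrics named above reduce to per-day arithmetic. The sketch below assumes a mapping of dates to reported hours; field names and the exact drop-off formula are illustrative, not our published definitions.

```python
# Daily-resolution sketch of two metrics named above. Input shape
# (date -> hours) and the drop-off ratio are illustrative assumptions.
from datetime import date

def zero_rn_days(daily_rn_hours: dict[date, float]) -> int:
    """Count days with no reported RN hours."""
    return sum(1 for h in daily_rn_hours.values() if h == 0)

def weekend_dropoff(daily_total_hours: dict[date, float]) -> float:
    """Mean weekend hours relative to mean weekday hours (1.0 = no drop)."""
    weekday = [h for d, h in daily_total_hours.items() if d.weekday() < 5]
    weekend = [h for d, h in daily_total_hours.items() if d.weekday() >= 5]
    if not weekday or not weekend:
        return float("nan")
    return (sum(weekend) / len(weekend)) / (sum(weekday) / len(weekday))

# One example week: 2024-01-01 (Mon) through 2024-01-07 (Sun).
week = {date(2024, 1, i): 100.0 for i in range(1, 6)}
week.update({date(2024, 1, 6): 80.0, date(2024, 1, 7): 80.0})
```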
CMS ownership history records changes in ownership entities but does not label the transaction type — sale, merger, lease, internal restructuring, and successor entity changes all appear as generic ownership changes. We report ownership changes as observed and do not infer transaction categories from the raw records. Buyers relying on transaction-type filtering should corroborate with primary deal documentation.
Payer mix, revenue composition, and operating metrics are derived from annual cost report filings. A facility's payer mix as we report it reflects the most recent full fiscal year filed, not the current operational state. Facilities that have materially shifted their payer mix within the current year will not show that shift in our data until the next annual filing is published.
County-level population, age distribution, and income figures are drawn from the American Community Survey 5-year estimates, which by construction lag the current year by several years and smooth short-term demographic shifts. These figures are appropriate for structural market analysis but should not be read as real-time population counts.
The federal CMS datasets cover all certified nursing homes uniformly, but the state-level regulatory data — statutes, enforcement mechanisms, penalty schedules, state-specific licensure — varies in how thoroughly each state publishes it. Our state regulatory coverage is deepest for states with active public health reporting and more limited for states that publish sparsely. We name the specific state sources used in each state report.