How SeniorIndex collects, processes, and publishes data on the U.S. senior care landscape — covering skilled nursing facilities and home health agencies — sources, composite scores, data vintage, and known limitations.
SeniorIndex is an independent research platform. Every figure we publish is derived from public federal or state data sources, processed through a documented pipeline, and made available as free research and paid reports. This page describes how that work is done.
Our operating principle is that the data is public but the interpretation is where the work happens. We don't generate the underlying data — we aggregate it, reconcile inconsistencies across datasets, compute benchmarks and composite scores from documented formulas, and publish the results. Every step in that chain is described below.
This methodology page describes the platform as a whole. Individual reports carry their own methodology sections that name the specific data layers they draw on, the refresh date of each layer, and any report-specific calculations. We do not hide our math.
Data is ingested directly from primary sources on documented schedules: PBJ staffing data refreshes quarterly, regulatory and enforcement data monthly, and HCRIS financials and Census demographics annually. State regulatory data is refreshed on change, whenever new statutes, enforcement actions, or penalty structures are published.
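As an illustration, the cadence above can be expressed as a simple schedule table. The source names and cadences come from this page; the data structure itself is a hypothetical sketch, not our production configuration.

```python
# Illustrative sketch of the refresh cadence described above.
# Source names and cadences come from the methodology text; the
# structure itself is hypothetical, not production config.
REFRESH_SCHEDULE = {
    "pbj_staffing": "quarterly",
    "regulatory_enforcement": "monthly",
    "hcris_financials": "annual",
    "census_demographics": "annual",
    "state_regulatory": "on_change",  # refreshed when new statutes or penalties publish
}

def due_sources(cadence: str) -> list[str]:
    """Return the sources refreshed on a given cadence."""
    return [name for name, c in REFRESH_SCHEDULE.items() if c == cadence]
```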
Identifiers are normalized across datasets. The CMS Certification Number (CCN) is the primary key for facility-level joins. Where other identifiers appear in source data — provider numbers, state license IDs, ownership entity names — they are cross-referenced against the CCN-keyed facility master to produce a single unified record per facility.
Reporting inconsistencies are reconciled, not discarded. Source datasets occasionally disagree on facility status, bed counts, or ownership attribution between refresh cycles. When discrepancies are detected, we retain the most recent authoritative value and flag the conflict in our internal records. Publication-facing figures always use the most recent reconciled values.
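The reconciliation rule above (keep the most recent value, flag the disagreement rather than discard it) can be sketched as follows. The function and record shape are assumptions for illustration, not the production pipeline.

```python
# Sketch of the reconciliation rule: retain the most recent value
# and flag conflicts instead of discarding them. Illustrative only.
from datetime import date

def reconcile(records: list[tuple[date, object]]) -> tuple[object, bool]:
    """`records` is a list of (as_of_date, value) pairs from different
    source refreshes. Returns (chosen_value, conflict_flag)."""
    ordered = sorted(records, key=lambda r: r[0])
    chosen = ordered[-1][1]                       # most recent authoritative value
    conflict = len({v for _, v in records}) > 1   # flag disagreement for internal records
    return chosen, conflict

# Two refreshes disagree on a bed count; keep the newer one and flag it.
value, flagged = reconcile([(date(2024, 1, 1), 120), (date(2024, 4, 1), 118)])
```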
Composite scores are computed from documented formulas. Scores are derived in a deterministic processing step after ingestion. The formulas are published in the methodology section of every report that uses them.
Stability Score. A 0–100 composite measuring staffing reliability across six sub-metrics: staffing volatility, weekend drop-off, zero-RN days, coverage gap against industry benchmarks, contract dependency, and turnover velocity. It assesses whether a facility's staffing pattern is stable over time, regardless of whether total hours meet benchmark levels.
A 0–100 composite grading all 12,251 certified home health agencies on three weighted dimensions: Quality of Care (50% — quality star rating, hospitalization performance, Medicare spending efficiency), Patient Experience (30% — HHCAHPS summary star blended with recommend percentage, confidence-adjusted for low survey volume), and Value-Based Purchasing (20% — CMS Total Performance Score, with national median assigned to non-participants). Grade scale runs A+ through F.
Opportunity Score. A 0–100 composite designed for staffing-agency use, measuring the combination of gap size, revenue quality, regulatory pressure, facility size, and violation signal. A facility can score stable (Stability Score) while simultaneously scoring as a high-opportunity sales target (Opportunity Score). Both scores are published together to avoid misreading one as the other.
Every report we publish includes a data vintage box identifying the refresh date of each data layer used in that report. The vintage box is a required element of our report templates. Readers never have to guess whether they are looking at current figures or stale ones.
The vintage reflects when our pipeline last ingested each source, not when the source itself was published. In most cases these are close, but some sources (annual HCRIS filings in particular) are published on lagged schedules that we name explicitly when relevant.
CMS privacy rules suppress RN and total staff turnover rates for facilities where the sample size falls below CMS's disclosure threshold. Currently 2,031 facilities (roughly 14% of the universe) have suppressed turnover data. Our reports flag these facilities as "turnover not available" rather than excluding them from analysis or imputing values.
The Payroll-Based Journal publishes one row per facility per day, aggregating all shifts within a day. Within-shift patterns — night coverage relative to day, meal-time coverage, hour-by-hour patterns — are not captured in the source data and therefore cannot be analyzed at that granularity. Our zero-RN-day and weekend-drop-off metrics operate at daily resolution, which is the finest resolution the source permits.
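Because the source is one row per facility per day, the two metrics named above reduce to per-day arithmetic. The sketch below assumes a mapping of dates to reported hours; field names and the exact drop-off formula are illustrative, not our published definitions.

```python
# Daily-resolution sketch of two metrics named above. Input shape
# (date -> hours) and the drop-off ratio are illustrative assumptions.
from datetime import date

def zero_rn_days(daily_rn_hours: dict[date, float]) -> int:
    """Count days with no reported RN hours."""
    return sum(1 for h in daily_rn_hours.values() if h == 0)

def weekend_dropoff(daily_total_hours: dict[date, float]) -> float:
    """Mean weekend hours relative to mean weekday hours (1.0 = no drop)."""
    weekday = [h for d, h in daily_total_hours.items() if d.weekday() < 5]
    weekend = [h for d, h in daily_total_hours.items() if d.weekday() >= 5]
    if not weekday or not weekend:
        return float("nan")
    return (sum(weekend) / len(weekend)) / (sum(weekday) / len(weekday))

# One example week: 2024-01-01 (Mon) through 2024-01-07 (Sun).
week = {date(2024, 1, i): 100.0 for i in range(1, 6)}
week.update({date(2024, 1, 6): 80.0, date(2024, 1, 7): 80.0})
```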
CMS ownership history records changes in ownership entities but does not label the transaction type — sale, merger, lease, internal restructuring, and successor entity changes all appear as generic ownership changes. We report ownership changes as observed and do not infer transaction categories from the raw records. Buyers relying on transaction-type filtering should corroborate with primary deal documentation.
Payer mix, revenue composition, and operating metrics are derived from annual cost report filings. A facility's payer mix as we report it reflects the most recent full fiscal year filed, not the current operational state. Facilities that have materially shifted their payer mix within the current year will not show that shift in our data until the next annual filing is published.
County-level population, age distribution, and income figures are drawn from the American Community Survey 5-year estimates, which by construction lag the current year by several years and smooth short-term demographic shifts. These figures are appropriate for structural market analysis but should not be read as real-time population counts.
The federal CMS datasets cover all certified nursing homes uniformly, but the state-level regulatory data — statutes, enforcement mechanisms, penalty schedules, state-specific licensure — varies in how thoroughly each state publishes it. Our state regulatory coverage is deepest for states with active public health reporting and more limited for states that publish sparsely. We name the specific state sources used in each state report.