Understanding what makes a set of quality indicators robust and meaningful is essential for advancing healthcare evaluation. While individual indicators can capture specific aspects of care—such as timely support during labor—they often fall short of representing the broader constructs that stakeholders care about, like overall maternity care quality. To accurately assess such complex concepts, it is necessary to develop comprehensive indicator sets guided by clear, evidence-based criteria. Yet, guidance on the properties that define a good indicator set remains limited, highlighting the need for systematic analysis and consensus in this area.
This review explores the fundamental properties that characterize high-quality indicator sets, emphasizing the importance of content validity. It also aims to identify additional criteria that should inform the development and evaluation of these sets, ultimately supporting more valid and reliable health care quality assessments.
Introduction
Health care quality indicators are tools designed to help various users—patients, clinicians, administrators, and policymakers—make informed decisions based on the quality of care delivered [1–3]. While a single indicator, such as the rate of surgical site infections, provides valuable data, it often offers a narrow view. Broader health care constructs like the overall safety or effectiveness of a service require multiple indicators that collectively capture the multidimensional nature of quality [4–10]. The conclusions drawn about such constructs depend not only on the validity of individual indicators but also on the quality of the entire indicator set as a whole [11–14].
Despite the importance of comprehensive indicator sets, existing guidance largely concentrates on the properties of individual measures—such as validity, reliability, and feasibility—without sufficiently addressing the desirable properties of the sets themselves [15–22, 13, 23]. This gap leaves developers without clear standards for constructing balanced, comprehensive, and valid indicator collections.
Applying the ‘lens model’ [24–26], indicators can be understood as ‘cues’ through which users interpret the targeted construct. If these cues are biased or incomplete, the resulting conclusions about healthcare quality may be misguided. Central to ensuring accurate interpretation is the concept of content validity, which involves confirming that the indicator set adequately reflects all relevant aspects of the construct without including irrelevant elements. Threats to content validity include omission of relevant indicators, overemphasis on some areas, or inclusion of extraneous measures [27–29], all of which can compromise the validity of assessments.
A content-valid indicator set should cover all relevant content domains, proportionally represent these domains according to their importance, and exclude irrelevant or misleading indicators. Achieving this balance is complex, and guidance on how to systematically evaluate and incorporate these properties remains sparse.
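The three properties named above can be made concrete with a small check. The following is a minimal sketch, not a method taken from the reviewed literature: the function name, the domain names, the importance weights, and the indicator names are all hypothetical, and it assumes that each indicator maps to exactly one content domain and that domain importance can be expressed as a numeric weight.

```python
from collections import Counter


def content_validity_report(domain_weights, indicator_domains):
    """Summarize coverage, proportional representation, and contamination.

    domain_weights: dict mapping each relevant content domain to an assumed
        importance weight (weights need not sum to 1).
    indicator_domains: dict mapping each indicator name to the domain it
        measures; a domain missing from domain_weights counts as irrelevant.
    """
    total_weight = sum(domain_weights.values())

    # Indicators whose domain is not part of the construct contaminate the set.
    relevant = {name: d for name, d in indicator_domains.items() if d in domain_weights}
    contaminating = sorted(name for name in indicator_domains if name not in relevant)

    # Coverage (breadth): which relevant domains have no indicator at all?
    counts = Counter(relevant.values())
    uncovered = sorted(d for d in domain_weights if counts[d] == 0)

    # Proportional representation: share of indicators minus share of importance.
    n_relevant = len(relevant)
    representation_gap = {}
    for domain, weight in domain_weights.items():
        observed = counts[domain] / n_relevant if n_relevant else 0.0
        expected = weight / total_weight
        representation_gap[domain] = round(observed - expected, 3)

    return {
        "coverage": 1 - len(uncovered) / len(domain_weights),
        "uncovered_domains": uncovered,
        "representation_gap": representation_gap,
        "contaminating_indicators": contaminating,
    }


# Hypothetical construct and indicator set, for illustration only.
domain_weights = {"safety": 2, "effectiveness": 2, "patient_experience": 1}
indicator_domains = {
    "surgical_site_infection_rate": "safety",
    "medication_error_rate": "safety",
    "fall_rate": "safety",
    "readmission_rate": "effectiveness",
    "parking_availability": "facilities",
}
report = content_validity_report(domain_weights, indicator_domains)
print(report)
```

In this toy set, `patient_experience` is reported as uncovered, `parking_availability` is flagged as contamination, and `safety` is over-represented (a gap of 0.35 relative to its assumed importance). Real indicator sets rarely allow such clean one-to-one mappings, so a check like this is at most a starting point for expert judgment.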
Methods
Search Strategy
A comprehensive search of four major databases (Web of Science, MEDLINE, CINAHL, and PsycInfo) was conducted on May 21, 2021. Broad search terms related to ‘indicator sets’ were used without filters to capture the widest possible range of relevant literature. The reference lists of included studies were also screened for additional relevant publications.
Eligibility Criteria
Studies were included if they discussed desirable characteristics of health care quality indicator sets, were published in peer-reviewed journals, and addressed the properties of the set as a whole. Articles written in languages other than English or German and studies without an available full text were excluded.
Data Extraction and Analysis
Two independent reviewers used structured coding schemes to extract data, focusing on the criteria used or recommended for high-quality indicator sets. Building upon existing frameworks, such as the definitions of content coverage, proportional representation, and contamination [28, 37], we also inductively identified additional properties from the literature. Conflicts were resolved through discussion until consensus was reached. The frequencies of various criteria and domains were tabulated to illustrate their prominence in the literature.
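The tabulation step described above can be sketched as follows. This is an illustrative example under stated assumptions, not the reviewers' actual procedure: the study identifiers and coded criteria are invented toy data, and the function name `tabulate_criteria` is hypothetical.

```python
from collections import Counter

# The three core components of content validity named in the text.
CORE_COMPONENTS = {"content coverage", "proportional representation", "contamination"}

# Hypothetical coding results: the set of criteria each study addressed.
coded_studies = {
    "study_A": {"content coverage", "contamination"},
    "study_B": {"content coverage", "proportional representation", "contamination"},
    "study_C": {"content coverage"},
    "study_D": set(),
}


def tabulate_criteria(coded):
    """Return {criterion: (count, percent of studies)}, most frequent first."""
    n = len(coded)
    if n == 0:
        return {}
    counts = Counter(c for criteria in coded.values() for c in criteria)
    return {c: (k, round(100 * k / n)) for c, k in counts.most_common()}


table = tabulate_criteria(coded_studies)
for criterion, (count, pct) in table.items():
    print(f"{criterion}: {count}/{len(coded_studies)} ({pct}%)")

# Studies that address all three core components at once.
all_three = sum(CORE_COMPONENTS <= criteria for criteria in coded_studies.values())
print(f"all three components: {all_three}/{len(coded_studies)}")
```

Counting per study rather than per mention keeps the percentages interpretable as "share of studies addressing a criterion", which matches how the frequencies are reported in the results.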
Results
Overview of Included Studies
Out of 558 identified articles, 62 met inclusion criteria. These studies covered a broad spectrum of healthcare constructs, including hospital care quality, primary care, mental health, and community-based maternity services [12, 40, 41, 5]. Most studies (90%) structured their targeted constructs into content domains—such as quality dimensions, policy priorities, or care pathways—to facilitate assessment of content validity [12, 45].
Addressing Content Validity in the Literature
Although only 19% of studies explicitly used the term ‘content validity,’ a substantial 85% addressed at least one of its core components: content coverage, proportional representation, or contamination [28, 37]. Only 15% of studies (9/62) addressed all three aspects comprehensively.
- Content Coverage: 71% of studies emphasized the importance of covering all relevant content areas, with some distinguishing between the breadth (the range of domains covered) and depth (the extent of coverage within each domain) [12, 60]. Many studies defined potential content domains based on frameworks like Donabedian’s structure-process-outcome model [51, 52], although reliance solely on these measurement domains may overlook critical content aspects [13].
- Proportional Representation: About one-third of the studies considered how indicators are distributed across different content areas relative to their importance within the construct [28, 37]. Ensuring proportional representation helps prevent overemphasis on certain dimensions at the expense of others.
- Contamination: Half of the reviewed studies (31/62) discussed the necessity of excluding irrelevant indicators that do not accurately reflect the construct, thus avoiding contamination of the set [30, 37].
Additional Criteria Beyond Content Validity
Beyond the core components of content validity, the literature identified further substantive and procedural criteria:
- Substantive Criteria:
  - Cost of Measurement: 21% of studies highlighted data collection and analysis burdens, advocating for balancing comprehensiveness with feasibility [60].
  - Prioritization of Indicators: Several studies suggested selecting only the most essential measures to streamline the set without sacrificing validity [42].
  - Redundancy Avoidance: Eliminating overlapping or duplicative indicators was recognized as important for clarity and efficiency [65].
  - Set Size: Striving for an optimal number, neither too many nor too few, was discussed as a means to maintain manageability while ensuring coverage [42].
- Procedural Criteria:
  - Assessment Purpose: Clarifying whether indicators serve for accountability, improvement, or research influences their development [68].
  - Conceptual Framework Development: Systematic use of frameworks guides indicator selection and ensures coverage aligns with the construct [32].
  - Stakeholder Involvement: Engaging clinicians, patients, and policymakers enhances content validity by incorporating diverse perspectives [69].
  - Transparency: Clear documentation of development processes and limitations fosters trustworthiness and reproducibility [45].
Discussion
Principal Findings
Most studies (85%) recognize the importance of aspects related to content validity, but only a small minority (15%) address all core components comprehensively. The distinction between the breadth and depth of content coverage emerged as a key consideration; both are necessary for a valid set. The use of content domains—whether based on quality dimensions or care pathways—serves as a practical framework to structure indicator sets and assess validity [12, 45].
Additional criteria such as measurement costs, indicator prioritization, and stakeholder involvement further refine indicator set quality. Procedural properties—like clarity of purpose, framework use, and transparency—are equally critical to ensure the set’s validity and utility [68, 69].
Strengths and Limitations
This review offers a comprehensive overview of the criteria used across diverse studies and can serve as a guide for developers of indicator sets. Its systematic approach supports the credibility of the findings. Limitations include the restriction to peer-reviewed literature, which means that relevant grey literature may have been missed. Subjectivity in coding was reduced, though not eliminated, through independent review and consensus procedures.
Implications for Practice and Research
Developers should prioritize content validity by carefully defining the construct and its content domains, ensuring balanced coverage and avoiding irrelevant indicators. Explicitly articulating the purpose of measurement and involving stakeholders enhances relevance and acceptance. Future research should explore how different measurement objectives influence content domain selection and how to best balance comprehensiveness with feasibility.
Conclusions
A valid indicator set must reflect the complexity of healthcare quality without overburdening data collection processes. Ensuring content validity through comprehensive coverage, proportional representation, and exclusion of irrelevant indicators is paramount. Adopting systematic, transparent development procedures and clarifying measurement objectives will improve the utility and trustworthiness of indicator sets, ultimately supporting more accurate assessments and better healthcare outcomes.
