The depression inventory development scale: Assessment of psychometric properties using classical and modern measurement theory in a can-bind trial

Anthony L. Vaccarino; Amir H. Kalali; Pierre Blier; Susan Gilbert Evans; Nina Engelhardt; Jane A. Foster; Benicio N. Frey; John H. Greist; Kenneth A. Kobak; Raymond W. Lam; Glenda Macqueen; Roumen Milev; Daniel J. Müller; Sagar V. Parikh; Franca M. Placenza; Sakina J. Rizvi; Susan Rotzinger; David V. Sheehan; Terrence Sills; Claudio N. Soares; Gustavo Turecki; Rudolph Uher; Janet B.W. Williams; Sidney H. Kennedy; Kenneth R. Evans

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Objective: The goal of the Depression Inventory Development (DID) project is to develop a comprehensive and psychometrically sound rating scale for major depressive disorder (MDD) that reflects current diagnostic criteria and conceptualizations of depression. We report here the evaluation of the current DID item bank using Classical Test Theory (CTT), Item Response Theory (IRT), and Rasch Measurement Theory (RMT). Methods: The present study was part of a larger multisite, open-label study conducted by the Canadian Biomarker Integration Network in Depression (ClinicalTrials.gov: NCT01655706). Trained raters administered the 32 DID items at each of two visits (MDD: Baseline, n=211 and Week 8, n=177; healthy participants: Baseline, n=112 and Week 8, n=104). The DID’s “grid” structure operationalizes the intensity and frequency of each item, with clear symptom definitions and a structured interview guide. The current iteration assesses symptoms related to anhedonia, cognition, fatigue, general malaise, motivation, anxiety, negative thinking, pain, and appetite. Participants were also administered the Montgomery–Åsberg Depression Rating Scale (MADRS) and the Quick Inventory of Depressive Symptomatology-Self-Report (QIDS-SR), which allowed DID items to be evaluated against existing “benchmark” items. CTT was used to assess data quality/reliability (i.e., missing data, skewness, scoring frequency, internal consistency), IRT to assess individual item performance by modelling an item’s ability to discriminate levels of depressive severity (as assessed by the MADRS), and RMT to assess how the items perform together as a scale to capture a range of depressive severity (item targeting). Together, these analyses provided empirical evidence on which to base decisions about which DID items to remove, modify, or advance.
Results: Of the 32 DID items evaluated, eight were identified by CTT as problematic, displaying low variability in the range of responses, floor effects, and/or skewness, and four were identified by IRT as showing poor discriminative properties that would limit their clinical utility. Five additional items were deemed to be redundant. The remaining 15 DID items all fit the Rasch model, and person and item difficulty estimates indicated satisfactory item targeting, although precision was lower in participants with mild levels of depression. These 15 DID items also showed good internal consistency (alpha=0.95 and inter-item correlations ranging from r=0.49 to r=0.84), and all items were sensitive to change following antidepressant treatment (baseline vs. Week 8). RMT revealed problematic item targeting for the MADRS and QIDS-SR, including an absence of MADRS items targeting participants with mild/moderate depression and an absence of QIDS-SR items targeting participants with mild or severe depression. Conclusion: The present study applied CTT, IRT, and RMT to assess the measurement properties of the DID items and identify those that should be advanced, modified, or removed. Of the 32 items evaluated, 15 items showed good measurement properties. These items (along with previously evaluated items) will provide the basis for validation of a penultimate DID scale assessing anhedonia, cognitive slowing, concentration, executive function, recent memory, drive, emotional fatigue, guilt, self-esteem, hopelessness, tension, rumination, irritability, reduced appetite, insomnia, sadness, worry, suicidality, and depressed mood. The strategies adopted by the DID process provide a framework for rating scale development and validation.
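The CTT checks named in the abstract (internal consistency, floor effects, skewness) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the 0-to-4 scoring range, and the simulated responses are all assumptions made for illustration, and none of the numbers it produces correspond to the study data.

```python
import numpy as np


def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x items) score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the sum score
    return float(k / (k - 1) * (1.0 - item_variances.sum() / total_variance))


def floor_proportion(item: np.ndarray, minimum: float = 0) -> float:
    """Share of responses at the lowest score (a simple floor-effect check)."""
    return float(np.mean(item == minimum))


def skewness(item: np.ndarray) -> float:
    """Moment-based sample skewness: m3 / m2**1.5."""
    centered = item - item.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return float(m3 / m2 ** 1.5)


if __name__ == "__main__":
    # Hypothetical data: 200 simulated participants, 5 items scored 0-4.
    rng = np.random.default_rng(0)
    severity = rng.normal(2.0, 1.0, size=(200, 1))   # simulated latent severity
    noise = rng.normal(0.0, 0.7, size=(200, 5))      # item-specific noise
    scores = np.clip(np.rint(severity + noise), 0, 4)

    print("alpha:", round(cronbach_alpha(scores), 3))
    for j in range(scores.shape[1]):
        print(
            f"item {j}: floor={floor_proportion(scores[:, j]):.2f}, "
            f"skew={skewness(scores[:, j]):.2f}"
        )
```

In a screening step like the one described, an item with a large share of responses at the floor or with pronounced skew would be flagged for review. The cut-offs for such flags are a design choice that the abstract does not state.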

Original language	English (US)
Pages (from-to)	30-40
Number of pages	11
Journal	Innovations in Clinical Neuroscience
Volume	17
Issue number	7-9
State	Published - Jul 1 2020
Externally published	Yes

Keywords

Classical Test Theory
Depressive symptoms
Item Response Theory
Major depressive disorder
Rasch Measurement Theory
Rating scales

ASJC Scopus subject areas

Clinical Neurology
Psychiatry and Mental health

Cite this

Vaccarino, A. L., Kalali, A. H., Blier, P., Evans, S. G., Engelhardt, N., Foster, J. A., Frey, B. N., Greist, J. H., Kobak, K. A., Lam, R. W., Macqueen, G., Milev, R., Müller, D. J., Parikh, S. V., Placenza, F. M., Rizvi, S. J., Rotzinger, S., Sheehan, D. V., Sills, T., ... Evans, K. R. (2020). The depression inventory development scale: Assessment of psychometric properties using classical and modern measurement theory in a can-bind trial. Innovations in Clinical Neuroscience, 17(7-9), 30-40.

Vaccarino, AL, Kalali, AH, Blier, P, Evans, SG, Engelhardt, N, Foster, JA, Frey, BN, Greist, JH, Kobak, KA, Lam, RW, Macqueen, G, Milev, R, Müller, DJ, Parikh, SV, Placenza, FM, Rizvi, SJ, Rotzinger, S, Sheehan, DV, Sills, T, Soares, CN, Turecki, G, Uher, R, Williams, JBW, Kennedy, SH & Evans, KR 2020, 'The depression inventory development scale: Assessment of psychometric properties using classical and modern measurement theory in a can-bind trial', Innovations in Clinical Neuroscience, vol. 17, no. 7-9, pp. 30-40.

@article{3b59253791fc4812b1a6e8d70fe74319,

title = "The depression inventory development scale: Assessment of psychometric properties using classical and modern measurement theory in a can-bind trial",

keywords = "Classical Test Theory, Depressive symptoms, Item Response Theory, Major depressive disorder, Rasch Measurement Theory, Rating scales",

author = "Vaccarino, {Anthony L.} and Kalali, {Amir H.} and Pierre Blier and Evans, {Susan Gilbert} and Nina Engelhardt and Foster, {Jane A.} and Frey, {Benicio N.} and Greist, {John H.} and Kobak, {Kenneth A.} and Lam, {Raymond W.} and Glenda Macqueen and Roumen Milev and M{\"u}ller, {Daniel J.} and Parikh, {Sagar V.} and Placenza, {Franca M.} and Rizvi, {Sakina J.} and Susan Rotzinger and Sheehan, {David V.} and Terrence Sills and Soares, {Claudio N.} and Gustavo Turecki and Rudolph Uher and Williams, {Janet B.W.} and Kennedy, {Sidney H.} and Evans, {Kenneth R.}",

year = "2020",

month = jul,

day = "1",

language = "English (US)",

volume = "17",

pages = "30--40",

journal = "Innovations in Clinical Neuroscience",

issn = "2158-8333",

publisher = "Matrix Medical Communications",

number = "7-9",

}

TY - JOUR

T1 - The depression inventory development scale

T2 - Assessment of psychometric properties using classical and modern measurement theory in a can-bind trial

AU - Vaccarino, Anthony L.

AU - Kalali, Amir H.

AU - Blier, Pierre

AU - Evans, Susan Gilbert

AU - Engelhardt, Nina

AU - Foster, Jane A.

AU - Frey, Benicio N.

AU - Greist, John H.

AU - Kobak, Kenneth A.

AU - Lam, Raymond W.

AU - Macqueen, Glenda

AU - Milev, Roumen

AU - Müller, Daniel J.

AU - Parikh, Sagar V.

AU - Placenza, Franca M.

AU - Rizvi, Sakina J.

AU - Rotzinger, Susan

AU - Sheehan, David V.

AU - Sills, Terrence

AU - Soares, Claudio N.

AU - Turecki, Gustavo

AU - Uher, Rudolph

AU - Williams, Janet B.W.

AU - Kennedy, Sidney H.

AU - Evans, Kenneth R.

PY - 2020/7/1

Y1 - 2020/7/1

KW - Classical Test Theory

KW - Depressive symptoms

KW - Item Response Theory

KW - Major depressive disorder

KW - Rasch Measurement Theory

KW - Rating scales

UR - http://www.scopus.com/inward/record.url?scp=85100478153&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85100478153&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85100478153

SN - 2158-8333

VL - 17

SP - 30

EP - 40

JO - Innovations in Clinical Neuroscience

JF - Innovations in Clinical Neuroscience

IS - 7-9

ER -
