TY - JOUR
T1 - Trustworthy assertion classification through prompting
AU - Wang, Song
AU - Tang, Liyan
AU - Majety, Akash
AU - Rousseau, Justin F.
AU - Shih, George
AU - Ding, Ying
AU - Peng, Yifan
N1 - Publisher Copyright:
© 2022 Elsevier Inc.
PY - 2022/8
Y1 - 2022/8
N2 - Accurate identification of the presence, absence, or possibility of relevant entities in clinical notes is important for healthcare professionals to quickly understand crucial clinical information. This motivates the task of assertion classification: correctly identifying the assertion status of an entity in unstructured clinical notes. Recent rule-based and machine-learning approaches suffer from labor-intensive pattern engineering and severe class bias toward majority classes. To address this problem, we propose a prompt-based learning approach that treats assertion classification as a masked language auto-completion problem. We evaluated the model on six datasets. Our prompt-based method achieved a micro-averaged F-1 of 0.954 on the i2b2 2010 assertion dataset, a ∼1.8% improvement over previous work. In particular, our model excelled at detecting classes with few instances (few-shot). Evaluations on five external datasets demonstrate the strong generalizability of the prompt-based method to unseen data. To examine the rationality of our model, we further introduced two rationale faithfulness metrics: comprehensiveness and sufficiency. The results reveal that, compared to the “pre-train, fine-tune” procedure, our prompt-based model is better at identifying comprehensive (∼63.93%) and sufficient (∼11.75%) linguistic features in free text. We further evaluated model-agnostic explanations using LIME. The results indicate better rationale agreement between our model and human annotators (∼71.93% average F-1), demonstrating the superior trustworthiness of our model.
AB - Accurate identification of the presence, absence, or possibility of relevant entities in clinical notes is important for healthcare professionals to quickly understand crucial clinical information. This motivates the task of assertion classification: correctly identifying the assertion status of an entity in unstructured clinical notes. Recent rule-based and machine-learning approaches suffer from labor-intensive pattern engineering and severe class bias toward majority classes. To address this problem, we propose a prompt-based learning approach that treats assertion classification as a masked language auto-completion problem. We evaluated the model on six datasets. Our prompt-based method achieved a micro-averaged F-1 of 0.954 on the i2b2 2010 assertion dataset, a ∼1.8% improvement over previous work. In particular, our model excelled at detecting classes with few instances (few-shot). Evaluations on five external datasets demonstrate the strong generalizability of the prompt-based method to unseen data. To examine the rationality of our model, we further introduced two rationale faithfulness metrics: comprehensiveness and sufficiency. The results reveal that, compared to the “pre-train, fine-tune” procedure, our prompt-based model is better at identifying comprehensive (∼63.93%) and sufficient (∼11.75%) linguistic features in free text. We further evaluated model-agnostic explanations using LIME. The results indicate better rationale agreement between our model and human annotators (∼71.93% average F-1), demonstrating the superior trustworthiness of our model.
KW - Concept assertion
KW - Deep learning
KW - NLP
KW - Prompt-based learning
UR - http://www.scopus.com/inward/record.url?scp=85134307403&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134307403&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2022.104139
DO - 10.1016/j.jbi.2022.104139
M3 - Article
C2 - 35811026
AN - SCOPUS:85134307403
SN - 1532-0464
VL - 132
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104139
ER -