In contrast, the approach based on a German medical language model did not outperform the baseline and reached an F1 score of at most 0.42.
A large publicly funded project to build a German medical text corpus, GeMTeX, is scheduled to start in mid-2023. GeMTeX will comprise clinical texts from the information systems of six university hospitals. These texts will be made usable for NLP by annotating entities and relations and adding meta-information. A strong governance model provides a reliable legal framework for using the corpus. State-of-the-art NLP methods will be used to build, pre-annotate, and annotate the corpus and to train language models on it. A community will be built around GeMTeX to ensure its long-term maintenance, usability, and dissemination.
Health information can be gathered by searching multiple sources of health-related data. Collecting self-reported health information may improve our understanding of the symptoms and characteristics of various diseases. We extracted symptom mentions from COVID-19-related Twitter posts with a pre-trained large language model (GPT-3), using zero-shot learning without any example data. We also introduce a new performance measure, Total Match (TM), which accounts for exact, partial, and semantic matches. Our results show that the zero-shot approach performs well without any data annotation. They also show that it can help generate examples for few-shot learning, which could yield better results.
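The abstract names the three match types behind Total Match (TM) but does not say how they are combined. The Python sketch below is one hypothetical reading, not the authors' definition. A gold symptom mention counts as matched if some predicted mention equals it (exact), shares a token with it (partial), or maps to the same canonical symptom in a synonym table (semantic). TM is then the share of matched gold mentions. The helpers `match_type` and `total_match` and the synonym table are illustrative assumptions.

```python
from typing import Dict, List, Optional

# Match types in order of preference; each gold mention is credited once, by its best type.
PRIORITY = ("exact", "partial", "semantic")


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def match_type(gold: str, pred: str, synonyms: Dict[str, str]) -> Optional[str]:
    """Classify one gold/prediction pair; synonym keys and values must be normalized."""
    g, p = normalize(gold), normalize(pred)
    if g == p:
        return "exact"
    if set(g.split()) & set(p.split()):  # at least one shared token
        return "partial"
    if synonyms.get(g, g) == synonyms.get(p, p):  # same canonical symptom
        return "semantic"
    return None


def total_match(gold: List[str], predicted: List[str], synonyms: Dict[str, str]) -> Dict[str, float]:
    """Count matched gold mentions per type and report TM as the matched share (hypothetical)."""
    counts: Dict[str, float] = {kind: 0 for kind in PRIORITY}
    for g in gold:
        found = {match_type(g, p, synonyms) for p in predicted}
        for kind in PRIORITY:
            if kind in found:
                counts[kind] += 1
                break
    matched = sum(counts[kind] for kind in PRIORITY)
    counts["total_match"] = matched / len(gold) if gold else 0.0
    return counts


# Hypothetical example: one symptom of each match type plus one missed symptom.
synonyms = {"anosmia": "loss of smell"}
result = total_match(
    ["fever", "loss of smell", "dry cough", "fatigue"],
    ["Fever", "anosmia", "cough"],
    synonyms,
)
print(result)
```

Crediting each gold mention only once, by its strongest match, keeps TM between 0 and 1. A real evaluation would still need an agreed rule for what counts as a "partial" or "semantic" match.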
Language models such as BERT can extract information from unstructured free text in medical documents. These models are first pre-trained on large datasets to learn general language patterns and domain specifics. They are then fine-tuned on labeled data for particular tasks. We propose a pipeline with human-in-the-loop labeling to create annotated Estonian health data. The method is particularly easy to use for medical professionals working with low-resource languages. This makes it a better alternative to rule-based techniques such as regular expressions.
Since Hippocrates, health data have mostly been recorded in writing, and the medical narrative is fundamental to a personalized patient-doctor relationship. Natural language should be recognized as a technology that users accept and that has stood the test of time. We have previously used a controlled natural language as a human-computer interface to capture semantic data at the point of care. Our computable language was derived from a linguistic interpretation of the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) concept model. In this paper, we propose an extension that allows measurement results to be recorded with their numerical values and units. We discuss how our approach relates to the development of clinical information modeling.
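To illustrate what recording a measurement with a value and unit might involve, here is a minimal Python sketch of a parser for one controlled statement form. The `<concept> = <value> <unit>` syntax, the UCUM-style `ALLOWED_UNITS` subset, and the `parse_measurement` helper are hypothetical and are not the published grammar. The sketch also does not attempt to bind the concept text to SNOMED CT.

```python
import re
from dataclasses import dataclass

# Hypothetical unit whitelist written as UCUM codes (mm[Hg] = millimetre of mercury).
ALLOWED_UNITS = {"kg", "cm", "mm[Hg]", "/min"}

# Hypothetical statement form: "<concept> = <number> <unit>".
PATTERN = re.compile(
    r"^(?P<concept>[A-Za-z][A-Za-z ]*?)\s*=\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>\S+)$"
)


@dataclass(frozen=True)
class Measurement:
    concept: str
    value: float
    unit: str


def parse_measurement(statement: str) -> Measurement:
    """Split a controlled measurement statement into concept text, numeric value, and unit."""
    match = PATTERN.match(statement.strip())
    if match is None:
        raise ValueError(f"not a measurement statement: {statement!r}")
    unit = match["unit"]
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return Measurement(match["concept"], float(match["value"]), unit)


print(parse_measurement("Systolic blood pressure = 120 mm[Hg]"))
```

Rejecting units outside a fixed vocabulary mirrors the idea of a controlled language: a statement is either well-formed and computable or refused at entry time.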
From a semi-structured clinical problem list of 19 million de-identified entries linked to ICD-10 codes, we identified closely related real-world expressions. Seed terms from a log-likelihood-based co-occurrence analysis served as queries in a k-NN search over an embedding representation generated with SapBERT.
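Assume the log-likelihood analysis is Dunning's G² statistic on a 2x2 contingency table: entries with or without a term, crossed with entries with or without an ICD-10 code. Under that assumption, seed-term ranking could be sketched in Python as follows. The count layout, the positive-association filter, the helper names, and the toy counts are all assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple


def log_likelihood(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G2 for a 2x2 table: rows = term present/absent, columns = code present/absent."""
    observed = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    row_totals = [k11 + k12, k21 + k22]
    col_totals = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute 0 (limit of o * ln o as o -> 0)
                expected = row_totals[i] * col_totals[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def rank_seed_terms(
    term_counts: Dict[str, Tuple[int, int]], code_count: int, total_entries: int, top_k: int = 3
) -> List[str]:
    """term_counts maps term -> (entries with term AND code, entries with term)."""
    scores = {}
    for term, (with_code, with_term) in term_counts.items():
        # G2 is also large for negative association, so keep only terms seen more often than expected.
        if with_code * total_entries <= with_term * code_count:
            continue
        k11 = with_code
        k12 = with_term - with_code
        k21 = code_count - with_code
        k22 = total_entries - k11 - k12 - k21
        scores[term] = log_likelihood(k11, k12, k21, k22)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy counts: 1,000 problem-list entries, 100 of them carrying the target ICD-10 code.
seeds = rank_seed_terms(
    {"chest pain": (40, 60), "cough": (5, 200), "angina": (30, 35)},
    code_count=100,
    total_entries=1000,
)
print(seeds)
```

The top-ranked terms would then be embedded (for instance with SapBERT) and used as k-NN queries. That step is not shown here.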
Vector representations of words, known as word embeddings, are widely used in natural language processing, and contextualized representations in particular have recently proven successful. This study examines the effect of contextualized and non-contextual embeddings on medical concept normalization, using a k-NN approach to map clinical terms to the SNOMED CT ontology. Non-contextualized concept mapping performed markedly better (F1-score = 0.853) than the contextualized representation (F1-score = 0.322).
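A k-NN concept normalizer does not depend on where the vectors come from, so the same code works with contextualized or non-contextual embeddings. This Python sketch assumes each SNOMED CT concept is represented by one or more precomputed vectors, for example one per description. It assigns a clinical term the concept that occurs most often among the term's k most cosine-similar neighbors. The toy three-dimensional vectors and placeholder concept labels are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_normalize(
    term_vector: Sequence[float],
    concept_vectors: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> str:
    """Return the concept that occurs most often among the k most similar concept vectors."""
    neighbors = sorted(
        concept_vectors, key=lambda item: cosine(term_vector, item[1]), reverse=True
    )[:k]
    votes = Counter(concept for concept, _ in neighbors)
    # most_common keeps first-encountered order on ties, so a tie goes to the closest neighbor's concept.
    return votes.most_common(1)[0][0]


# Placeholder concept labels and toy vectors (not real SNOMED CT identifiers or embeddings).
concepts = [
    ("myocardial_infarction", [0.9, 0.1, 0.0]),
    ("myocardial_infarction", [0.8, 0.2, 0.1]),
    ("hypertension", [0.1, 0.9, 0.2]),
    ("hypertension", [0.0, 0.8, 0.3]),
]
print(knn_normalize([0.85, 0.15, 0.05], concepts))
```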
This paper presents a first attempt to map UMLS concepts to pictographs as a supporting resource for medical translation systems. A review of pictographs from two publicly available sources showed that many concepts are not represented. It also showed that word-based search is insufficient for this task.
Predicting meaningful outcomes for patients with complex medical conditions from multiple sources of electronic medical record (EMR) data is a considerable challenge. Using EMR data that include Japanese clinical notes rich in contextual information, we developed a machine learning model to predict the hospital course of cancer patients. The mortality prediction model combines clinical text analysis with other clinical data and achieved high accuracy, which suggests that it is applicable to cancer-related predictions.
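One common way to combine free-text notes with other EMR fields is to vectorize the text and concatenate it with scaled structured features before a classifier. The scikit-learn sketch below does this with character n-gram TF-IDF and logistic regression. Character n-grams sidestep Japanese word segmentation. The feature choice, the column names `note`, `age`, and `albumin`, and the six toy records are assumptions, not the study's model or data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy records for illustration only (not study data): a short Japanese note plus two structured values.
records = pd.DataFrame({
    "note": [
        "食欲低下、全身状態悪化",
        "経過良好、退院予定",
        "呼吸困難増悪",
        "疼痛コントロール良好",
        "意識レベル低下",
        "歩行可能、食事摂取良好",
    ],
    "age": [78, 55, 81, 60, 74, 49],
    "albumin": [2.4, 3.9, 2.8, 4.1, 2.2, 4.3],
})
died = [1, 0, 1, 0, 1, 0]  # hypothetical in-hospital mortality labels

features = ColumnTransformer([
    # Character uni- and bigrams avoid the need for a Japanese word segmenter in this sketch.
    ("text", TfidfVectorizer(analyzer="char", ngram_range=(1, 2)), "note"),
    # Structured values are standardized so they are on a scale comparable to the TF-IDF weights.
    ("structured", StandardScaler(), ["age", "albumin"]),
])
model = Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])
model.fit(records, died)

risk = model.predict_proba(records)[:, 1]  # predicted probability of death per record
print(risk.round(2))
```

A real model would be trained and evaluated on separate patients. Here the toy data are scored in-sample only to show the data flow.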
To classify the sentences of German cardiology doctor's letters into eleven subject categories, we applied pattern-exploiting training. This prompt-based method for text classification with small training sets (20, 50, and 100 instances per class) was applied to language models pre-trained with different strategies. It was evaluated on CARDIO:DE, an openly available German clinical text corpus. In this clinical setting, prompting improved accuracy by 5 to 28% over conventional approaches while reducing both manual annotation effort and computational cost.
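Pattern-exploiting training recasts classification as a cloze task. A pattern wraps the sentence in a prompt that contains a mask token, and a verbalizer maps each label to a word whose masked-LM score stands in for that label. The Python sketch shows only this scoring step, taking a softmax over the verbalizer scores. The German pattern, the three labels and their verbalizer words, and the `keyword_scorer` stand-in for a real masked language model are hypothetical. The fine-tuning stage is omitted.

```python
import math
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"  # BERT-style mask token; a real model's tokenizer defines its own

# Hypothetical verbalizer: label -> German word; not the eleven categories used in the study.
VERBALIZER: Dict[str, str] = {
    "medication": "Medikation",
    "diagnosis": "Diagnose",
    "history": "Anamnese",
}


def pattern(sentence: str) -> str:
    """Wrap a letter sentence in a German cloze prompt ('Thema' = topic)."""
    return f"{sentence} Thema: {MASK}."


def classify(
    sentence: str, score_mask: Callable[[str, Sequence[str]], Sequence[float]]
) -> Dict[str, float]:
    """Softmax over the masked-LM scores of each label's verbalizer word."""
    labels = list(VERBALIZER)
    logits = score_mask(pattern(sentence), [VERBALIZER[label] for label in labels])
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def keyword_scorer(prompt: str, words: Sequence[str]) -> List[float]:
    """Toy stand-in for a masked language model: favors words whose first five letters occur in the prompt."""
    text = prompt.lower()
    return [2.0 if word[:5].lower() in text else 0.0 for word in words]


probs = classify("Aktuelle Medikation: Ramipril 5 mg 1-0-0.", keyword_scorer)
print(max(probs, key=probs.get))
```

During training, the same label distribution would be compared with the gold label through a cross-entropy loss. This is how few labeled examples per class can still fine-tune the model.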
Depression in cancer patients often goes unrecognized and therefore untreated. Using machine learning and natural language processing (NLP), we built a prediction model to estimate depression risk within one month of starting cancer treatment. A LASSO logistic regression model based on structured data performed well, whereas an NLP model based on clinician notes performed poorly. Once validated, depression risk prediction models could enable earlier identification of and intervention for vulnerable patients, improving cancer care and ultimately treatment adherence.
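A LASSO logistic regression is an ordinary logistic regression with an L1 penalty. The penalty drives the coefficients of uninformative predictors to exactly zero, so the model also performs feature selection. The scikit-learn sketch runs on synthetic data from `make_classification`. The regularization strength `C=0.1`, the scaling step, and the feature count are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for structured EHR features; not study data.
X, y = make_classification(n_samples=300, n_features=20, n_informative=4, random_state=0)

# L1 penalty = LASSO; C is the inverse regularization strength (smaller C -> more zero coefficients).
lasso_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
lasso_lr.fit(X, y)

coefficients = lasso_lr[-1].coef_.ravel()
selected = np.flatnonzero(coefficients)  # indices of features the model kept
risk = lasso_lr.predict_proba(X)[:, 1]   # predicted probability of the positive class
print(f"{len(selected)} of {X.shape[1]} features kept")
```

In practice C would be tuned, for example by cross-validation. `LogisticRegressionCV` supports the L1 penalty with the `liblinear` and `saga` solvers.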
Assigning diagnoses in the emergency room (ER) is a complex task. Using natural language processing, we developed several classification models. We evaluated them both on the full set of 132 diagnostic categories and on several clinical use cases, each involving two diagnoses that are hard to distinguish.
This study compares a speech-enabled phraselator (BabelDr) with telephone interpreting for communicating with allophone patients. To assess satisfaction with each medium and explore its advantages and drawbacks, we conducted a crossover study with medical professionals and standardized patients, who completed patient histories and questionnaires. Our data suggest higher overall satisfaction with telephone interpreting, but both media have value. We therefore argue for combining BabelDr and telephone interpreting into a more robust approach.
Concepts in the medical literature are often named after people. However, automatically identifying such eponyms with natural language processing (NLP) tools is difficult because of spelling ambiguities and multiple interpretations. Recent methods such as word vectors and transformer models incorporate contextual information into the downstream layers of a neural network. To evaluate these models for medical eponym classification, we labeled examples and counter-examples in a dataset of 1,079 PubMed abstracts. We then trained logistic regression models on feature vectors from the first (vocabulary) and last (contextual) layers of the SciBERT language model. Based on sensitivity-specificity curves, models using contextualized vectors achieved a median performance of 98.0% on phrases held out from training. They outperformed models based on vocabulary vectors by a median of 23 percentage points (95.7%). When processing unlabeled input, these classifiers appeared to generalize to eponyms that did not occur in any annotation. These findings support the development of domain-specific NLP functions based on pre-trained language models. They also underline that contextual information is essential for classifying potential eponyms.
Chronic heart failure is a common disease with high rates of rehospitalization and mortality. HerzMobil is a telemedicine-assisted transitional care disease management program. It collects structured data, including daily measurements of vital parameters and other heart-failure-specific data. Healthcare team members also use the system to communicate, recording their clinical observations as free text. Because manually annotating these notes is too time-consuming for routine care, an automated analysis method is needed. In this study, we derived a gold standard classification of 636 randomly selected clinical notes from HerzMobil. It was based on the annotations of 9 experts with different professional backgrounds (2 physicians, 4 nurses, and 3 engineers). We analyzed how professional background affected inter-annotator reliability and compared the results with the accuracy of an automatic classification approach. Agreement varied considerably by profession and category. These findings indicate that annotators with a range of professional backgrounds should be considered in scenarios like this one.
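Inter-annotator reliability between two annotators is often measured with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected from each annotator's label frequencies. The pure-Python sketch computes kappa and averages it over all annotator pairs within a group, for example all nurses. The abstract does not state which reliability measure the study used. The category labels and annotator names in the example are invented.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators labeling the same notes in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and cover the same notes")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label everywhere; kappa is undefined, treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotations: Dict[str, Sequence[Hashable]]) -> float:
    """Average Cohen's kappa over all pairs of annotators (needs at least two)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohens_kappa(annotations[x], annotations[y]) for x, y in pairs) / len(pairs)


# Hypothetical categories for six notes, labeled by three annotators of one profession.
nurses = {
    "nurse_1": ["vital", "med", "vital", "other", "med", "vital"],
    "nurse_2": ["vital", "med", "other", "other", "med", "vital"],
    "nurse_3": ["vital", "med", "vital", "other", "med", "vital"],
}
print(round(cohens_kappa(nurses["nurse_1"], nurses["nurse_2"]), 3))  # 0.75
print(round(mean_pairwise_kappa(nurses), 3))
```

Computing the mean separately for physicians, nurses, and engineers, and per category, gives the kind of profession-by-category comparison described above.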
Vaccination, a cornerstone of public health, is challenged by vaccine hesitancy and skepticism, a concern that is also pronounced in countries such as Sweden. This study applies structural topic modeling to Swedish social media data to identify discussion themes around mRNA vaccines. It also aims to better understand how people's acceptance or rejection of this technology affects vaccine uptake.