Automatic Detection of Answers to Research Questions from Medline Abstracts
Abdulaziz Alamri and Mark Stevenson
Department of Computer Science
The University of Sheffield
Sheffield, UK

Proceedings of the 2015 Workshop on Biomedical Natural Language Processing (BioNLP 2015), pages 141–146, Beijing, China, July 30, 2015. © 2015 Association for Computational Linguistics

Abstract
Given a set of abstracts retrieved from a search engine such as Pubmed, we aim to automatically identify the claim zone in each abstract. We then select the best sentence(s) from that zone that can serve as an answer to a given query. The system can provide fast access to the most informative sentence(s) in abstracts with respect to the given query.

1 Introduction
The large amount of medical literature makes it difficult for professionals to analyze all the knowledge relevant to a particular medical question. Search engines are increasingly used to access such information. However, such systems retrieve documents based on the occurrence of the query terms in the text, even though those documents may describe a different problem.

The search engine Pubmed®, for example, is a well-known IR system. It provides access to more than 24 million abstracts from the biomedical literature, including Medline® (Wheeler et al., 2008). The engine takes a query from the user and returns a list of abstracts that may be relevant or partially irrelevant to the query. The user must then go through each abstract for further analysis and evaluation.

Researchers who conduct a systematic review (Gough et al., 2012) tend to use the same approach to collect the studies of interest. However, they have been found to spend significant effort identifying the studies that are relevant to the research question. Relevance is usually judged by scanning the results and conclusion sections to identify the authors' claim and then comparing that claim with the review question. Here, a claim can be defined as the summary of the main points presented in a research argument.

Incorporating a middle-tier system between the search engine and the user would help minimize the effort required to filter the results. This research presents a system that aids those searching for studies that discuss a particular research question. The system acts as a mediator between the search engine and the user. It interprets the search engine results and returns the most informative sentence(s) from the claim zone of each abstract that are potential answers to the research question. The system reduces the cognitive load on users by helping them identify relevant claims within abstracts.

The system comprises two components. The first component identifies the claim zone in each abstract using the rhetorical moves principle (Teufel and Moens, 2002). The second component uses the sentences in the claim zone to predict the most informative sentence(s) in each abstract with respect to the given query.

This paper makes three contributions:
- a new set of features for building a classifier that identifies the structural role of sentences in an abstract and performs at least comparably to current systems;
- a classifier that detects, on a lexical basis, the sentence(s) that can best answer a given query;
- a new feature (Z-score) for this task.

2 Related Work
We are not aware of any work that explicitly addresses the detection of the claim sentence most related to a predefined question. However, several studies have addressed related problems.

Ruch et al. (2007), for example, used the rhetorical moves approach to identify the conclusion sentences in abstracts. Their system was based on a Bayesian classifier with normalized n-gram and relative position features. The main objective of that research was to identify sentences that belong to the conclusion sections of abstracts. The authors regarded such sentences as key information for determining the research topic. Our research is similar to that work because we use the conclusion section to identify the key information in an abstract with respect to a query. Unlike that work, we also include the results sections.

Hirohata et al. (2008) described a similar system that uses CRFs to classify abstract sentences into four categories: objective, methods, results and conclusions. That classifier takes into account neighbouring features for a sentence $S_n$, such as the n-grams of the previous sentence $S_{n-1}$ and the next sentence $S_{n+1}$.

Agarwal et al. (2009) described a system that automatically classifies sentences in full biomedical articles into one of four rhetorical categories: introduction, methods, results and discussion. Their best system used Multinomial Naive Bayes. They reported that it outperformed their baseline system, which was rule-based.

Recently, Yepes et al. (2013) described a system to index Gene Reference Into Function (GeneRIF) sentences, which describe novel functionality of genes mentioned in Medline. The goal of that work was to choose the sentences most likely to be selected for GeneRIF indexing. The best system used a Naive Bayes classifier with various features, including the discourse annotations (the NLM category labels) of the abstract sentences.

Our research is close to the system of Hirohata et al. (2008): we use the same algorithm, but a different set of features to build the model. It is also similar to the system of Yepes et al. (2013), because we use the value of the nlmCategory attribute, rather than the labels provided by the authors, to learn the role of sentences.

3 Method

3.1 Claim Zoning Component
This component is based on the hypothesis that the contribution of a research paper tends to be found within the results or conclusion sections of its abstract (Lin et al., 2009). Identifying these sections manually is a tedious task, especially in unstructured abstracts. Medical abstracts tend to have a logical structure (Orasan, 2001) in which each section represents a different role.

Unfortunately, about 70% of Medline abstracts are unstructured, meaning they have no section labels. Structured abstracts use a variety of section labels. The National Library of Medicine (NLM) has reported that 2,779 headings have been used to label abstract sections in Medline (Ripple et al., 2012). Relying on the labels provided by the abstracts' authors to identify the roles of the sentences could be useful for research purposes. In practice, however, all Medline abstracts, even the structured ones, would need to be re-annotated so that they are labelled with the same set of annotations. This is not efficient, especially given the huge volume of the Medline repository.

To address this problem, we use the NLM category value assigned to each section in the XML abstract (the nlmCategory attribute). The NLM assigns five possible values (categories): Objective, Background, Methods, Results and Conclusions. This research uses these categories as an alternative way to learn the roles of abstract sentences. This resolves two problems:
- First, the roles of sentences in structured abstracts can be learned automatically from the value of the nlmCategory attribute without any further processing. Consequently, the roles of sentences in 30% of Medline abstracts can be accurately identified.
- Second, those labels can be used to build a machine learning classifier that predicts the roles of sentences in the unstructured abstracts in Medline.

The claim zoning component treats identifying the roles of sentences as a sequence labelling problem. This requires an algorithm that takes into account the neighbouring observations, not only the current observation as ordinary classifiers such as SVM and Naive Bayes do. Conditional Random Fields (CRFs) have been used successfully for such tasks (Hirohata et al., 2008; Lin et al., 2009). Therefore, we use the CRF algorithm along with lexical, structural and sequential features to build a classifier model that identifies the claim zones in abstracts. The classifier is implemented using the CRFsuite library (Okazaki, 2007) and trained with the L-BFGS method.

Note that we reduce the five NLM categories to four by merging the Background and Objective categories into a new category called Introduction. This is because the background and objectives sections in Medline tend to overlap with each other (Lin et al., 2009). Moreover, these sections usually appear one after the other, so merging them is a sensible way to avoid the overlap problem. Therefore, this component identifies the
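
The label derivation in Section 3.1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It makes several assumptions:
- It assumes the PubMed XML layout, in which each section is an AbstractText element carrying the author's Label and the NLM-assigned NlmCategory attribute. The text above writes the attribute as nlmCategory.
- It uses a deliberately naive regular-expression sentence splitter.
- It parses a hypothetical toy abstract, SAMPLE.
- The helper names NLM_TO_ZONE, labelled_sentences and claim_zone are likewise illustrative.

The sketch maps the five NLM categories onto four zones, with Background and Objective becoming Introduction. It then keeps the results and conclusions sentences as the claim zone.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from the five NLM categories to the four zone labels
# used for claim zoning: BACKGROUND and OBJECTIVE are merged into INTRODUCTION.
NLM_TO_ZONE = {
    "BACKGROUND": "INTRODUCTION",
    "OBJECTIVE": "INTRODUCTION",
    "METHODS": "METHODS",
    "RESULTS": "RESULTS",
    "CONCLUSIONS": "CONCLUSIONS",
}

# The claim zone: sentences from the results and conclusions sections.
CLAIM_ZONES = {"RESULTS", "CONCLUSIONS"}

# Hypothetical structured abstract. The author labels (AIM, DESIGN, FINDINGS)
# differ from the NLM categories, which is why NlmCategory is used instead.
SAMPLE = """
<Abstract>
  <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Drug X is widely used. Its effect on pain is unclear.</AbstractText>
  <AbstractText Label="AIM" NlmCategory="OBJECTIVE">We assessed drug X in adults.</AbstractText>
  <AbstractText Label="DESIGN" NlmCategory="METHODS">We ran a randomized trial.</AbstractText>
  <AbstractText Label="FINDINGS" NlmCategory="RESULTS">Drug X reduced pain scores.</AbstractText>
  <AbstractText Label="CONCLUSION" NlmCategory="CONCLUSIONS">Drug X is effective for pain.</AbstractText>
</Abstract>
"""


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def labelled_sentences(abstract_xml):
    """Return (sentence, zone) pairs for the categorized sections of an abstract.

    Sections without a recognised NlmCategory (e.g. in unstructured abstracts)
    yield nothing; those are the cases a trained classifier has to label.
    """
    root = ET.fromstring(abstract_xml.strip())
    pairs = []
    for section in root.iter("AbstractText"):
        zone = NLM_TO_ZONE.get((section.get("NlmCategory") or "").upper())
        if zone is None:
            continue
        text = "".join(section.itertext())
        pairs.extend((sentence, zone) for sentence in split_sentences(text))
    return pairs


def claim_zone(pairs):
    """Keep only the sentences whose zone belongs to the claim zone."""
    return [sentence for sentence, zone in pairs if zone in CLAIM_ZONES]


if __name__ == "__main__":
    pairs = labelled_sentences(SAMPLE)
    for sentence, zone in pairs:
        print(f"{zone:<13} {sentence}")
    print("Claim zone:", claim_zone(pairs))
```

The code keys on NlmCategory rather than Label, so author headings such as AIM or FINDINGS need no normalisation. This matters given the 2,779 distinct headings reported for Medline.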

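For the sequence-labelling step in Section 3.1, each abstract becomes a sequence of per-sentence feature sets, one item per sentence, for a linear-chain CRF. The sketch below illustrates the kinds of features named in Sections 2 and 3.1:
- lexical: unigrams of the current sentence;
- structural: relative position, as in Ruch et al. (2007);
- sequential: unigrams of the neighbouring sentences $S_{n-1}$ and $S_{n+1}$, as in Hirohata et al. (2008).

This is an assumed feature set for illustration only, since this excerpt does not list the authors' actual features. The tokenizer, the function names and the feature-name prefixes (w=, prev_w=, next_w=, BOS, EOS) are all hypothetical.

```python
import re


def tokens(sentence):
    """Lower-cased alphanumeric unigrams (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sentence_features(sentences, i):
    """Feature dict for S_n = sentences[i], using S_{n-1} and S_{n+1} as context."""
    feats = {"bias": 1.0}
    # Lexical: unigrams of the current sentence (binary presence).
    for tok in tokens(sentences[i]):
        feats["w=" + tok] = 1.0
    # Structural: relative position in the abstract, 0.0 (first) to 1.0 (last).
    feats["rel_pos"] = i / max(len(sentences) - 1, 1)
    # Sequential: unigrams of the previous and next sentences, or boundary markers.
    if i > 0:
        for tok in tokens(sentences[i - 1]):
            feats["prev_w=" + tok] = 1.0
    else:
        feats["BOS"] = 1.0
    if i < len(sentences) - 1:
        for tok in tokens(sentences[i + 1]):
            feats["next_w=" + tok] = 1.0
    else:
        feats["EOS"] = 1.0
    return feats


def abstract_to_xseq(sentences):
    """One feature dict per sentence: the item sequence for a linear-chain CRF."""
    return [sentence_features(sentences, i) for i in range(len(sentences))]


if __name__ == "__main__":
    abstract = [
        "We assessed drug X in adults.",
        "We ran a randomized trial.",
        "Drug X reduced pain scores.",
    ]
    for feats in abstract_to_xseq(abstract):
        print(sorted(feats))
```

Each dictionary maps feature names to float weights, which is an item form the python-crfsuite bindings for CRFsuite accept. In such a setup, one list per abstract would be passed to Trainer.append together with its zone-label sequence. Trainer.select('lbfgs') would then choose L-BFGS training. Those library calls are not exercised in this sketch.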
