Exam (elaborations)
Abstract21083

Rating
-
Sold
-
Pages
13
Grade
A+
Uploaded on
07-07-2024
Written in
2023/2024

Knowledge Base Question Answering for Space Debris Queries

Abstract

Space agencies execute complex satellite operations that need to be supported by the technical knowledge contained in their extensive information systems. Knowledge bases (KB) are an effective way of storing and accessing such information at scale. In this work we present a system, developed for the European Space Agency (ESA), that can answer complex natural language queries, to support engineers in accessing the information contained in a KB that models the orbital space debris environment. Our system is based on a pipeline which first generates a sequence of basic database operations, called a sketch, from a natural language question, then specializes the sketch into a concrete query program with mentions of entities, attributes and relations, and finally executes the program against the database. This pipeline decomposition approach enables us to train the system by leveraging out-of-domain data and semi-synthetic data generated by GPT-3, thus reducing overfitting and shortcut learning even with a limited amount of in-domain training data. Our code can be found at PaulDrm/DISCOSQA.

1 Introduction

Space debris are uncontrolled artificial objects in space that are left in orbit either during normal operations or due to malfunctions. Collisions involving space debris can generate secondary debris, which can cause more collisions, potentially leading to a runaway effect known as the Kessler Syndrome (Kessler and Cour-Palais, 1978; Kessler et al., 2010), which in the worst-case scenario could make large ranges of orbits unusable for space operations for multiple generations. Therefore, space agencies have established departments responsible for cataloging the space debris environment, which can be used for space traffic management, collision avoidance, re-entry analysis, and raising public awareness of the problem.

Figure 1: Two representative queries for DISCOS and their decomposition according to the Program Induction method.
• Question: What is the inclination of the orbit of Hubble?
  Sketch: Find Relate QueryAttr
  Arguments: Hubble, Orbit, inclination
• Question: How many rocket debris objects have re-entered Earth's atmosphere before 2019?
  Sketch: FindAll Filter Year Filter Concept Count
  Arguments: Reentry, 2019, Rocket Debris Objects

The European Space Agency (ESA) has catalogued over 40,000 trackable and unidentified objects in its DISCOS (Database and Information System Characterizing Objects in Space) Knowledge Base (KB) (Klinkrad, 1991; Flohrer et al., 2013). Accessing this information efficiently often requires technical expertise in query languages and familiarity with the specific schema of DISCOS, which may fall outside the skill set of the engineers searching for relevant information in the database. In this project, we developed a question answering system for the DISCOS KB. This deployed prototype enables ESA engineers to query the database with complex natural language (English) questions, improving their ability to make informed decisions regarding space debris.

Recent breakthroughs in open question answering have been achieved using large language models that have been fine-tuned as dialog assistants, such as ChatGPT. These models, however, are black boxes that store knowledge implicitly in their parameters, which makes it hard to guarantee that their answers are supported by explicit evidence, to understand their failures, and to update them when the supporting facts change. In contrast, parsing a question into a query program and then executing it on an explicit KB is guaranteed to provide a factually correct answer, provided the KB and query program are correct. Our approach is particularly useful for applications such as satellite operations where accuracy and reliability are critical.
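To make the decomposition in Figure 1 concrete, the first program can be run against a minimal in-memory knowledge base. This is an illustrative toy sketch only, not the actual KoPL engine: the mini KB, the attribute value, and the function names below are invented for the example.

```python
# Toy sketch (not the real KoPL engine): executing the Figure 1 program
# "Find -> Relate -> QueryAttr" with arguments (Hubble, Orbit, inclination)
# against an invented two-entity knowledge base.
KB = {
    "Hubble": {"relations": {"Orbit": "Hubble-orbit"}, "attributes": {}},
    "Hubble-orbit": {"relations": {}, "attributes": {"inclination": 28.5}},
}

def find(entity):                   # Find: locate an entity by name
    return entity

def relate(entity, relation):       # Relate: follow a relation edge
    return KB[entity]["relations"][relation]

def query_attr(entity, attribute):  # QueryAttr: read an attribute value
    return KB[entity]["attributes"][attribute]

def execute(sketch, args):
    """Run a sketch (list of operation names) left to right,
    threading the intermediate result and consuming one argument per step."""
    result = None
    arg_iter = iter(args)
    for op in sketch:
        if op == "Find":
            result = find(next(arg_iter))
        elif op == "Relate":
            result = relate(result, next(arg_iter))
        elif op == "QueryAttr":
            result = query_attr(result, next(arg_iter))
    return result

answer = execute(["Find", "Relate", "QueryAttr"], ["Hubble", "Orbit", "inclination"])
print(answer)  # 28.5
```

Each operation consumes the previous operation's output, which is what makes a flat operation sequence sufficient as a sketch for chain-shaped queries like this one.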
The main challenge for this project was that no training set or example questions were available for the DISCOS KB. This issue, combined with the large number of unique and diverse objects in the database, precluded a straightforward application of common supervised learning techniques. Although possible strategies for solving this task, such as direct semantic parsing of the query with seq2seq models, were identified in the literature, they suffer from problems with compositional generalization (Herzig et al., 2021; Furrer et al., 2020). Furthermore, very little work has been done on generalizing to KB elements that were never seen during training (Cao et al., 2022b; Das et al., 2021a; Huang et al., 2021). To overcome these challenges, we apply and adapt a methodology from the literature called Program Transfer (Cao et al., 2022b) to significantly reduce the dataset required for adequate generalization over the complete DISCOS KB. This is a two-step approach: for each user query, first a program sketch is predicted, consisting of a sequence of query functions whose arguments are either variables or placeholders; then the representation of the query is compared to the representations of the KB entities in order to fill the placeholders with arguments relevant to the query text. The underlying query language of this approach is called Knowledge-oriented Programming Language (KoPL), for which two representative example questions are shown together with their decomposition into sketch and arguments in Figure 1. We also conduct a data collection study with domain experts, and we apply a data augmentation pipeline that leverages the underlying ontology of the KB and prompts a Large Language Model (LLM) to automatically generate more training examples. The architecture was retrained with different domain-specific LMs and baselines to determine the benefits of using a domain-specific pretrained encoder.
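The two-step approach can be sketched as follows. This is a heavily simplified illustration with assumed names: the real system uses trained neural models for both the sketch prediction and the entity representations, whereas here a hard-coded sketch and a bag-of-words overlap stand in for the learned components.

```python
# Illustrative two-step sketch of the Program Transfer idea: (1) predict a
# program sketch, (2) fill its placeholders by comparing the query
# representation to KB element representations. All names are stand-ins.
import math
import re
from collections import Counter

def encode(text):
    """Stand-in encoder: a bag-of-words vector instead of a neural LM."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_sketch(question):
    """Step 1 stub: the real system uses a seq2seq model, trained on
    out-of-domain and GPT-3-augmented data, to emit the operation sequence."""
    return ["Find", "Relate", "QueryAttr"]

def fill_arguments(question, sketch, kb_elements):
    """Step 2: rank KB elements against the question representation and
    fill one placeholder per operation with the best matches."""
    q = encode(question)
    ranked = sorted(kb_elements, key=lambda e: cosine(q, encode(e)), reverse=True)
    return ranked[: len(sketch)]

question = "What is the inclination of the orbit of Hubble?"
sketch = predict_sketch(question)
args = fill_arguments(question, sketch, ["Hubble", "orbit", "inclination", "mass", "Envisat"])
print(sketch, args)  # ['Find', 'Relate', 'QueryAttr'] ['Hubble', 'orbit', 'inclination']
```

Because the sketch decoder never has to emit entity names, it can be trained largely on out-of-domain data, while argument filling only requires comparable representations of the query and the KB elements.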
The main contributions of this paper are:
• Applying and adapting a methodology described in the literature for complex knowledge base question answering (CKBQA) to a novel industry-relevant database with a large and dynamic set of unique entities;
• Collecting a new dataset on this database from domain experts and leveraging the in-context learning capability of LLMs for data augmentation on it;
• Evaluating the use of domain-specific LMs as different encoders on our curated dataset;
• Demonstrating the effectiveness of the approach by achieving results comparable to general-purpose LLMs.

2 Related Work

Low-resource CKBQA. Pre-trained language models have demonstrated state-of-the-art performance in semantic parsing for complex question answering on KBs where the same logic compounds are contained in both the training and validation sets (Furrer et al., 2020). However, they struggle with compositional generalization, where the "compounds" (combinations) of components differ between training and validation, even if all components (entity, relation, program filters) have been seen during training (Herzig et al., 2021). Das et al. (2021b) explored retrieval-based methods that pick the top n similar examples from the training set and use them as additional input for the prediction. In theory, this would make it possible to reason over changes to the KB by only adding new examples to the training set, without the need to retrain the whole model. Another approach is to adapt the architecture of language models to incorporate the structure of a KB directly into the prediction. For example, Huang et al. (2021) ranked Freebase KB entities by using an ElasticSearch API to identify these entities. When generating the query program, a special token is predicted instead of entities, which in a post-processing step is replaced by the top-ranked entity identified by ElasticSearch.
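The placeholder-substitution idea just described might be sketched like this. It is illustrative only: the ranker below is a trivial word-overlap stand-in for the ElasticSearch API, and the entity index is invented.

```python
# Illustrative sketch of the special-token scheme: the parser emits a program
# with an [ENT] placeholder, and a retrieval step (here a trivial word-overlap
# ranker standing in for ElasticSearch) substitutes the top-ranked entity.
import re

ENTITY_INDEX = ["Hubble Space Telescope", "Envisat", "Ariane 5 R/B"]  # toy index

def rank_entities(question, index):
    """Stand-in for an ElasticSearch query: score entities by word overlap."""
    q_words = set(re.findall(r"\w+", question.lower()))
    return sorted(index, key=lambda e: len(q_words & set(e.lower().split())), reverse=True)

def postprocess(program_tokens, question):
    """Replace each [ENT] placeholder with the single top-ranked entity.
    Note: every placeholder gets the same entity, which hints at why
    multi-entity queries are problematic for this scheme."""
    top = rank_entities(question, ENTITY_INDEX)[0]
    return [top if tok == "[ENT]" else tok for tok in program_tokens]

program = ["Find", "[ENT]", "Relate", "Orbit", "QueryAttr", "inclination"]
print(postprocess(program, "What is the inclination of the orbit of Hubble?"))
```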
Although it achieves good results, it is unclear how this approach would translate to queries with multiple entities, and it also has the typical limitations of ElasticSearch. Another method is the Program Induction and Program Transfer method, where a sequence
