Knowledge Base Question Answering for Space Debris Queries

Space agencies execute complex satellite operations that need to be supported by the technical knowledge contained in their extensive information systems. Knowledge bases (KBs) are an effective way of storing and accessing such information at scale. In this work we present a system, developed for the European Space Agency (ESA), that answers complex natural language queries to support engineers in accessing the information contained in a KB that models the orbital space debris environment. Our system is based on a pipeline that first generates a sequence of basic database operations, called a sketch, from a natural language question, then specializes the sketch into a concrete query program with mentions of entities, attributes and relations, and finally executes the program against the database. This pipeline decomposition enables us to train the system by leveraging out-of-domain data and semi-synthetic data generated by GPT-3, thus reducing overfitting and shortcut learning even with a limited amount of in-domain training data. Our code can be found at PaulDrm/DISCOSQA.

1 Introduction

Space debris are uncontrolled artificial objects in space that are left in orbit either during normal operations or due to malfunctions. Collisions involving space debris can generate secondary debris which can cause more collisions, potentially leading to a runaway effect known as Kessler Syndrome (Kessler and Cour-Palais, 1978; Kessler et al., 2010), which in the worst-case scenario could make large ranges of orbits unusable for space operations for multiple generations. Therefore, space agencies have established departments responsible for cataloging the space debris environment. These catalogs can be used for space traffic management, collision avoidance, re-entry analysis, and raising public awareness of the problem.

Figure 1: Two representative queries for DISCOS and their decomposition according to the Program Induction method.
Question: What is the inclination of the orbit of Hubble?
Sketch: Find Relate QueryAttr
Arguments: Hubble, Orbit, inclination
Question: How many rocket debris objects have re-entered Earth’s atmosphere before 2019?
Sketch: FindAll FilterYear FilterConcept Count
Arguments: Reentry, 2019, Rocket Debris Objects

The European Space Agency (ESA) has catalogued over 40,000 trackable and unidentified objects in its DISCOS (Database and Information System Characterizing Objects in Space) Knowledge Base (KB) (Klinkrad, 1991; Flohrer et al., 2013). Accessing this information efficiently often requires technical expertise in query languages and familiarity with the specific schema of DISCOS, which may fall outside the skillset of the engineers searching for relevant information in the database. In this project, we developed a question answering system for the DISCOS KB. This deployed prototype enables ESA engineers to query the database with complex natural language (English) questions, improving their ability to make informed decisions regarding space debris.

Recent breakthroughs in open question answering have been achieved using large language models that have been fine-tuned as dialog assistants, such as ChatGPT. These models, however, are black boxes that store knowledge implicitly in their parameters, which makes it hard to guarantee that their answers are supported by explicit evidence, to understand their failures, and to update them when the supporting facts change. In contrast, parsing a question into a query program and then executing it on an explicit KB is guaranteed to provide a factually correct answer, provided the KB and the query program are correct. Our approach is particularly useful for applications such as satellite operations, where accuracy and reliability are critical.
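To make the "parse, then execute" idea concrete, the two programs of Figure 1 can be mimicked on a toy in-memory KB. This is only a minimal sketch under stated assumptions: the entities, relation names and attribute values below are invented for illustration and are not taken from DISCOS, and the functions merely imitate the behaviour suggested by the KoPL-style function names in the figure rather than reproducing the actual KoPL engine.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    concept: str
    attributes: dict = field(default_factory=dict)
    relations: dict = field(default_factory=dict)  # relation name -> target entity names


# Hypothetical toy KB; all names and values are illustrative, not DISCOS data.
TOY_KB = {
    "Hubble": Entity("Hubble", "Payload", relations={"orbit": ["Hubble orbit"]}),
    "Hubble orbit": Entity("Hubble orbit", "Orbit", attributes={"inclination": 28.5}),
    "Stage A": Entity("Stage A", "Rocket Debris Object", attributes={"reentry": 2015}),
    "Stage B": Entity("Stage B", "Rocket Debris Object", attributes={"reentry": 2021}),
    "Fragment C": Entity("Fragment C", "Payload Debris Object", attributes={"reentry": 2010}),
}


# Each function takes the result of the previous step as its first argument.
def find(_, name):
    return [TOY_KB[name]]


def find_all(_):
    return list(TOY_KB.values())


def relate(entities, relation):
    return [TOY_KB[t] for e in entities for t in e.relations.get(relation, [])]


def filter_year(entities, key, year, op):
    compare = {"<": lambda v: v < year, ">": lambda v: v > year, "=": lambda v: v == year}[op]
    return [e for e in entities if key in e.attributes and compare(e.attributes[key])]


def filter_concept(entities, concept):
    return [e for e in entities if e.concept == concept]


def query_attr(entities, key):
    return [e.attributes[key] for e in entities if key in e.attributes]


def count(entities):
    return len(entities)


FUNCTIONS = {
    "Find": find, "FindAll": find_all, "Relate": relate, "FilterYear": filter_year,
    "FilterConcept": filter_concept, "QueryAttr": query_attr, "Count": count,
}


def execute(program):
    """Run a linear program: a list of (function name, arguments) steps."""
    state = None
    for name, args in program:
        state = FUNCTIONS[name](state, *args)
    return state


hubble_program = [("Find", ["Hubble"]), ("Relate", ["orbit"]), ("QueryAttr", ["inclination"])]
reentry_program = [
    ("FindAll", []),
    ("FilterYear", ["reentry", 2019, "<"]),
    ("FilterConcept", ["Rocket Debris Object"]),
    ("Count", []),
]

print(execute(hubble_program))   # [28.5]
print(execute(reentry_program))  # 1
```

Because every step operates on explicit KB content, a wrong answer can be traced to a wrong step or a wrong KB entry, which is the property contrasted above with knowledge stored implicitly in model parameters.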
The main challenge for this project was that no training set or example questions were available for the DISCOS KB. This issue, combined with the large number of unique and diverse objects in the database, precluded a straightforward application of common supervised learning techniques. Although possible strategies for solving this task, such as direct semantic parsing of the query with seq2seq models, have been identified in the literature, they suffer from problems with compositional generalization (Herzig et al., 2021; Furrer et al., 2020). Furthermore, very little work has been done on generalizing to KB elements that were never seen during training (Cao et al., 2022b; Das et al., 2021a; Huang et al., 2021). To overcome these challenges, we apply and adapt a methodology from the literature called Program Transfer (Cao et al., 2022b) to significantly reduce the amount of data required for adequate generalization over the complete DISCOS KB. This is a two-step approach. For each user query, a program sketch is predicted first, consisting of a sequence of query functions whose arguments are either variables or placeholders. The representation of the query is then compared to the representations of the KB entities in order to fill the placeholders with arguments relevant to the query text. The underlying query language of this approach is called Knowledge-oriented Programming Language (KoPL); two representative example questions are shown together with their decomposition into sketch and arguments in Figure 1. We also conduct a data collection study with domain experts, and we apply a data augmentation pipeline that leverages the underlying ontology of the KB and prompts a Large Language Model (LLM) to automatically generate more training examples. The architecture was retrained with different domain-specific LMs and baselines to determine the benefits of using a domain-specific pretrained encoder.
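The second step, filling the placeholders of a predicted sketch by comparing the question representation with the representations of KB elements, can be sketched as follows. This is an illustration under loud assumptions: `embed` is a hypothetical stand-in (a bag of character trigrams) for the learned encoder used in the actual system, the candidate lists are invented, and the sketch is given by hand rather than predicted by a model.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a learned encoder: bag of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fill_placeholder(question, candidates):
    """Pick the KB element whose representation is closest to the question."""
    q = embed(question)
    return max(candidates, key=lambda c: cosine(q, embed(c)))


# Illustrative candidate pools per function; not the real DISCOS schema.
CANDIDATES = {
    "Find": ["Hubble", "Envisat", "Sentinel-1A"],
    "Relate": ["orbit", "operator", "launch site"],
    "QueryAttr": ["inclination", "mass", "launch date"],
}

question = "What is the inclination of the orbit of Hubble?"
sketch = ["Find", "Relate", "QueryAttr"]  # assumed output of the sketch predictor

program = [(func, fill_placeholder(question, CANDIDATES[func])) for func in sketch]
print(program)
# [('Find', 'Hubble'), ('Relate', 'orbit'), ('QueryAttr', 'inclination')]
```

Separating the two steps in this way means the sketch predictor never needs to see specific entity names, so new KB elements only have to be added to the candidate pools.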
The main contributions of this paper are:
• Applying and adapting a methodology described in the literature for complex knowledge base question answering (CKBQA) on a novel industry-relevant database with a large and dynamic set of unique entities;
• Collecting a new dataset on this database from domain experts and leveraging the in-context learning capability of LLMs for data augmentation on it;
• Evaluating the use of domain-specific LMs as different encoders on our curated dataset;
• Demonstrating the effectiveness of the approach by achieving results comparable to general-purpose LLMs.

2 Related Work

Low-resource CKBQA. Pre-trained language models have demonstrated state-of-the-art performance in semantic parsing for complex question answering on KBs where the same logic compounds are contained in both the training and validation sets (Furrer et al., 2020). However, they struggle with compositional generalization, where the “compounds” (combinations) of components differ between training and validation, even if all components (entities, relations, program filters) have been seen during training (Herzig et al., 2021). Das et al. (2021b) explored retrieval-based methods that pick the top n similar examples from the training set and use them as additional input for the prediction. In theory, this would make it possible to reason over changes in the KB by only adding new examples to the training set, without the need to retrain the whole model. Another approach is to adapt the architecture of language models to incorporate the structure of a KB directly into the prediction. For example, Huang et al. (2021) ranked FreeBase KB entities by using an ElasticSearch search API to identify these entities. When generating the query program, a special token is predicted instead of an entity, and in a post-processing step this token is replaced by the top-ranked entity identified by ElasticSearch.
Although this achieves good results, it is unclear how it would translate to queries with multiple entities, and it inherits the typical limitations of ElasticSearch. Another method is the Program Induction and Program Transfer method, where a sequence