Special Track: Large Language Models for Knowledge Engineering — Program
Title | Authors | Abstract | Topic | Time |
OntoChat: a Framework for Conversational Ontology Engineering using Language Models | Bohui Zhang, Valentina Anita Carriero, Katrin Schreiberhuber, Stefani Tsaneva, Lucia Sanchez Gonzalez, Jongmo Kim and Jacopo de Berardinis | Ontology engineering (OE) in large projects poses a number of challenges arising from the heterogeneous backgrounds of the various stakeholders, domain experts, and their complex interactions with ontology designers. This multi-party interaction often introduces systematic ambiguities and biases during the elicitation of ontology requirements, which directly affect the design and evaluation of the ontology and may jeopardise its intended reuse. Meanwhile, current OE methodologies strongly rely on manual activities (e.g., interviews, discussion pages). After collecting evidence on the most crucial OE activities, we introduce OntoChat, a framework for conversational ontology engineering that supports requirement elicitation, analysis, and testing. By interacting with a conversational agent, users can steer the creation of user stories and the extraction of competency questions, while receiving computational support to analyse the overall requirements and test early versions of the resulting ontologies. We evaluate OntoChat by replicating the engineering of the Music Meta Ontology and collecting preliminary metrics from users on the effectiveness of each component. We release all code at https://github.com/King-s-Knowledge-Graph-Lab/OntoChat. | Requirements | 11.00 – 11.20 |
Validating Semantic Artifacts With Large Language Models | Nilay Tufek, Aparna Saisree Thuluva, Tathagata Bandyopadhyay, Valentin Philipp Just, Marta Sabou, Fajar Ekaputra and Allan Hanbury | As part of knowledge engineering workflows, semantic artifacts, such as ontologies, knowledge graphs or semantic descriptions based on industrial standards, are often validated in terms of their compliance with requirements expressed in natural language (e.g., ontology competency questions, standard specifications). Key to this process is the translation of the requirements into machine-actionable queries (e.g., SPARQL) that can automate the validation process. This manual translation process is time-consuming, error-prone and challenging, especially in areas where domain experts might lack knowledge of semantic technologies. In this paper, we propose a Large Language Model (LLM)-based approach to translate requirements texts into SPARQL queries and test it in validation use cases related to SAREF and OPC UA Robotics. F1 scores of 88-100% indicate the feasibility of the approach and its potential impact on ensuring high-quality semantic artifacts and the further uptake of semantic technologies in (industrial) domains. | Requirements | 11.20 – 11.40 |
NeOn-GPT: A Large Language Model-Powered Pipeline for Ontology Learning | Nadeen Fathallah, Arunav Das, Stefano De Giorgis, Andrea Poltronieri, Peter Haase and Liubov Kovriguina | We address the task of ontology learning by combining the structured NeOn methodology framework with Large Language Models (LLMs) for translating natural language domain descriptions into Turtle syntax ontologies. The main contribution of the paper is a prompt pipeline tailored for domain-agnostic modeling, exemplified through the application to a domain-specific case study: the wine ontology. The resulting pipeline is used to develop NeOn-GPT, a prototype tool for automatic ontology modeling, and is integrated into the metaphactory platform. NeOn-GPT leverages the systematic approach of the NeOn methodology and LLMs’ generative capabilities to facilitate a more efficient and effective ontology development process. We evaluate the proposed approach by conducting comprehensive evaluations using the Stanford wine ontology as the gold standard. The obtained results show that LLMs are not fully equipped to perform the procedural tasks required for ontology development and lack the reasoning skills and domain expertise needed. Overall, LLMs require integration into workflow or trajectory tools for continuous knowledge engineering tasks. Nevertheless, LLMs can significantly reduce the time and expertise needed. Our code base is publicly available for research and development purposes, accessible at: https://github.com/andreamust/NEON-GPT. | Ontology development | 11.40 – 12.00 |
Assessing the Evolution of LLM capabilities for Knowledge Graph Engineering in 2023 | Johannes Frey, Lars-Peter Meyer, Felix Brei, Sabine Gruender and Michael Martin | In this study, we evaluate the evolution of LLM capabilities w.r.t. the RDF Turtle and SPARQL languages as foundational skills to assist with various KGE tasks. We measure LLM response quality using 6 LLM-KG-Bench tasks for a total of 15 LLM versions available over the course of 2023, covering 5 different “major version” LLM classes (GPT-3.5 Turbo, GPT-4, Claude-1.x, Claude-2.x, and Claude-instant-1.x). | Various tasks from creation | 12.00 – 12.20 |
Can LLMs Generate Competency Questions? | Youssra Rebboud, Lionel Tailhardat, Pasquale Lisena and Raphaël Troncy | Large Language Models have shown high performance in a large number of tasks and have recently been applied also to Knowledge Graphs and data representation. An important step in data modeling consists in the definition of a set of competency questions, which are often used as a guide for the development of an ontology and as an evaluation benchmark. This study investigates the suitability of LLMs for the automatic generation of competency questions given the developed ontology. Different models are compared in different settings in order to give a comprehensive overview of the state of the art. | Requirements | 14.00 – 14.20 |
The Role of Generative AI in Competency Question Retrofitting | Reham Alharbi, Valentina Tamma, Floriana Grasso and Terry Payne | Competency Questions (CQs) are essential in ontology engineering; they express an ontology’s functional requirements as natural language questions. They offer crucial insights into an ontology’s scope and are pivotal for various tasks like ontology reuse, testing, requirement specification, and pattern definition. Despite their importance, the practice of publishing CQs alongside ontological artefacts is not commonly adopted. In this study, we introduce a novel method for retrofitting CQs from existing ontologies using Generative AI, specifically Large Language Models (LLMs). We explore how the control parameters of different LLMs, GPT-3.5 and GPT-4 in particular, affect their performance, and investigate the interplay between prompts and configuration for the retrofitting of realistic CQs. | Requirements | 14.20 – 14.40 |
Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models | Bradley Allen and Paul Groth | Class membership relations, which assign entities to a given class, form a backbone of knowledge graphs. As part of the knowledge engineering process, we propose a new method for evaluating the quality of these relations by processing descriptions of a given entity and class using a zero-shot chain-of-thought classifier that uses a natural language intensional definition of a class. We evaluate the method using two publicly available knowledge graphs, Wikidata and CaLiGraph, and 7 large language models. Using the gpt-4-0125-preview large language model, the method’s classification performance achieves a macro-averaged F1-score of 0.830 on data from Wikidata and 0.893 on data from CaLiGraph. Moreover, a manual analysis of the classification errors shows that 40.9% of errors were due to the knowledge graphs, with 16.0% due to missing relations and 24.9% due to incorrectly asserted relations. These results show how large language models can assist knowledge engineers in the process of knowledge graph refinement. The code and data are available on GitHub. | Class membership | 14.40 – 15.00 |
LLMs4OM: Matching Ontologies with Large Language Models | Hamed Babaei Giglou, Jennifer D’Souza, Felix Engel and Sören Auer | Ontology Matching (OM) is a critical task in knowledge integration, where aligning heterogeneous ontologies facilitates data interoperability and knowledge sharing. Traditional OM systems often rely on expert knowledge or predictive models, with limited exploration of the potential of Large Language Models (LLMs). We present the LLMs4OM framework, a novel approach to evaluate the effectiveness of LLMs in OM tasks. This framework utilizes two modules for retrieval and matching, respectively, enhanced by zero-shot prompting across three ontology representations: concept, concept-parent, and concept-children. Through comprehensive evaluations using 20 OM datasets from various domains, we demonstrate that LLMs, under the LLMs4OM framework, can match and even surpass the performance of traditional OM systems, particularly in complex matching scenarios. Our results highlight the potential of LLMs to significantly contribute to the field of OM. | Ontology matching | 15.00 – 15.20 |
Column Property Annotation using Large Language Models | Keti Korini and Christian Bizer | Column property annotation (CPA), also known as column relationship prediction, is the task of predicting the semantic relationship between two columns in a table given a set of candidate relationships. CPA annotations are used in downstream tasks such as data search, data integration, or knowledge graph enrichment. This paper is the first to explore the usage of large language models (LLMs) for the CPA task. We experiment with different zero-shot prompts for the CPA task, which we evaluate using two OpenAI models (GPT-3.5, GPT-4) and the open-source model SOLAR. We find GPT-3.5 to be quite sensitive to variations of the prompt, while GPT-4 reaches a high performance independent of the variation of the prompt. We further explore the scenario where training data for the CPA task is available and can be used for selecting demonstrations or fine-tuning the model. We show that a fine-tuned GPT-3.5 model outperforms a RoBERTa model that was fine-tuned on the same data by 11% in F1. Comparing in-context learning via demonstrations and fine-tuning shows that the fine-tuned GPT-3.5 performs 9% F1 better than the same model given demonstrations. The fine-tuned GPT-3.5 model also outperforms zero-shot GPT-4 by around 2% F1 on the dataset on which it was fine-tuned, while not generalizing as well to other CPA datasets. | Column annotation | 16.00 – 16.20 |
12 shades of RDF: Impact of Syntaxes on Data Extraction with Language Models | Célian Ringwald, Fabien Gandon, Catherine Faron, Franck Michel and Hanna Abi Akl | The fine-tuning of generative pre-trained language models (PLMs) on a new task can be impacted by the choices made for representing the inputs and outputs. This article focuses on the linearization process used to structure and represent, as output, facts extracted from text. On a restricted relation extraction (RE) task, we challenged T5 and BART by fine-tuning them on 12 linearizations, including RDF standard syntaxes and variations. Our benchmark covers: the validity of the produced triples, the performance of the model, the training behaviours and the resources needed. We show that these PLMs can learn some syntaxes more easily than others, and we identify an efficient “Turtle Light” syntax supporting the quick and robust learning of the RE task. | Input and output | 16.20 – 16.40 |