
Natural Language Processing for MediaWiki: First major release of the Semantic Assistants Wiki-NLP Integration


1. Introduction

We are happy to announce the first major release of our Semantic Assistants Wiki-NLP integration. It is the first comprehensive open source solution for bringing Natural Language Processing (NLP) to wiki users, in particular for wikis based on the well-known MediaWiki engine and its Semantic MediaWiki (SMW) extension. The integration can run any NLP pipeline deployed in the General Architecture for Text Engineering (GATE) and brokered as a web service through the Semantic Assistants server. This allows you to bring novel text mining assistants to wiki users, e.g., for automatically structuring wiki pages, answering questions in natural language, quality assurance, entity detection, or summarization. The results of the NLP analysis are written back to the wiki, allowing humans and AI to work collaboratively on wiki content. Additionally, semantic markup understood by the SMW extension can be automatically generated from the NLP output, enabling semantic search and querying of the results.

2. Features

The current release includes the following features:

Light-weight MediaWiki Extension
The Wiki-NLP integration is added to an existing MediaWiki installation by installing a light-weight extension. Without requiring modifications to the wiki engine, the extension adds a link to the wiki's toolbox menu through which users can load the dynamically generated Wiki-NLP interface. Through this interface, users can inquire about and invoke NLP services directly within the wiki environment, so no context switching is needed to use them.
NLP Pipeline Independent Architecture
The Wiki-NLP integration is backed by the Semantic Assistants server, which provides a service-oriented solution to offer NLP capabilities in a wiki system. Therefore, any NLP service available in a given Semantic Assistants server can be invoked through the Wiki-NLP integration on a wiki's content.
Flexible Wiki Input Handling
At times, a user's information need is scattered across multiple pages in the wiki. To address this problem, our Wiki-NLP integration allows wiki users to gather one or more wiki pages in a so-called "collection" and run an NLP service on all of the collected pages at once. This feature enables batch-processing of wiki pages, as well as providing multiple input pages to multi-document analysis pipelines.
Flexible NLP Result Handling
The Wiki-NLP integration is also flexible in terms of where the NLP pipelines' output is written. Upon a user's request, the pipeline results can be appended to an existing page or its associated discussion page, written to a newly created page, or even written to a page in an external wiki, provided that wiki is supported by the Wiki-NLP integration architecture (a minimal sketch of this read/write interaction appears at the end of this section). Depending on the type of results generated by the NLP pipeline, e.g., annotations or new files, the Wiki-NLP integration offers a simple template-based visualization capability that can be easily customized. Upon each successful NLP service execution, the Wiki-NLP integration automatically updates the existing results on the specified wiki page, where applicable.
Semantic Markup Generation
Where semantic metadata is generated by an NLP pipeline, the Wiki-NLP integration takes care of representing it formally using the Semantic MediaWiki special markup: it enriches the text with the equivalent markup and persists it in the wiki repository. Therefore, for each generated result, both a user-friendly and a machine-processable representation are available in the page. The markup is, in turn, transformed into RDF triples by the Semantic MediaWiki parsing engine, making the results available for querying as well as for export to other applications.


Semantic enrichment of wiki text with the Wiki-NLP integration

For example, when the sentence "Mary won the first prize." is contained in a wiki page and processed by a Named Entity Detection pipeline, the Semantic Assistants server generates an XML document and returns it to the Wiki-NLP integration, identifying "Mary" as an entity of type "Person". Our integration then processes this XML document and transforms it into formal Semantic MediaWiki markup. In our example, the markup [[hasType::Person|Mary]] is generated and written into the wiki page. The generated markup can then be queried using Semantic MediaWiki's inline queries: for example, a simple query like {{#ask: [[hasType::Person]]}} retrieves all entities of type "Person" in the wiki content.
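As a simplified illustration of this transformation step, the following stand-alone Python sketch converts such an annotation into Semantic MediaWiki markup. Note that it is not the actual integration code: the XML element and attribute names used here are hypothetical and do not reflect the exact Semantic Assistants response schema.

    import xml.etree.ElementTree as ET

    # Simplified, hypothetical annotation document for the example above;
    # the actual Semantic Assistants response format is more elaborate.
    RESPONSE = """
    <annotations>
      <annotation type="Person" content="Mary"/>
    </annotations>
    """

    def annotations_to_smw(xml_text, prop="hasType"):
        """Turn each named-entity annotation into Semantic MediaWiki markup."""
        root = ET.fromstring(xml_text)
        # [[hasType::Person|Mary]] still renders as "Mary", but additionally
        # stores the hasType property, which SMW later exposes as an RDF triple.
        return ["[[%s::%s|%s]]" % (prop, a.get("type"), a.get("content"))
                for a in root.iter("annotation")]

    print(annotations_to_smw(RESPONSE))  # ['[[hasType::Person|Mary]]']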


Wiki-independent Architecture
The Wiki-NLP integration was developed from the ground up with extensibility in mind. Although the provided examples show how it can be used within a MediaWiki instance, its extensible architecture allows support for other wiki engines to be added with a reasonable amount of effort. Both the Semantic Assistants server and the Wiki-NLP integration have a semantic-based architecture that allows new services and wiki engines to be added without major modifications to their code base.
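To make the read/write interaction referenced above more concrete, the following minimal Python sketch reads a "collection" of pages and appends NLP results to a target page through the standard MediaWiki web API. It is a simplified illustration, not the actual integration code: the wiki URL, page titles, and the run_pipeline() call are hypothetical placeholders, and details such as authentication and result templates are omitted.

    import requests

    API = "https://wiki.example.org/w/api.php"   # hypothetical wiki endpoint

    def fetch_collection(titles):
        """Read the wikitext of a 'collection' of pages via the MediaWiki API."""
        r = requests.get(API, params={
            "action": "query", "prop": "revisions",
            "rvprop": "content", "rvslots": "main",
            "titles": "|".join(titles),
            "format": "json", "formatversion": "2",
        })
        pages = r.json()["query"]["pages"]
        return {p["title"]: p["revisions"][0]["slots"]["main"]["content"]
                for p in pages}

    def append_results(session, title, wikitext):
        """Append NLP results to a target page, using a CSRF edit token."""
        token = session.get(API, params={
            "action": "query", "meta": "tokens", "format": "json",
        }).json()["query"]["tokens"]["csrftoken"]
        session.post(API, data={
            "action": "edit", "title": title, "token": token,
            "appendtext": "\n== NLP results ==\n" + wikitext,
            "format": "json",
        })

    # run_pipeline() stands in for the call to the Semantic Assistants server:
    # texts = fetch_collection(["Page A", "Page B"])
    # append_results(requests.Session(), "Page A", run_pipeline(texts))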

3. Application: NLP Wikis in Use

Our open source Wiki-NLP solution is the result of more than five years of research [1], [2], [3] on the technical and social aspects of combining natural language processing with collaborative wiki systems. We developed a number of real-world wiki-based solutions that demonstrate how text mining assistants can effectively collaborate with humans on wiki content [4]. As part of this research, we investigated (i) the software engineering aspects of Wiki-NLP integrations; (ii) the usability for wiki users from different backgrounds, in particular those unfamiliar with NLP; and (iii) the effectiveness for helping users develop and improve wiki content in a number of domains and tasks. To help you build your own Wiki-NLP solution, we documented a number of successful Wiki-NLP patterns in our Semantic Assistants Wiki-NLP Showcase.

In the DurmWiki project, we investigate the application of NLP to wikis for cultural heritage data management, helping wiki users find relevant information through NLP pipelines for automatic index generation, question answering, and summarization on wiki content [5]. Our Wiki-NLP approach makes it possible to transform historical documents into a semantic knowledge base that can be queried through state-of-the-art semantic technologies [6].

For biomedical research, IntelliGenWiki is our solution for helping curators deal with the large number of publications in this area. Text mining assistants can aid humans in deciding which papers to curate (triage task) and in extracting entities (database curation task) through biomedical entity recognition, e.g., for organisms or mutations. Experiments measuring the time for manual vs. NLP/wiki-supported curation in a real-world project demonstrate the effectiveness of this idea [7].

With ReqWiki, we developed the first semantic open source platform for collaborative software requirements engineering [8]. Here, semantic assistants provide users with tools for entity extraction on domain documents and quality assurance services for improving the content of a software requirements specification (SRS). User studies confirmed its usability for software engineers unfamiliar with NLP, as well as its effectiveness for improving requirements documents [9].

4. More Information

For further technical information, please see our Wiki-NLP Integration page. For a number of application examples, check out our Semantic Assistants Wiki-NLP Showcase. For commercial support or consulting requests, please contact us.

5. In the News

Concordia NOW Newsletter: Information overload? There's a solution for that


References

  1. Witte, R., and T. Gitzinger, "Connecting Wikis and Natural Language Processing Systems", WikiSym '07: Proceedings of the 2007 International Symposium on Wikis, Montréal, Canada: ACM, pp. 165–176, October 21–23, 2007.
  2. Sateli, B., "A General Architecture to Enhance Wiki Systems with Natural Language Processing Techniques", M.Sc. thesis, Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada, April 2012.
  3. Sateli, B., and R. Witte, "Natural Language Processing for MediaWiki: The Semantic Assistants Approach", The 8th International Symposium on Wikis and Open Collaboration (WikiSym 2012), Linz, Austria: ACM, August 2012.
  4. Sateli, B., and R. Witte, "Supporting Wiki Users with Natural Language Processing", The 8th International Symposium on Wikis and Open Collaboration (WikiSym 2012), Linz, Austria: ACM, August 2012.
  5. Witte, R., T. Kappler, R. Krestel, and P. C. Lockemann, "Integrating Wiki Systems, Natural Language Processing, and Semantic Technologies for Cultural Heritage Data Management", in: Sporleder, C., A. van den Bosch, and K. Zervanou (Eds.), Language Technology for Cultural Heritage, Springer Berlin Heidelberg, pp. 213–230, 2011.
  6. Witte, R., R. Krestel, T. Kappler, and P. C. Lockemann, "Converting a Historical Architecture Encyclopedia into a Semantic Knowledge Base", IEEE Intelligent Systems, vol. 25, no. 1, Los Alamitos, CA, USA: IEEE Computer Society, pp. 58–66, January/February 2010.
  7. Sateli, B., M.-J. Meurs, G. Butler, J. Powlowski, A. Tsang, and R. Witte, "IntelliGenWiki: An Intelligent Semantic Wiki for Life Sciences", NETTAB 2012, EMBnet.journal, vol. 18 (Supplement B), Como, Italy, pp. 50–52, November 2012.
  8. Sateli, B., S. S. Rajivelu, E. Angius, and R. Witte, "ReqWiki: A Semantic System for Collaborative Software Requirements Engineering", The 8th International Symposium on Wikis and Open Collaboration (WikiSym 2012), Linz, Austria: ACM, August 2012.
  9. Sateli, B., E. Angius, S. S. Rajivelu, and R. Witte, "Can Text Mining Assistants Help to Improve Requirements Specifications?", Mining Unstructured Data (MUD 2012), Kingston, Ontario, Canada, October 17, 2012.