Business Information Systems
" Faceted Wikipedia Search (pp. 1-2)
This paper presents Faceted Wikipedia Search, an alternative search interface for the English edition of Wikipedia. Faceted Wikipedia Search allows users to ask complex questions, like "Which rivers flow into the Rhine and are longer than 50 kilometers?" or "Which skyscrapers in China have more than 50 floors and were constructed before the year 2000?" against Wikipedia knowledge. Such questions cannot be answered using keyword-based search as provided by Google, Yahoo, or Wikipedia's own search engine.
To answer such questions, a search engine must draw on structured knowledge, which has to be extracted from the underlying articles. On the user interface side, a search engine requires an interaction paradigm that enables inexperienced users to express complex questions against a heterogeneous information space in an exploratory fashion. For formulating queries, Faceted Wikipedia Search relies on the faceted search paradigm. Faceted search enables users to navigate a heterogeneous information space by combining text search with a progressive narrowing of choices along multiple dimensions [6,7,5].
The user subdivides an entity set into multiple subsets. Each subset is defined by an additional restriction on a property. These properties are called the facets. For example, facets of an entity "person" could be "nationality" and "year-of-birth". By selecting multiple facets, the user progressively expresses the different aspects that make up his overall question. Realizing a faceted search interface for Wikipedia poses three challenges:
1. Structured knowledge needs to be extracted from Wikipedia with precision and recall that are high enough to meaningfully answer complex queries.
2. As Wikipedia describes a wide range of different types of entities, a search engine must be able to deal with a large number of different facets. As the number of facets per entity type may also be high, the search engine must apply smart heuristics to display only the facets that are likely to be relevant to the user.
3. Wikipedia describes millions of entities. In order to keep response times low, a search engine must be able to efficiently deal with large amounts of entity data.
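To make the faceted narrowing and the facet-selection problem from challenge 2 concrete, the following is a minimal Python sketch. It is not the implementation used by Faceted Wikipedia Search or neofonie search. The entity records, the function names (`narrow`, `facet_counts`, `relevant_facets`), and the coverage threshold are all assumptions chosen for illustration, and the coverage rule is only a naive stand-in for the "smart heuristics" mentioned above.

```python
from collections import Counter

# Toy entity records; all names and values are invented for illustration.
ENTITIES = [
    {"name": "River A", "type": "river", "mouth": "Rhine", "length_km": 120},
    {"name": "River B", "type": "river", "mouth": "Rhine", "length_km": 35},
    {"name": "River C", "type": "river", "mouth": "Danube", "length_km": 300},
    {"name": "Tower D", "type": "skyscraper", "country": "China", "floors": 60},
]


def narrow(entities, facet, predicate):
    """Keep the entities that have `facet` and whose value satisfies `predicate`."""
    return [e for e in entities if facet in e and predicate(e[facet])]


def facet_counts(entities, facet):
    """Count how many entities in the current result set carry each value of `facet`."""
    return Counter(e[facet] for e in entities if facet in e)


def relevant_facets(entities, min_coverage=0.5):
    """Naive heuristic (an assumption): show a facet only if at least
    `min_coverage` of the current results have a value for it."""
    if not entities:
        return []
    seen = Counter(key for e in entities for key in e if key != "name")
    return sorted(k for k, n in seen.items() if n / len(entities) >= min_coverage)


# Progressive narrowing: entity type, then river mouth, then minimum length.
rivers = narrow(ENTITIES, "type", lambda v: v == "river")
into_rhine = narrow(rivers, "mouth", lambda v: v == "Rhine")
long_tributaries = narrow(into_rhine, "length_km", lambda v: v > 50)
```

Each call to `narrow` adds one restriction on one property, which mirrors how a user selects facets step by step. `facet_counts` produces the per-value counts that a faceted interface typically shows next to each choice. This linear scan would not scale to millions of entities, which is the concern raised in challenge 3.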
Faceted Wikipedia Search addresses these challenges by relying on two software components: The DBpedia Information Extraction Framework is used to extract structured knowledge from Wikipedia. neofonie search, a commercial search engine, is used as an efficient faceted search implementation.
This paper is structured as follows: Section 2 describes the Faceted Wikipedia Search user interface and explains how facets are used for navigating and filtering Wikipedia knowledge. Section 3 gives an overview of the DBpedia Information Extraction Framework and the resulting DBpedia knowledge base. Section 4 describes how the efficient handling of facets is realized inside neofonie search. Section 5 compares Faceted Wikipedia Search with related work."