The search for meaning on the Internet


After five years, the large IT research program "Theseus" is coming to an end. It aims to make information easier to find by using so-called semantic technologies, and in doing so it even takes on Google.

How many people live in Hamburg? If you type this question into Google, you do not get a straight answer but a list of three million web pages on which the words "Hamburg" and "inhabitants" appear. Users then have to pick out the required information themselves, usually from a block of text. The new search engine "Alexandria", by contrast, simply answers "1770629". The software has understood that a particular property of the Hanseatic city is being asked for, and it delivers exactly that information.

Alexandria is one result of the IT research program "Theseus", in which some 60 partners from research and industry took part alongside the Federal Ministry of Economics. Their goal: to simplify access to information, to link data into new knowledge and to enable new services on the Internet. To that end, around 100 million euros in public funding flowed into the program from 2007 to 2012, and industry, including heavyweights such as SAP and Siemens, contributed an additional 100 million. To date, Theseus can point to some 50 patents, 800 publications and about 130 working systems. This makes Theseus one of the largest IT research projects ever.



Whether the quality of the results is also record-breaking is difficult to assess. First, virtually all relevant researchers in Germany were involved in the project, so hardly any experts can be found in this country for an independent assessment. Second, most of the projects are aimed not at private users but at professional information systems, such as archives, so they cannot easily be tried out.

One exception is the search engine Alexandria, where anyone can see the advances in semantic technologies for themselves. The idea is to link information so that computers "understand" its meaning. In Alexandria, an algorithm first analyzes which objects and relationships the question contains. For the query "Where was Angela Merkel born?" these would be a person, a place and the relationship "was born in". In the next step, the software searches its database for the entry "Angela Merkel" and checks whether it is linked to another object (a city) via the relationship "was born in". From this pool, the search engine then returns the right answers, in this case "Hamburg" and "Germany".
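The two steps described above, extracting an entity and a relationship and then following matching links in a knowledge base, can be illustrated with a toy triple store. This is a minimal sketch, not Neofonie's implementation: the triples, the relation names and the regular-expression "parser" are assumptions made for illustration, and deriving "Germany" through an extra "located in" link is likewise only an assumed way of producing the second answer.

```python
# Hypothetical sketch: answer a question by (1) extracting an entity and a
# relationship, (2) following matching links in a triple store.
# Facts, relation names and the regex "parser" are illustrative assumptions.
from __future__ import annotations

import re

# Knowledge base as (subject, relation, object) triples
TRIPLES = {
    ("Angela Merkel", "was born in", "Hamburg"),
    ("Hamburg", "located in", "Germany"),
}

# Step 1: map question wording to a relationship (a deliberately naive parser)
PATTERNS = [
    (re.compile(r"^Where was (?P<subject>.+) born\?$"), "was born in"),
]


def parse(question: str) -> tuple[str, str] | None:
    """Return (entity, relationship) if the question matches a known pattern."""
    for pattern, relation in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            return match.group("subject"), relation
    return None


def objects(subject: str, relation: str) -> list[str]:
    """All objects linked to `subject` via `relation`."""
    return sorted(o for s, r, o in TRIPLES if s == subject and r == relation)


def answer(question: str) -> list[str]:
    parsed = parse(question)
    if parsed is None:
        return []  # unknown phrasing: no answer at all
    # Step 2: follow the relationship, then add the enclosing region
    places = objects(*parsed)
    regions = [region for p in places for region in objects(p, "located in")]
    return places + regions


print(answer("Where was Angela Merkel born?"))  # ['Hamburg', 'Germany']
```

Because this parser knows only one phrasing, a question worded differently returns an empty list, which shows how brittle the first step can be when a system depends on recognizing the question's structure.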

Currently, Alexandria's knowledge base is still limited to places, people and organizations. The data comes from DBpedia (a database of facts extracted from Wikipedia articles), Freebase (a similar database that Google bought in 2010), the geography database GeoBase and some 1,000 German-language news sites on the Internet. In addition, users can contribute information.

The data is so finely differentiated that it can represent complex relationships, such as who was married to whom and from when to when. The search engine also masters the resolution of ambiguity: it can distinguish, for example, the singer Sarah Connor from the film character of the same name. The software also recognizes statements people make about themselves in direct or indirect speech as such, promises the Berlin software company Neofonie, which developed Alexandria. But what sounds good in theory quickly leads to disillusionment in practice.
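Relationships with a time span and ambiguous names can be modeled by attaching a validity period to each fact and by keying entities on unique IDs rather than on their labels. The following Python sketch is hypothetical: the IDs, entity types, placeholder persons and years are invented for illustration and are not taken from DBpedia, Freebase or Alexandria.

```python
# Hypothetical sketch: time-qualified facts plus ID-based disambiguation.
# All IDs, types, placeholder persons and years are invented examples.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str     # unique identifier; labels alone may be ambiguous
    label: str  # human-readable name
    kind: str   # entity type, used to pick the intended reading


@dataclass(frozen=True)
class TimedFact:
    subject: str
    relation: str
    obj: str
    start: int
    end: int | None  # None means the relationship is still ongoing


ENTITIES = [
    Entity("Sarah_Connor_(singer)", "Sarah Connor", "Person"),
    Entity("Sarah_Connor_(character)", "Sarah Connor", "FictionalCharacter"),
]

FACTS = [
    TimedFact("Person_A", "married to", "Person_B", 1990, 2000),
    TimedFact("Person_A", "married to", "Person_C", 2005, None),
]


def candidates(label: str, kind: str) -> list[str]:
    """Resolve an ambiguous label to the IDs of entities of the expected type."""
    return [e.id for e in ENTITIES if e.label == label and e.kind == kind]


def spouse_in(person: str, year: int) -> list[str]:
    """Who `person` was married to in a given year, based on the validity periods."""
    return [
        f.obj
        for f in FACTS
        if f.subject == person
        and f.relation == "married to"
        and f.start <= year
        and (f.end is None or year <= f.end)
    ]


print(candidates("Sarah Connor", "Person"))  # ['Sarah_Connor_(singer)']
print(spouse_in("Person_A", 1995))           # ['Person_B']
print(spouse_in("Person_A", 2010))           # ['Person_C']
```

Storing the period on the fact itself, rather than overwriting a single "spouse" value, keeps the history queryable for any point in time.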

Even a simple rephrasing overwhelms Alexandria: the entry "How many inhabitants has hamburg?" only produces a hodgepodge of links. To the question "Where did Angela Merkel study?" Alexandria has no answer despite correct spelling, even though Leipzig is listed as her place of study on Alexandria's own profile of Merkel. Its understanding of questions works more badly than well: Alexandria does not answer "What religion is Angela Merkel?" correctly; instead, the user has to adapt to the search engine and ask about her "confession".

Criticism from individual users was correspondingly sharp when Alexandria went online in February, after the engine had regularly been presented as a highlight of the entire Theseus project. The developers of Alexandria qualify this claim: the project was only meant to show what is possible with semantic search in principle. Above all, a lack of server capacity is holding Alexandria back, says Neofonie developer Florian Kuhlmann. In future, the team therefore wants to rely more on cloud computing.

The first users of Alexandria technology are the online editors of "Stern". They use it to search their archives for articles related to a topic and to generate topic pages at the push of a button. That should inspire others: "We are moving away from information that is hidden in the body text of documents," said Professor Stefan Decker, director of the Digital Enterprise Research Institute in Galway, Ireland. Semantic search engines could therefore "certainly threaten Google's business model." However, Google does not intend to leave the field without a fight: the Californians are also working on semantic search and now intend to deliver the desired facts directly for specific questions instead of links.


Newspaper articles, performance schedules, photos, records and concert recordings make up a 25-terabyte archive, whose contents were first tagged automatically by text, image and speech recognition software. In a second step, the Contentus researchers, together with the creators of Alexandria, prepared the keywords for semantic searches. This will lay the groundwork for the planned "German Digital Library", which aims to make the holdings of 30,000 German cultural and scientific institutions available to everyone online.

In addition to semantic technologies, the Theseus project had a second focus: the Internet of Services. In the future, services are to be traded on Internet marketplaces and freely combined. Take a broken water pipe: tradespeople must tear up the floor, install new pipes and finally lay tiles again. Today, the person affected has to look for the right providers and coordinate dates with them. In the future, it should be enough to describe the problem on an intelligent marketplace ("I have a broken pipe"). The Internet of Services will then automatically find craftsmen in the area, check their calendars, agree on dates and inform the landlord when help is on its way.
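The marketplace workflow in the broken-pipe example (map the problem to work steps, find local providers, check their calendars, book dates in sequence, notify the landlord) can be sketched in a few lines. This is a hypothetical illustration, not a Theseus component: the step list, the provider data and the "first free day after the previous step" scheduling rule are all assumptions.

```python
# Hypothetical sketch of the "broken pipe" marketplace flow.
# Work steps, provider data and the scheduling rule are assumptions.
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Provider:
    name: str
    trade: str
    area: str
    busy: set[date] = field(default_factory=set)  # days already booked


# Assumed mapping from a problem description to ordered work steps
JOB_PLANS = {"broken pipe": ["floor removal", "plumbing", "tiling"]}


def plan_repair(problem: str, area: str, providers: list[Provider],
                start: date) -> list[tuple[str, str, date]]:
    """Book one local provider per step, each on the first free day
    after the previous step has been scheduled."""
    bookings = []
    day = start
    for trade in JOB_PLANS[problem]:
        local = [p for p in providers if p.trade == trade and p.area == area]
        if not local:
            raise LookupError(f"no {trade} provider in {area}")
        # Move forward day by day until at least one local provider is free
        while all(day in p.busy for p in local):
            day += timedelta(days=1)
        chosen = next(p for p in local if day not in p.busy)
        chosen.busy.add(day)  # record the appointment in the provider's calendar
        bookings.append((trade, chosen.name, day))
        day += timedelta(days=1)  # the next step can start the following day
    return bookings


providers = [
    Provider("Firm A", "floor removal", "Hamburg"),
    Provider("Firm B", "plumbing", "Hamburg", busy={date(2024, 5, 7)}),
    Provider("Firm C", "tiling", "Hamburg"),
]
bookings = plan_repair("broken pipe", "Hamburg", providers, date(2024, 5, 6))
for trade, name, day in bookings:
    print(f"{day}: {trade} by {name}")
# Informing the landlord is reduced to a printed message here
print(f"Landlord notified: first appointment on {bookings[0][2]}")
```

Here the plumber is already booked on 7 May, so plumbing moves to 8 May and tiling to 9 May. A real marketplace would also negotiate dates with the providers instead of simply claiming the first free day.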

Such a scenario can only become reality if all providers describe their services in a language that computers understand. That is the job of the "Unified Service Description Language" (USDL), which was developed as part of the SAP-led subproject "Texo". It describes various aspects of a service in a standardized form, for example the pricing model and dependencies on other services. "A craftsman could, for example, enter all the required information into a USDL editor, which then adds the USDL metadata to his website," says Clemens wine from the German Research Center for Artificial Intelligence. Web crawlers, small programs that analyze web pages, then automatically integrate the information into a service portal.
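The idea of a machine-readable service description that records a pricing model and dependencies on other services can be sketched as follows. This is a hypothetical, heavily simplified Python model: the field names, the placeholder prices and the JSON encoding are assumptions and do not reproduce the actual USDL vocabulary. The round trip through JSON stands in for the editor publishing metadata and a crawler reading it back, and the dependency ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
# Hypothetical, simplified service description covering two aspects named
# in the text: the pricing model and dependencies on other services.
# Field names and the JSON layout are assumptions, NOT the USDL vocabulary.
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from graphlib import TopologicalSorter


@dataclass
class ServiceDescription:
    name: str
    provider: str
    pricing_model: str     # e.g. "fixed" or "hourly"
    price_eur: float       # placeholder figures below, not real prices
    depends_on: list[str]  # services that must be finished first


def to_metadata(services: list[ServiceDescription]) -> str:
    """Serialize descriptions, as an editor might publish them with a website."""
    return json.dumps([asdict(s) for s in services], indent=2)


def from_metadata(text: str) -> list[ServiceDescription]:
    """Parse published descriptions, as a crawler might for a service portal."""
    return [ServiceDescription(**entry) for entry in json.loads(text)]


def execution_order(services: list[ServiceDescription]) -> list[str]:
    """Order services so each one comes after the services it depends on."""
    graph = {s.name: set(s.depends_on) for s in services}
    return list(TopologicalSorter(graph).static_order())


services = [
    ServiceDescription("tiling", "Firm C", "fixed", 900.0, ["plumbing"]),
    ServiceDescription("plumbing", "Firm B", "hourly", 65.0, ["floor removal"]),
    ServiceDescription("floor removal", "Firm A", "fixed", 400.0, []),
]
published = to_metadata(services)
crawled = from_metadata(published)
print(execution_order(crawled))  # ['floor removal', 'plumbing', 'tiling']
```

A cyclic dependency would make `TopologicalSorter` raise `graphlib.CycleError`, which is a useful sanity check when many providers publish descriptions independently.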

The Internet of Services is closely linked to semantic technologies. "The system needs to understand the context in which the work is to be done," said Krzysztof Janowicz, semantics expert and professor at the University of California, Santa Barbara. "In an acute emergency, for example, distance plays a role, so that a craftsman can help on site quickly. Semantic technologies can describe such chains of services."
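The role of distance in an emergency can be sketched as ranking craftsmen of the required trade by great-circle (haversine) distance to the customer. The names, trades and coordinates below are placeholders, and straight-line distance is an assumed simplification; a real system might weigh travel time or availability as well.

```python
# Hypothetical sketch: rank craftsmen of the required trade by
# straight-line (great-circle) distance to the customer.
# Names and coordinates are placeholders, not real data.
from __future__ import annotations

from math import asin, cos, radians, sin, sqrt
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class Craftsman(NamedTuple):
    name: str
    trade: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rank_for_emergency(lat: float, lon: float, trade: str,
                       craftsmen: list[Craftsman],
                       max_km: float | None = None) -> list[tuple[str, float]]:
    """Return (name, distance_km) pairs for the trade, nearest first,
    optionally dropping anyone farther away than max_km."""
    ranked = sorted(
        ((c.name, haversine_km(lat, lon, c.lat, c.lon))
         for c in craftsmen if c.trade == trade),
        key=lambda pair: pair[1],
    )
    if max_km is not None:
        ranked = [pair for pair in ranked if pair[1] <= max_km]
    return ranked


craftsmen = [
    Craftsman("Firm X", "plumbing", 53.56, 10.02),
    Craftsman("Firm Y", "plumbing", 53.60, 9.90),
    Craftsman("Firm Z", "tiling", 53.55, 10.00),
]
for name, km in rank_for_emergency(53.55, 10.00, "plumbing", craftsmen):
    print(f"{name}: {km:.1f} km")
```

The optional `max_km` cut-off expresses the idea that in an acute emergency only providers who can reach the site quickly are worth contacting at all.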

So were the 200 million euros money well spent? Crucial to the success of Theseus will be whether the ideas from Germany prevail internationally. USDL is already well on the way: the language is to be taken up by the World Wide Web Consortium as an official recommendation. Normal web users, however, will probably have to wait some time before they benefit from Theseus results: most of the projects only lay the foundations on which companies can build new services, and this process is just beginning. It is still too early for a final verdict.

Janowicz, who was not involved in Theseus, nevertheless finds the German project commendable. Even if many services are still months away, the groundwork for them has to be laid now. "Such a landmark project is important because it brings together new providers, large enterprises and science," says Janowicz, adding: "Data is the most important raw material of the 21st century, so Theseus was a very good investment." His colleague Pascal Hitzler has similarly appreciative words:

"The fact that this project began in 2007 is a sign of the farsightedness of its funders," says the director of the Knowledge Engineering Lab at Wright State University in Dayton. "Only in the past five years have semantic technologies found lasting appeal in industrial practice, for example in IBM's computer program Watson. Theseus was funded at exactly the right time." Professor Decker from Galway goes even further: "Semantic technologies could have an even greater impact than the introduction of the World Wide Web."

