Technische Universität Braunschweig
Institut für Informationssysteme
Mühlenpfordtstraße 23, 2nd floor
D-38106 Braunschweig
Conceptual Search: I carried out most of my research in this field while working on the CONCEPT project.
Technologies of interest: Data mining and machine learning on Big Data. My areas of expertise include natural language processing techniques, opinion mining, flat or hierarchical multidimensional clustering, and classification methods such as Adaptive Boosting, Bayesian classification, Support Vector Machines and decision trees. Typical data sets I work with are the well-known ClueWeb09 and ClueWeb12 corpora, which comprise about 60 TB of uncompressed documents harvested from the Web, and a locally hosted Virtuoso cache with over 5 billion triples from DBpedia, Freebase, YAGO, LinkedMDB, New York Times, DrugBase, MusicBrainz, GeoNames, DBTropes, CiteSeer and ACM data stores (about 10% of all linked data available on the Web). To manage this volume of data, I use graph databases such as Neo4j, cloud-based, Lucene-based inverted-index technology such as Elasticsearch, and the MapReduce paradigm (Hadoop).
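The MapReduce paradigm and inverted-index technology fit together naturally: building an inverted index is a classic MapReduce job. The following is a minimal in-process Python simulation of the map, shuffle and reduce phases, offered only as an illustration. It does not use Hadoop's or Elasticsearch's actual APIs, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain
import re

def map_phase(doc_id: str, text: str):
    # Mapper: emit (term, doc_id) once for every distinct term in the document
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(term: str, doc_ids: list[str]):
    # Reducer: turn the grouped document ids into a sorted postings list
    return term, sorted(set(doc_ids))

def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    grouped = shuffle(mapped)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())

# Hypothetical sample documents
docs = {
    "d1": "Volkswagen Polo: a small car, easy to park",
    "d2": "Gaming laptop with a fast graphics card",
    "d3": "Smart: a small two-seater car",
}
index = build_inverted_index(docs)
print(index["car"])    # ['d1', 'd3']
print(index["small"])  # ['d1', 'd3']
```

On a real cluster, the mapper and reducer would run on many machines and the framework would perform the shuffle; the logic of each phase stays the same.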
Projects: CONCEPT
Conceptual Queries in Entity-Centric Search: According to reports published by search engines such as Yahoo!, about 50% of today's Web queries involve searching for entities. While simple keyword-based search is handled well by state-of-the-art Boolean search, searching for entities by means of concepts, for instance city car, gaming laptop or business cellphone, is not well supported by such techniques. Given a concept like city car, a person would immediately think of a small vehicle that is easy to park and has low fuel consumption, such as the Volkswagen Polo or the Mercedes Smart. For a machine, however, such concepts are nothing more than keywords.
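To make this gap concrete, here is a toy sketch that contrasts conjunctive keyword matching with a property-based concept match. The entity descriptions, the hand-written property list for city car and the 2/3 threshold are hypothetical. In the project, such properties would have to be learned from Web data rather than written by hand.

```python
import re

# Hypothetical entity descriptions; none of them contains the phrase "city car"
ENTITIES = {
    "Volkswagen Polo": "small hatchback, easy to park, low fuel consumption",
    "Mercedes Smart": "two-seater, very small, easy to park, low fuel consumption",
    "Range Rover": "large SUV, high fuel consumption, off-road capable",
}

# Hypothetical, hand-written typical properties of the concept
CONCEPT_PROPERTIES = {
    "city car": ["small", "easy to park", "low fuel consumption"],
}

def keyword_match(query: str, text: str) -> bool:
    # Conjunctive Boolean match: every query term must occur as a token in the text
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(term in tokens for term in query.lower().split())

def concept_score(concept: str, text: str) -> float:
    # Fraction of the concept's typical properties mentioned in the description
    props = CONCEPT_PROPERTIES[concept]
    return sum(p in text.lower() for p in props) / len(props)

keyword_hits = [e for e, d in ENTITIES.items() if keyword_match("city car", d)]
concept_hits = [e for e, d in ENTITIES.items() if concept_score("city car", d) >= 2 / 3]
print(keyword_hits)  # []
print(concept_hits)  # ['Volkswagen Polo', 'Mercedes Smart']
```

Keyword search finds nothing because the words "city" and "car" never occur, while matching on typical properties finds both expected entities.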
The artificial intelligence (AI) community has invested a great deal of work in building systems capable of reasoning much like a human. Cyc, for instance, is a well-known AI project that attempts to assemble a comprehensive global ontology and knowledge base of common-sense knowledge. Such a knowledge base would enable machines to understand concepts and make human-like reasoning possible. Unfortunately, 30 years and 350 man-years of effort spent teaching Cyc common-sense knowledge later, no real advances have been achieved. In contrast to such approaches, we believe that flexible, context-based knowledge (rather than a single global ontology) is better suited to this task. Given the massive amount of information available today, such knowledge could be learned directly from the Web.
The outcome of this project will provide essential insights into how the meaning of concepts can be learned from large volumes of noisy information, such as data on the Web. This raises several research questions: Which definition of a concept is most suitable for this task? Does an intensional representation of a concept (through its typical properties) help pin down its meaning? How can property typicality be quantified? What about an extensional representation (through the entities that belong to the concept)? How can such representations be learned efficiently from huge volumes of heterogeneous data? Which learning methods are suitable for these tasks?
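One possible, deliberately simple way to make these terms concrete: treat the extensional representation as a set of known member entities, and measure a property's typicality as the share of those members that exhibit it. The intensional representation then keeps the sufficiently typical properties. This frequency-based measure, the 0.5 threshold, the sample entities and their properties are all assumptions for illustration, not the project's method.

```python
from collections import Counter

# Hypothetical extensional representation: entities known to be city cars,
# each with a hypothetical set of extracted properties
CITY_CARS = {
    "Volkswagen Polo": {"small", "easy to park", "low fuel consumption", "five doors"},
    "Mercedes Smart": {"small", "easy to park", "low fuel consumption", "two seats"},
    "Fiat 500": {"small", "easy to park", "retro design"},
    "Toyota Aygo": {"small", "low fuel consumption", "five doors"},
}

def typicality(members: dict[str, set[str]]) -> dict[str, float]:
    # Assumed measure: share of the concept's known members that exhibit each property
    counts = Counter(p for props in members.values() for p in props)
    return {p: c / len(members) for p, c in counts.items()}

def intensional(members: dict[str, set[str]], threshold: float = 0.5) -> dict[str, float]:
    # Intensional representation: properties shared by at least `threshold` of the members
    return {p: t for p, t in typicality(members).items() if t >= threshold}

def membership_score(entity_props: set[str], concept: dict[str, float]) -> float:
    # Typicality-weighted share of the concept's properties that the entity exhibits
    total = sum(concept.values())
    return sum(t for p, t in concept.items() if p in entity_props) / total

concept = intensional(CITY_CARS)
print(sorted(concept.items()))
# [('easy to park', 0.75), ('five doors', 0.5), ('low fuel consumption', 0.75), ('small', 1.0)]
print(round(membership_score({"small", "easy to park"}, concept), 2))  # 0.58
```

Note that five doors passes the threshold only because half of this small sample happens to have it. This illustrates why quantifying typicality from noisy data remains an open research question.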
Summary to date: 9 publications at international conferences, 11 Bachelor, Master and Diploma theses, and 3 software development projects (8 students per team and project) for building prototypes.
Thesis Type | Student Name | Title |
---|---|---|
Master thesis | Kalo, Jan-Christoph | Analyse des Transitivitätsproblems von Instance Matching Verfahren auf Linked Data (Analysis of the Transitivity Problem of Instance Matching Methods on Linked Data) |
Master thesis | Meine, Matthias | Product Search by Means of Natural Language |
Master thesis | Su, Rongfeng | Extracting Ontologies for Supporting Implicit Feature Resolution from Product Reviews |
Master thesis | Turmo, Juan Jose | Mining Semantic Related Terms for Product Features from Structured and Unstructured Data |
Diploma thesis | Loster, Michael-Reinhard | Opinion Mining & Sentiment Analysis in Reviews |
Diploma thesis | Zimmermann, Dirk | Einfluß von Typischen Entitäten auf die Festlegung von geeigneten Kategorien für den Entitätstyp (Influence of Typical Entities on Determining Suitable Categories for the Entity Type) |
Bachelor thesis | Dechand, Sergej | Analyzing User's Point of View in Feature based Opinion Mining |
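Bachelor thesis | Dermitzel, Philipp | Auswirkungen von Datenqualität in Business Warehouse Umgebungen (Effects of Data Quality in Business Warehouse Environments) |
Bachelor thesis | Geilert, Felix | Analyse der Akzeptanz und Breitenverwendung der auf schema.org zur Verfügung stehenden Schemata im Web (Analysis of the Acceptance and Adoption on the Web of the Schemas Provided by schema.org) |
Bachelor thesis | Gröber, Christoph | Analyse von Paraphrasen für OpenIE Triple (Analysis of Paraphrases for OpenIE Triples) |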
Bachelor thesis | Wille, Philipp | Establishing Proximity Boundaries for Concept Extraction in Product Reviews |