Expressive Scalable Querying over Integrated Linked Open Data


Project website

Pascal Hitzler, Co-PI

Linked Open Data (LOD) is rapidly developing into an open data movement that connects a large variety of data across the World Wide Web using standards adopted by the World Wide Web Consortium (W3C). Driven by researchers, government agencies and companies, the resulting Web of Data has grown to over 1,000 datasets and is showing exponential growth. However, simply putting collections of data on the Web is of very limited value. The key to unlocking that value for more powerful search, browsing, exploration and analysis is to richly interlink, or semantically integrate, the components of LOD. Given the size, growth rate, heterogeneity and expanding coverage of LOD, manual semantic integration or interlinking is not practical. Furthermore, current techniques focus on a single construct, owl:sameAs, which is abused because of its limited expressiveness and hence is ineffective or yields poor-quality integration. What is needed is the ability to represent and identify richer and more explicit relationships between entities, so that the richness of the real world is not crammed inaccurately and inappropriately into a very limited set of relationship types. At the same time, the exponential growth of LOD in size and diversity makes it challenging to identify and analyze datasets for both human and application consumption. Even though popular datasets such as DBpedia, Freebase and MusicBrainz are well known and widely used in the community, there can be other hidden gems that are useful for specialized applications.

To address these challenges, this project developed exploratory techniques to richly interlink components of LOD, addressed the challenges of querying the LOD cloud, and proposed approaches for dataset discovery, compression, and entity summarization.

Funding Agency: National Science Foundation


September 2011 to August 2014