TY - CONF T1 - Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data T2 - Scientific and Statistical Database Management, 22nd International Conference, SSDBM 2010 Y1 - 2010 A1 - Satya S. Sahoo A1 - Olivier Bodenreider A1 - Pascal Hitzler A1 - Amit Sheth A1 - Krishnaprasad Thirunarayan ED - Michael Gertz ED - Bertram Ludäscher KW - Biomedical knowledge repository KW - Context theory KW - Provenance context entity KW - Provenance Management Framework KW - Provenir ontology KW - RDF reification AB -

The Semantic Web Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. Provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing trust values for the datasets, and ranking query results. Current Semantic Web provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples that hinders data sharing. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples without the use of RDF reification or blank nodes. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine to support provenance tracking on RDF data extracted from multiple sources, including biomedical literature and the UMLS Metathesaurus. The evaluations demonstrate a minimum of 49% reduction in the total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, using the PaCE approach improves the performance of complex provenance queries by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.

JF - Scientific and Statistical Database Management, 22nd International Conference, SSDBM 2010 PB - Springer CY - Heidelberg, Germany VL - 6187 UR - http://dx.doi.org/10.1007/978-3-642-13818-8_32 ER -