TY  - CONF
T1  - Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data
T2  - Scientific and Statistical Database Management, 22nd International Conference, SSDBM 2010
Y1  - 2010
A1  - Satya S. Sahoo
A1  - Olivier Bodenreider
A1  - Pascal Hitzler
A1  - Amit Sheth
A1  - Krishnaprasad Thirunarayan
ED  - Michael Gertz
ED  - Bertram Ludäscher
KW  - Biomedical knowledge repository
KW  - Context theory
KW  - Provenance context entity
KW  - Provenance Management Framework
KW  - Provenir ontology
KW  - RDF reification
AB  - The Semantic Web Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. The provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing trust value of the datasets, and ranking query results. Current Semantic Web provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples that hinders data sharing. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples without the use of RDF reification or blank nodes. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine to support provenance tracking on RDF data extracted from multiple sources, including biomedical literature and the UMLS Metathesaurus. The evaluations demonstrate a minimum of 49% reduction in the total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, using the PaCE approach improves the performance of complex provenance queries by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.
JF  - Scientific and Statistical Database Management, 22nd International Conference, SSDBM 2010
PB  - Springer
CY  - Heidelberg, Germany
VL  - 6187
UR  - http://dx.doi.org/10.1007/978-3-642-13818-8_32
ER  - 

TY  - CONF
T1  - Ontology Driven Integration of Biology Experiment Data
T2  - Ohio Collaborative Conference on BioInformatics (OCCBIO 2009), Posters & Demos
Y1  - 2009
A1  - Raghava Mutharaju
A1  - Satya S. Sahoo
A1  - D. Brent Weatherly
A1  - Pramod Anantharam
A1  - Flora Logan
A1  - Amit Sheth
A1  - Rick Tarleton
JF  - Ohio Collaborative Conference on BioInformatics (OCCBIO 2009), Posters & Demos
CY  - Cleveland, OH, USA
ER  - 

TY  - CONF
T1  - Ontology-Driven Provenance Management in eScience: An Application in Parasite Research
T2  - On the Move to Meaningful Internet Systems: OTM 2009, Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009, Proceedings, Part II
Y1  - 2009
A1  - Satya S. Sahoo
A1  - D. Brent Weatherly
A1  - Raghava Mutharaju
A1  - Pramod Anantharam
A1  - Amit Sheth
A1  - Rick Tarleton
ED  - Robert Meersman
ED  - Tharam S. Dillon
ED  - Pilar Herrero
AB  - Provenance, from the French word “provenir”, describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.
JF  - On the Move to Meaningful Internet Systems: OTM 2009, Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009, Proceedings, Part II
PB  - Springer
CY  - Vilamoura, Portugal
VL  - 5871
UR  - http://dx.doi.org/10.1007/978-3-642-05151-7_18
ER  - 

TY  - CONF
T1  - Trykipedia: Collaborative Bio-Ontology Development using Wiki Environment
T2  - Ohio Collaborative Conference on BioInformatics (OCCBIO 2009), Posters & Demos
Y1  - 2009
A1  - Pramod Anantharam
A1  - Satya S. Sahoo
A1  - D. Brent Weatherly
A1  - Flora Logan
A1  - Raghava Mutharaju
A1  - Amit Sheth
A1  - Rick Tarleton
JF  - Ohio Collaborative Conference on BioInformatics (OCCBIO 2009), Posters & Demos
CY  - Cleveland, OH, USA
ER  -