Taubert, Jan: ONDEX - a data integration framework for the life sciences. 2011
Contents
- 1 Introduction
- 2 Background and related work
- 2.1 Principles of data integration
- 2.1.1 Link integration and hypertext navigation
- 2.1.2 Data warehouses
- 2.1.3 View integration and mediator systems
- 2.1.4 Workflows
- 2.1.5 Mashups
- 2.2 Previous work
- 2.3 Survey of current data integration systems
- 2.4 Conclusion
- 3 Requirements
- 3.1 Current situation
- 3.2 Challenges for data integration
- 3.3 Comparison with previous and related work
- 3.4 Comparison of approaches to data integration
- 3.5 Requirements for ONDEX
- 4 Methods and principles
- 4.1 ONDEX integration data structure
- 4.1.1 Motivation
- 4.1.2 Semantics in ONDEX
- 4.1.3 Semantics of nodes
- 4.1.4 Semantics of edges
- 4.1.5 Provenance
- 4.1.6 References and synonyms
- 4.1.7 Generalised data structure
- 4.1.8 Context
- 4.1.9 Definition ONDEX integration data structure
- 4.1.10 Discussion
- 4.2 Data alignment
- 4.2.1 Motivation
- 4.2.2 Methods
- 4.2.2.1 Data import and export
- 4.2.2.2 Data integration methods and algorithms
- 4.2.2.3 Other data integration methods
- 4.2.2.4 Evaluation methods
- 4.2.3 Results
- 4.2.3.1 Mapping methods – Enzyme Nomenclature vs. Gene Ontology
- 4.2.3.2 Mapping methods – KEGG vs. AraCyc
- 4.2.3.3 Visualising results
- 4.2.4 Discussion
- 4.3 Exchanging integrated data
- 4.3.1 Motivation
- 4.3.2 Requirements for exchanging integrated data sets
- 4.3.3 The OXL format
- 4.3.3.1 A brief history of OXL
- 4.3.3.2 OXL as XML Schema
- 4.3.3.3 Support for data integration and text mining
- 4.3.3.4 Tool support and applications of OXL
- 4.3.3.5 The role of OXL in the ONDEX data integration framework
- 4.3.4 Discussion
- 4.3.5 Outlook
- 5 Design and implementation
- 5.1 System design
- 5.1.1 Knowledge modelling and domain independence
- 5.1.2 Formulating a consensus domain model in biology
- 5.1.3 Populating the domain model
- 5.1.3.1 Resolving and transforming conflicts
- 5.1.3.2 Structured data formats
- 5.1.3.3 Dealing with unstructured data
- 5.1.4 Data filtering and knowledge extraction
- 5.1.5 Workflows
- 5.2 Implementing integration data structure
- 5.2.1 Encapsulation
- 5.2.2 Inheritance
- 5.2.3 Association
- 5.2.4 Polymorphism
- 5.2.5 Aspect oriented development
- 5.2.6 Graph implementations and persistency
- 5.3 Graph querying and information retrieval
- 6 Use cases
- 6.1 Improving genome annotations for Arabidopsis thaliana
- 6.1.1 Motivation
- 6.1.2 Data integration approach
- 6.1.3 Data integration exemplar
- 6.1.4 Data integration pipeline
- 6.1.5 Evaluation of annotation methods
- 6.1.6 Discussion
- 6.2 Prediction of potential pathogenicity genes in Fusarium graminearum
- 6.3 Constructing a consensus metabolic network for Arabidopsis thaliana
- 6.4 Analysis of social networks
- 6.5 ONDEX SABR project and its applications
- 6.5.1.1 Identifying new genetic and molecular targets to improve bioenergy crops
- 6.5.1.2 Integration, augmentation and validation of yeast metabolome models
- 6.5.1.3 Supporting research into the role of telomere function in ageing
- 6.5.1.4 Modelling processes for fruit ripening and flavour development in Tomato
- 6.5.1.5 Differences in responses to carcinogenic substances in human, rat and mouse
- 6.5.1.6 Plant Responses to Environmental STress in Arabidopsis (PRESTA)
- 7 Conclusion and outlook
- 7.1 Summary
- 7.2 Design decisions
- 7.3 Addressing the challenges
- 7.4 Comparison with related work
- 7.5 Outlook
- 8 Glossary
- 8.1 Definitions
- 8.1.1 Graph theory
- 8.1.2 Properties on relations
- 8.1.3 Object-oriented development and UML
- 8.1.4 Aspect-oriented software development
- 8.2 Knowledge representation
- 8.2.1 Semantic network
- 8.2.2 Conceptual graphs
- 8.2.3 Ontology
- 8.2.4 Contexts
- 8.2.5 Domain knowledge
- 8.2.6 Domain modelling
- 8.2.7 Hierarchy and taxonomies
- 8.2.8 Controlled vocabulary
- 8.3 Terms used in ONDEX
- 9 Appendix
- Table of figures
- References
