| tags: [ ecology research practice ontology informatics open data ] categories: [reading ]

Madin (2007) Advancing ecological research with ontologies

(Madin et al. 2008)

Why are ontologies relevant to ecology?

Issue of effectively locating scientific data is hampered because our current approaches for describing it rely on the “ad hoc use of user-supplied keywords, and do not ensure that such terms are defined and used consistently.” The same is true when searching for relevant data, “users supply their own search terms, which are then matched against keywords assigned to datasets”. Ecology is particularly prone to this problem, and many terms often have multiple and variable meanings in different contexts. Ontologies can overcome this problem because they “provide a formal representation of the terminology and concepts in a scientific domain, which can be used to clarify the relationships among those terms and concepts”.

Ontologies are proliferating in molecular biology, and biomedical communities. Broader benefits include: closer collaboration, better synthetic analyses, and greater data-sharing capabilities. An example of a successful ontology, is the gene ontology consortium, whereby they unified the terminologies from different model-organism domains (fruit fly, mouse and yeast) to improve querying across their major database efforts.

What can we do with ontologies in ecology

Not an esoteric point!

What properties of ontologies facilitate data-sharing promise of ontologies in ecology? They hold promise as a “unifying mechanism for representing knowledge because they are interpretable by both humans and computer applications, and subsequently facilitate the use of automated reasoning for helping with arduous data management tasks that scientists deal with on a daily basis”.

Advanced information discovery in the context of ecological data landscape: Large amounts of ecological data are increasingly available for research, but finding data relevant to a given study is still difficult.

By mapping data points onto datasets “semantic annotation” advanced information discovery is possible. Can even have multiple annotations.. perhaps by different scientists. That way it’s simultaneously useful for separate lines of inquiry, or to capture differences of opinion about what the data represent.

State of ontology research in ecology

Very, very slow uptake. The authors state that this is because ontology and its requisite supporting application development are both widespread and well funded activities. They also argue that there is limited success because (comparatively) ecological research spans broad and interelated topics, and data are often collected for a targeted purpose. Without considering how they might be reused for broader projects and analyses.

Ontology development in ecology

The authors categorise three types of ontologies th8are are important in ecology, and give some selected examples:

  1. Domain-specific ontologies: “focus on capturing terminology used in specialized scientific disciplines or communities”
  2. Framework ontologies: “define general concepts and relationships that others can extend when building domain ontologies”… “They are designed to interconect existing domain-specific ontologies while providing a consistent foundation for future ontology-building efforts.” There are a number of ontologies in development that aim to describe ecological and environmental data, especially because observations and measurements are the primary information expressed within most scientific data. Framework ontologies can support “generic and flexible data management systems for diverse and non-standardized ecological and environmental datasets”
  3. Other approaches (i.e. non-ontological): Less-formal approaches, such as controlled-vocabulary, provides a set of selected terms or keywords that denote a distinct concept. A glossary is an extension of a controlled-vocabulary. They can elimiante problems with term ambiguity, and other natural language complications. However, their informal nature limites their ability to support data searching and integration tasks that rely on automated reasoning.

Incorporating ontologies in ecological information management

Ontologies can aid in scientists utilising and publishing ecological and environmental data. See Figure Below:

Publishing data to repositories (to share with other researchers), is often acompanied by the use of metadata standards, providing descriptive information (who created, licensing, data structure, methods of collection). One example is the Metacat data-management syustem, a distributed repository used by organisations like Ecological Society of America. Ontologies can enhacne the power of metadata, as well as enabling software capabilities in addition to semantic queries “smart” queries. E.g. data merging (integration), unit conversion, sensible summarisation that matches the underlying structure of the data.

  1. Framework ontologies can aid in this, with semantic annotations. It is needed to translate the units or taxa from different datasets. They can therefore help automate the merging of datasets by clarifying and relating their logical structures. Applications can then use this information to determine if two data attrivutes are compatible for a particular analytic purpose.
  2. Can enable sensible summarisation by exposing important contextual information about when, where and how measurements were taken. Software can then interpret sampling design and identify which variables represent (e.g. nesting or blocking factors) and summarise appropriately.
  3. Ontologies can be used in statistical analysis and modeling tools, e.g. workflow-automation systems. Providing graphical environments for creating and running analyses, and visualising results and tracking data provenance. They can aid researchers in finding relevant analyses, and indicate the data transformation and integration tasks required for analysis.

Relevance to my research

The above image shows a “proposed high-level architecture” for ecological and environmental data management at three levels. It’s this sort of thinking that I want to apply to considering reproducibility issues in ecology and conservation.

I can see this sort of high-level architecture being useful for solving a few problems in reproducibility:

  1. The problem of data selection when building process models can be mitigated by ontologies that help better synthetic and data-sharing capabilities in ecology.
  • can’t find data, it’s absent (but maybe it exists on a hard-drive somewhere… I think we still have a file-drawer problem in ecology!!) - open data and cultural uptake of data sharing will reduce this.. “herd immunity”
  • Can’t find data - but it’s actually present on the database you are looking at (but the text-based searching using user-defined keywords fails to make the match), or it’s on a different database
  • Can find data, can use OK
  • Can find data, can’t use because wrong shape, or isn’t adequately described. Perhaps we don’t know where or how it was collected.

I think it could also aid with QRPs in the case of eggregious or even unintentional statistical errors. For example, there are framework ontologies developed that can automatically the experimental design / nested structure of the data (given appropriate metadata). This then facilitaes appropriate synthesis of data for broader analyses… but ALSO wrt to QRPs, could be used to identify the underlying structure of the data, and if linked to some other statistical ontology, could identify whether the underlying properties of the data structure are appropriate for some given analysis.

To Qaeco’s data repository project: I think we should have a policy that all honours / masters students should submit their own data to A repository, not just with code, to facilitate ecological data sharing.

See also (2018) for a contemporary discussion on the role of ontologies with open standards in open science.

References

Culina, Antica, Miriam Baglioni, Tom W Crowther, Marcel E Visser, Saskia Woutersen-Windhouwer, and Paolo Manghi. 2018. “Navigating the unfolding open data landscape in ecology and evolution.” Nature Ecology & Evolution, February. Springer US, 1–7. https://doi.org/10.1038/s41559-017-0458-2.

Madin, Joshua S, Shawn Bowers, Mark P Schildhauer, and Matthew B Jones. 2008. “Advancing ecological research with ontologies.” Trends in Ecology & Evolution 23 (3): 159–68. https://doi.org/10.1016/j.tree.2007.11.007.