Some challenges for large-scale data management
Reflections from the scientific domain
Dr. Jose Manuel Gomez-Perez, R&D Director

This event took place on 13th March 2013 at 11:30am (11:30 GMT)
Knowledge Media Institute, Berrill Building, The Open University, Milton Keynes, United Kingdom, MK7 6AA

The digital universe is booming, especially in terms of the amount of
metadata and user-generated data available. This raises serious data
management challenges, including the identification, amongst all such
data, of the particular data pieces relevant to a specific purpose and
the observation of the lifecycle of those data entities. Finer-grained
challenges include evolution and versioning, and the impact that changes
to resources, or their unavailability, may have on dependent applications,
causing decay and eventually malfunction. In this talk, we focus on
these challenges with special emphasis on the preservation and reuse
of scientific workflows in data-intensive research. We introduce the
concept of workflow-centric Research Object (RO) as the means to
identify and structure the relevant resources for the execution of
workflows and to ensure the replicability of their results, treating
data as first-class citizens. We also analyze the main reasons for
workflow (and therefore RO) decay in this particular domain and
propose methods and tools for its prevention. Finally, we reflect on
the lessons learnt and the potential use of these concepts in other
data-intensive domains.

