WP2: Definition of Support to Research Communities
Objectives
The objectives of this workpackage are to define the support required by Research Communities, and to test and validate the state-of-the-art services developed by INDIGO to ensure that they result in increased use of production e-infrastructures in Europe, in particular through the enhancement of services to share, manage and process research data. Research Communities covering a wide and significant spectrum of areas and expertise are represented through the participation of relevant institutions associated with ESFRIs or EIROs:
- Biological and Medical Sciences: EuroBioImaging, BBMRI (UPV), ELIXIR (CNR), INSTRUCT (U.Utrecht, CIRMMP)
- Social Sciences and Humanities: DARIAH (RBI), DCH-RP (ICCU)
- Environmental and Earth Sciences: LifeWatch (CSIC), EMSO (INGV), ENES (CMCC)
- Physical Sciences: LBT, CTA (INAF) [+conduit to WLCG+HEP from CERN]
From the point of view of the Research Communities, one of the challenges of the project is to ensure that the resulting frameworks have a reasonable learning curve, and in particular that commonly used integrated tools (such as Matlab/Octave, RStudio, ROOT) keep their current interfaces while gaining access to powerful computing and storage resources on e-infrastructures.
A clear indicator of the project's success is associated with this workpackage: the number of users in these Research Communities who use INDIGO to access production e-infrastructures (EGI, PRACE), and the volume of resources (data storage and computing hours) employed to produce first-class research results. Through this workpackage, INDIGO will also interact directly with proposals from the complementary E-INFRADEV call, to ensure collaboration where required and to promote a larger impact.
Tasks
Task 2.1 - Research Communities Requirements
- Analyse the use cases proposed by the communities participating in the consortium. Capture the requirements for efficiently running the applications and workflows on Cloud, Grid or HPC infrastructures.
- Capture requirements generated by user communities that are not part of the project (such as EGI Federated Cloud users) and that are relevant to the outputs of the project.
- Liaise with the INFRADEV-4 projects to enable synergies between the projects, and interoperability between the INDIGO outputs and the Virtual Research Environments (VREs) to be deployed by the E-INFRA-9 projects.
- Produce an integrated document with the captured requirements, prioritized and grouped by technical area (for example Cloud, HPC, Grid and Data Management).
Task 2.2 - Defining support to Research Data
To guarantee smooth and widespread usability of INDIGO, the integration approach has to take into account the different Reference Models used by the Research Communities and Research Infrastructures, as well as the diversity and heterogeneity of their data services and catalogues. This task examines how the Research Communities and Research Infrastructures use and manage research data, and identifies their different needs at each stage of the data life cycle. In particular, this task will survey the research communities to collect and analyze their individual Data Management Plans (DMPs) and data life-cycle documentation, with two aims: to ensure that the full data cycle and its components are supported in INDIGO, and to provide adequate specifications for compliance with INDIGO. Accordingly, the following activities are foreseen:
- Individual searches to acquire and analyze the available DMPs of the research communities/infrastructures, with special attention to distributed/heterogeneous data services and catalogues and to available open data;
- Acquisition of procedure details/parameters (e.g., DMP, Collection, Authenticity & Provenance, Data Preservation) to elaborate the specifications for data ingestion and use in INDIGO;
- Definition of the specifications for the INDIGO ingestion integrity test.
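As an illustration of what an ingestion integrity test might check, the following is a minimal sketch assuming that a checksum is recorded for each object at ingestion time (here in a simple manifest) and later compared with one recomputed from the stored copy. The manifest format, the choice of SHA-256 and the function names `sha256_of` and `verify_ingestion` are hypothetical and are not taken from any INDIGO specification.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so that large research datasets do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_ingestion(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Check every manifest entry (relative path -> expected SHA-256)
    against the stored copy under `root`.

    Hypothetical status vocabulary: "ok", "missing" or "corrupt".
    """
    report: dict[str, str] = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            report[rel_path] = "missing"
        elif sha256_of(target) != expected.lower():
            report[rel_path] = "corrupt"
        else:
            report[rel_path] = "ok"
    return report


if __name__ == "__main__":
    # Demo on a throwaway directory: one intact file, one never stored.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.dat").write_bytes(b"hello")
        manifest = {"a.dat": hashlib.sha256(b"hello").hexdigest(), "b.dat": "00"}
        print(verify_ingestion(manifest, root))
```

Reporting a status per object, rather than a single pass/fail flag, lets the specification distinguish data that was never ingested from data altered after ingestion.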
Task 2.3 - Application Test and Validation
The main objective of T2.3 is to ensure that all the middleware and other solutions developed in WP3 and WP6 meet the needs and requirements of the various user communities. It is therefore crucial to properly test and validate them and to demonstrate their applicability to real use cases. To meet this objective, the following sub-tasks are defined:
* Definition of use cases. We have already identified a number of use cases that will be implemented from the start of the project. These include:
- A local, self-contained version of the HADDOCK portal in a VM (to be used for both multicore and cluster-like implementations). HADDOCK is a typical high-throughput, highly distributed application that has already been ported to the grid and is widely used (>4400 users worldwide).
- A multi-threaded molecular dynamics use case based on the GROMACS and/or AMBER software, for testing VMs with a large number of cores (possibly with connections to PRACE).
- An MPI-based molecular dynamics use case to run on a virtualized cloud cluster.
- An approach for the characterization of internal dynamics in multi-domain proteins, integrating different types of experimental data.
- A climate model intercomparison analysis, based on big data analytics workflows of climate data operators (including data reduction, re-gridding and intercomparison, as well as statistical, outlier and ensemble analysis) on multi-terabyte climate datasets from large data collections (e.g. CMIP5).
- An astronomical pipeline to reduce proprietary or public data from the LBT (acquiring data from the LBT archive and running the pipeline on a virtualized cloud cluster), and CTA simulations.
- An instantiator of self-contained GALAXY servers running on VMs. GALAXY is a workflow manager well known to the bioinformatics community; in particular, it is widely adopted for setting up complex Next Generation Sequencing data analysis pipelines.
Additional use cases will be defined based on input from T2.1 and in collaboration with WP6, in order to test the workflow services to be developed. The definition and partial implementation of the use cases will be reported in deliverable D2.3.
* Creation of VMs for each use case. For each defined use case, a virtual machine will be provided that meets all the requirements for running on the INDIGO-DataCloud testing infrastructure.
* Implementation of automatic probes (e.g. NAGIOS-like) that perform all tests on a regular basis. These will allow validation and future monitoring of the INDIGO-DataCloud infrastructure, and will be developed in coordination with WP3.
* Creation of generic use case examples that can be used for dissemination and training purposes (e.g. in Task 2.4). These examples will evolve during the project to take into account the new solutions provided by INDIGO-DataCloud.
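A probe of the NAGIOS-like kind mentioned above could look like the following minimal sketch. It assumes that each use case VM exposes a test command whose exit code signals success, and it follows the standard Nagios plugin convention: exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN), and a single status line optionally followed by `|` and performance data. The function name `run_probe`, the thresholds and the test command are hypothetical and are not part of the INDIGO-DataCloud or WP3 design.

```python
from __future__ import annotations

import subprocess
import sys
import time

# Nagios plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def run_probe(command: list[str], warn_seconds: float,
              timeout_seconds: float) -> tuple[int, str]:
    """Run one (hypothetical) use-case test command and map the outcome
    to a Nagios status code plus a one-line message.

    - command cannot be started          -> UNKNOWN
    - timeout or non-zero exit code      -> CRITICAL
    - success slower than warn_seconds   -> WARNING
    - success within warn_seconds        -> OK
    """
    start = time.monotonic()
    try:
        result = subprocess.run(command, capture_output=True,
                                timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return CRITICAL, f"test timed out after {timeout_seconds:g}s"
    except OSError as exc:
        return UNKNOWN, f"cannot run test: {exc}"
    elapsed = time.monotonic() - start
    # Performance data: label=value[UOM];warn;crit
    perf = f"time={elapsed:.2f}s;{warn_seconds:g};{timeout_seconds:g}"
    if result.returncode != 0:
        return CRITICAL, f"test exited with code {result.returncode} | {perf}"
    if elapsed > warn_seconds:
        return WARNING, f"test passed but took {elapsed:.2f}s | {perf}"
    return OK, f"test passed in {elapsed:.2f}s | {perf}"


if __name__ == "__main__":
    # Placeholder test command; a real probe would launch the use case test.
    status, message = run_probe([sys.executable, "-c", "pass"],
                                warn_seconds=30, timeout_seconds=120)
    print(f"USECASE {LABELS[status]}: {message}")
    # A deployed plugin would finish with sys.exit(status) so that the
    # monitoring server can read the status from the exit code.
```

Keeping the status logic in a function that returns the code, rather than exiting directly, makes the same probe reusable both for scheduled monitoring and for one-off validation runs.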
Task 2.4 - Dissemination towards Research Communities
Dissemination activities target both project partners and external researchers, as well as any scientific bodies interested in the intermediate and final results of the project. The information will be classified according to the target audience (internal, external) and according to the state of work (progress of ongoing activities, preliminary results, intermediate results, final results).
An important feature of these dissemination activities, i.e. the promotion of the project solutions, is that they are based on a strong engagement of the Research Communities partners, which include relevant ESFRIs, EIROs and other large research initiatives in Europe. These partners will contribute by identifying the best forums and dissemination channels, and will also help by providing specific examples of the usefulness of INDIGO solutions in their communities. The coordination of these activities, and the preparation of specific support, will be handled by RBI, which has wide experience in these tasks, and by EGI.eu, which, through its experience in large dissemination events and training activities, will contribute to a wider dissemination across many research areas.
Following the ideas presented in section 2.2 for the initial dissemination plan, a structured plan will be implemented covering the whole project duration. The introduction of the cloud service technology platform is also an opportunity to integrate existing advanced technologies, including multimedia, hypermedia and other new models, that will make access to this platform more attractive and easier, and will thus also contribute to the sustainability of the project results. Different outreach actions will be undertaken to ensure dissemination towards both general and specific research communities: for example, the EGI.eu Community Forum (general), the European Geosciences Union (EGU) and American Geophysical Union (AGU) meetings for Environmental and Earth Sciences, CHEP for High Energy Physics, INSTRUCT annual meetings and relevant large life science conferences for Biomedical Sciences, and also industry-specific meetings, such as conferences on structural biology/life sciences targeting pharmaceutical and biotech companies (see for example http://www.psdi2014.org).
The following is an initial list of activities planned within this task to disseminate the project results (cf. the initial dissemination plan in section 2.2):
- Promote the presentation of project results through contributions to scientific conferences, publications in journals, and participation in workshops and scientific events. Where possible, co-locate promotional stands and/or material for the general and scientific public.
- Organize specific workshops, tutorials and hackathon events to disseminate the project results at relevant technical conferences (EGI TF, RDA meetings, EUDAT meetings, CloudOpen Europe, ESOCC, etc.), as well as a number of solicited workshops at relevant organizations.
- Attract new potential customers through short on-line training courses on ‘application enabling’. Promote them in Universities and Research Infrastructures.
- Set up the required e-infrastructure components (in collaboration with task T2.3 and WP3) to facilitate training and demos at workshops, as well as on-line courses.
- Set up a section of the project website/portal that presents the project facts, current progress and results from the perspective of final research users and of developers and technologists, and that provides links to the corresponding documentation and repositories. Ensure that the project outcomes are available, well documented, public and searchable.
- Contribute technically to the design of press releases, updates of the project story-factsheet, and dedicated technology factsheets of the solutions, including the produced pipelines/workflows, that can be used to attract new customers.
- Present results at selected industry-related technology events to raise awareness of the availability of the project results and of the channels that support their exploitation.
Task 2.5 - Sustainability: exploitation strategy in an integrated e-Science framework
Task T2.5 aims to provide an exploitation strategy for INDIGO products, ensuring the sustainability of the initiative. The task will take into account how research communities, technical developers and e-infrastructure providers, including commercial ones, are organized as ecosystems and, following this analogy, will discuss how to guarantee their healthy evolution:
- Analyse a global e-Science framework, including the different stakeholders and their roles and functions, taking into account funding and organisational constraints and existing models (cf. for example the EGI Organisational Ecosystem).
- Identify, from the point of view of Research Infrastructures and Communities, the key components developed and supported within INDIGO, and how they are integrated into applications to provide added value.
- Identify the potential involvement of industrial partners, starting with those already contributing to the INDIGO JRA and also including selected units within large corporations and SMEs, mainly in Europe. For example, assess whether a pool of Research Infrastructures could be interested in ensuring support for INDIGO solutions indirectly, through a commercial partner.
- Define resources, funding sources, and potential agreements with research consortia and/or commercial companies (arising from the previous activity), to ensure the sustainability of the INDIGO framework in the medium and long term.
- Extend the study to cover Europe as a whole, and then a wider international range (US, Asia, Latin America, Australia).
- Eventually, consider the creation of the INDIGO Open Source consortium and promote the involvement of European companies to support key developments.
Deliverables
- 2.1, Month 3 (R/PU): Initial requirements from Research Communities - ONGOING
- 2.2, Month 3 (R, DEC/PU): Dissemination plan oriented to Research Communities
- 2.3, Month 8 (R/PU): Specifications of use cases for testing and validation purposes.
- 2.4, Month 9 (R/PU): Confirmation of support to initial requirements from JRA design and extended list of requirements.
- 2.5, Month 15 (R, DEC/PU): Report on dissemination effort and impact.
- 2.6, Month 21 (R/CO): Exploitation strategy and sustainability.
- 2.7, Month 24 (R/PU): Specifications of data ingestion and use in INDIGO
- 2.8, Month 27 (R/PU): Test and validation suite and results.
- 2.9, Month 30 (R/PU): Exploitation analysis based on agreements made and usage statistics