
WP2: Definition of Support to Research Communities

Objectives

The objectives of this workpackage are to define the support required by Research Communities, and to test and validate the state-of-the-art services developed by INDIGO to ensure that they result in increased use of production e-infrastructures in Europe, in particular through enhanced services to share, manage and process research data. Research Communities covering a wide and significant spectrum of areas and expertise are represented through the participation of relevant institutions associated with ESFRIs or EIROs:

  • Biological and Medical Sciences: EuroBioImaging/BBMRI (UPV), ELIXIR (CNR), INSTRUCT (U.Utrecht, CIRMMP)
  • Social Sciences and Humanities: DARIAH (RBI), DCH-RP (ICCU)
  • Environmental and Earth Sciences: LifeWatch (CSIC), EMSO (INGV), ENES (CMCC)
  • Physical Sciences: LBT, CTA (INAF) [+conduit to WLCG+HEP from CERN]

One of the challenges of the project, from the point of view of the Research Communities, is to ensure that the resulting frameworks have a reasonable learning curve, and in particular that commonly used integrated tools (such as Matlab/Octave, RStudio, ROOT, etc.) keep their current interfaces while benefiting from access to powerful computing and storage resources on e-infrastructures.

A clear indicator of success for the project is associated with this workpackage: the number of users in these Research Communities using INDIGO to access production e-Infrastructures (EGI, PRACE), and the volume of resources (data storage and computing hours) employed to produce first-class research results. Through this workpackage, INDIGO will also interact directly with proposals from the complementary E-INFRADEV call, to ensure collaboration where required and to promote a larger impact.

Task 2.1

  • Analyse the use cases proposed by the communities participating in the consortium, and capture the requirements for running their applications and workflows efficiently on Cloud, Grid or HPC infrastructures.
  • Capture requirements from user communities outside the project (such as the EGI Federated Cloud users) that are relevant to the outputs of the project.
  • Liaise with the INFRADEV-4 projects to enable synergies between the projects and interoperability between the INDIGO outputs and the VRE to be deployed by the E-INFRA-9 projects.
  • Produce an integrated document with the requirements captured, prioritized and grouped by technical areas, for example: Cloud, HPC, Grid and Data management.

Task 2.2

To guarantee smooth and widespread usability of INDIGO, an appropriate integration and combination approach has to take into account the different Reference Models used by the Research Communities and Research Infrastructures, as well as the diversity and heterogeneity of their data services and catalogues. This task examines how the Research Communities and Research Infrastructures use and manage research data, and identifies their different needs at each stage of the data life cycle. In particular, this task shall survey the research communities to collect and analyze their individual Data Management Plans (DMP) and data life-cycle documentation, in order to ensure that the full data cycle and its components are supported in INDIGO, and to provide adequate specifications for compliance with INDIGO. Accordingly, the following activities are foreseen:

  • Individual search activities to acquire and analyze the available DMPs of the research communities/infrastructures, with special attention to distributed/heterogeneous data services and catalogues, and to available open data;
  • Acquisition of procedure details/parameters (e.g., DMP, Collection, Authenticity & Provenance, Data Preservation) to elaborate the specifications for data ingestion and use in INDIGO;
  • Definition of the specifications of the INDIGO ingestion integrity test.

Task 2.3

The main objective of T2.3 is to ensure that all the middleware and other solutions developed in WP3 and WP6 meet the needs and requirements of the various user communities. It is therefore crucial to properly test and validate them and to demonstrate their applicability on real use cases. To meet this objective, the following sub-tasks are defined:

  • Definition of use cases. We have already identified a number of use cases that will be implemented from the start of the project. These include:
      - A local, self-contained version of the HADDOCK portal in a VM (to be used for both multicore and cluster-like implementations). HADDOCK is a typical high-throughput, highly distributed application, which has already been ported to the grid and is widely used (>4400 users worldwide).
      - A multi-threaded molecular dynamics use case based on the GROMACS and/or AMBER software, for testing VMs with a large number of cores (possibly with connections to PRACE).
      - An MPI-based molecular dynamics use case to run on a virtualized cloud cluster.
      - An approach for characterizing the internal dynamics of multi-domain proteins by integrating different types of experimental data.
      - A climate model intercomparison analysis, based on big data analytics workflows of climate data operators (including data reduction, re-gridding and intercomparison, as well as statistical, outlier and ensemble analysis) on multi-terabyte climate datasets from large data collections (e.g. CMIP5).
      - An astronomical pipeline to reduce proprietary or public data from the LBT telescope (acquiring data from the LBT archive and running the pipeline on a virtualized cloud cluster), and CTA simulations.
      - An instantiator of self-contained GALAXY servers running on VMs. GALAXY is a workflow manager well known to the bioinformatics community; it is widely adopted in particular for setting up complex Next Generation Sequencing data analysis pipelines.

    Additional use cases will be defined based on the input of T2.1 and in collaboration with WP6, in order to test the workflow services to be developed. The definition and partial implementation of the use cases will be reported in deliverable D2.3.

  • Creation of VMs for each use case. For each use case defined, a virtual machine will be provided, meeting all requirements for running on the INDIGO-DataCloud testing infrastructure.

  • Implementation of automatic probes (e.g. Nagios-like) for performing all tests on a regular basis. These will allow validation and future monitoring of the INDIGO-DataCloud infrastructure, and will be developed in coordination with WP3.

  • Creation of generic use case examples that can be used for dissemination and training purposes (e.g. in Task 2.4). These examples will evolve during the project to take into account the new solutions provided by INDIGO-DataCloud.
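To illustrate what a Nagios-like probe for one of these use cases could look like, here is a minimal Python sketch. It assumes that each use-case test is wrapped as a command that exits with 0 on success; the function names (classify, run_probe, main) and the warning/critical runtime thresholds (60 s / 300 s) are hypothetical choices for illustration, not part of the INDIGO plan. Only the exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the "|"-separated performance data follow the usual Nagios plugin conventions.

```python
"""Hypothetical Nagios-style probe wrapping one INDIGO use-case test (sketch)."""
import subprocess
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(returncode, elapsed, warn_s, crit_s):
    """Map a test's exit status and runtime (in seconds) to a Nagios state."""
    if returncode != 0:
        return CRITICAL  # the use-case test itself failed
    if elapsed >= crit_s:
        return CRITICAL  # passed, but far too slowly
    if elapsed >= warn_s:
        return WARNING   # passed, but slower than expected
    return OK


def run_probe(cmd, warn_s=60.0, crit_s=300.0, timeout_s=600.0):
    """Run one use-case test command and return (exit_code, status_line)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return CRITICAL, f"CRITICAL - test timed out after {timeout_s:.0f}s"
    except OSError as exc:  # e.g. the test executable does not exist
        return UNKNOWN, f"UNKNOWN - could not start test: {exc}"
    elapsed = time.monotonic() - start
    state = classify(proc.returncode, elapsed, warn_s, crit_s)
    # Text after '|' is performance data: label=value[unit];warn;crit
    line = (f"{LABELS[state]} - test exited with {proc.returncode} "
            f"| time={elapsed:.2f}s;{warn_s};{crit_s}")
    return state, line


def main(argv):
    """Print the status line and return the exit code for the monitoring system."""
    # Hypothetical default: a trivial command standing in for a real use-case test.
    cmd = argv or [sys.executable, "-c", "pass"]
    code, message = run_probe(cmd)
    print(message)
    return code
```

A script entry point would call sys.exit(main(sys.argv[1:])) so that the scheduler (Nagios or a similar tool) reads the state from the exit code and shows the status line. Keeping classify separate from the subprocess call makes the threshold logic easy to test on its own.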

Task 2.4

Task 2.5

Meetings

Kick-off Meeting - WP2

eciencia: INDIGO/WP2 (last edited 2015-07-14 11:38:52 by aguilarf)