Outputs

Deliverables

D1.1 - Data Management Plan (DMP) - M6 

The Data Management Plan (DMP) fulfils the LIGATE project’s obligation to adhere to the European Commission’s Open Research Data Pilot, which promotes open access to research data where possible. The LIGATE DMP describes the data and program repositories that third parties can use for purposes such as data mining, exploitation or validation of the project’s results. It covers the data that will be acquired or produced during the project, how they will be managed, annotated and stored, the standards to be used, and how the data will be handled and protected during and after the completion of the project. The Plan will be updated at regular intervals, in line with the project’s activities. Related documents include D1.2, “Requirements and Specifications and Integration Plan”. In general, the data and tools are or will be made freely available; the few exceptions involve licensed or proprietary software, and the restrictions on their use are detailed in the Plan.

D1.2 – Requirements and Specifications and Integration Plan – M9 

D1.2 describes the requirements and specifications of the project’s software components together with the constraints imposed by the available hardware architectures. Related documents include the Data Management Plan (D1.1). The deliverable lists the hardware resources provided by the HPC centres and other partners, including computer systems that will become available in the coming months. It details the specifications and requirements of the lead users of the LIGATE solution, DOMPE’ and TOFMOTION, and of the main software components, such as the LiGen docking engine, the GROMACS molecular modelling software and the HyperQueue workflow engine. It also describes the integration plan and configuration management.

D1.3 - Initial Validation Result - M18 

This deliverable reports the initial validation of the individual solution components and of the solution as a whole. Validation tests have been defined for the following components:

  • LiGen. Reference datasets and scripts have been defined which allow testing the LiGen components either separately or integrated in a LiGen workflow. It is thus possible to run either a thorough validation of all the software or a quicker check for errors after a software update. To facilitate validation of the software on a new system, all the test data and run scripts are maintained in a dedicated repository: https://gitlab.hpc.cineca.it/ligate/ligen-testbed. A minimal, hypothetical example of such a regression check is sketched below, after this list.
  • HyperQueue. The software ships with a testing suite that can be quickly set up and run via Python scripts.
  • GROMACS. The program relies extensively on detailed testing. All work and changes are subject to unit testing in a continuous integration (CI) fashion, and new code is not accepted without unit tests. The CI system is effective enough that it has detected more than a dozen bugs caused by compilers rather than by the simulation code itself.
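
To illustrate what such a regression check can look like in practice, here is a minimal Python sketch that re-runs a test case and compares the produced scores against a stored reference within a tolerance. The script name, file layout and CSV score format are hypothetical placeholders and do not reflect the actual contents of the ligen-testbed repository.

    # Hypothetical regression check: re-run a docking/scoring test case and
    # compare the resulting scores against a stored reference within a tolerance.
    # Paths, file layout and the score file format are illustrative only.
    import csv
    import subprocess
    import sys

    TOLERANCE = 1e-3  # acceptable absolute deviation from the reference scores


    def load_scores(path):
        """Read 'ligand_id,score' rows from a CSV file into a dict."""
        with open(path, newline="") as fh:
            return {row["ligand_id"]: float(row["score"]) for row in csv.DictReader(fh)}


    def main():
        # Run the test case (placeholder command; the real testbed ships its own scripts).
        subprocess.run(["bash", "run_test_case.sh"], check=True)

        reference = load_scores("reference/scores.csv")
        produced = load_scores("output/scores.csv")

        failures = [
            lig for lig, ref in reference.items()
            if abs(produced.get(lig, float("inf")) - ref) > TOLERANCE
        ]
        if failures:
            sys.exit(f"Validation failed for {len(failures)} ligands: {failures[:5]} ...")
        print(f"All {len(reference)} ligand scores match the reference within {TOLERANCE}.")


    if __name__ == "__main__":
        main()
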
D2.1 – Application Code Accelerated with SYCL – M9

A document was produced describing the source code submitted along with the D2.1 deliverable. It describes the contents of the two codebases released for D2.1, the main challenges encountered and the solutions adopted in developing the SYCL-accelerated versions, and code portability in terms of SYCL implementations and supported target architectures, and it introduces the strategy for future updates and maintenance of the codebases. In more detail, a SYCL implementation of LiGen is being developed, starting from ligen-geodock and ligen-score. Each CUDA kernel in the CUDA implementation is mapped to a corresponding SYCL kernel in the SYCL implementation.
The document contributes to the “Ligate Software Release” deliverable issued for the first milestone (MS1) and follows the indications provided by the Data Management Plan (D1.1) and by the Requirements, Specifications and Integration Plan (D1.2).

D2.2 – Application Code Accelerated with Celerity – M18

Three Celerity porting options have been identified, the most promising being to use Celerity to distribute the existing GPU workload per node. However, recent discussions in WP2 suggested that, based on POLIMI's most recent work, there may be a fourth option, which is currently under review by UniSA, POLIMI and UIBK.

D2.3 – Intermediate runtime and autotuning framework – M18

This intermediate report describes the progress made on runtime system optimization, including autotuning, scheduling and data distribution improvements, as well as energy optimization.

D3.1 – Specification of data/API requirements – M9

The D3.1 report contains the results of the analysis of the API needs for the modules to be properly integrated; most interfaces concern data formats and parameters. It describes the LiGen modules, with an overview of each module’s goal, input/output and command-line arguments. The interfaces of the LiGen modules have been updated to allow building flexible workflows that can also include external tools, and most of the existing LiGen modules have been refactored to adopt the newly defined interfaces. For GROMACS, work has focused on the requirements for automatic topology and parameter generation and on automatically choosing the simulation length needed to reach a target precision. The document also shows examples of module composition for more complex workflows, including virtual screening, pre-processing, docking, scoring and free energy calculations, and presents a preliminary analysis of the HyperQueue tool for managing the submission of these workflows. In summary, the API needs have been analysed and reviewed, interfaces for module development have been defined, and common input/output standard formats have been agreed. Consistency between application concepts (e.g., data types) has been verified, and a plan for API evolution has been defined that paves the way for the subsequent deliverables D3.2 and D3.3.
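
To make the idea of module composition more concrete, the following minimal Python sketch chains hypothetical command-line modules into a virtual screening pipeline through files in agreed common formats. The module names, flags and file names are illustrative assumptions, not the actual LiGen, GROMACS or HyperQueue interfaces defined in D3.1.

    # Minimal sketch of composing pipeline modules through common file formats.
    # Module names, command-line flags and formats are illustrative placeholders.
    import subprocess
    from pathlib import Path

    WORKDIR = Path("screening_run")
    WORKDIR.mkdir(exist_ok=True)


    def run_stage(name, command):
        """Run one pipeline stage and fail loudly on a non-zero exit code."""
        print(f"[{name}] {' '.join(command)}")
        subprocess.run(command, check=True)


    # 1. Pre-processing: prepare target and ligand library in the agreed input formats.
    run_stage("preprocess", ["preprocess_module", "--target", "target.pdb",
                             "--ligands", "library.smi", "--out", str(WORKDIR / "prepared")])

    # 2. Docking: generate candidate poses for each prepared ligand.
    run_stage("docking", ["dock_module", "--in", str(WORKDIR / "prepared"),
                          "--out", str(WORKDIR / "poses")])

    # 3. Scoring: rank the generated poses.
    run_stage("scoring", ["score_module", "--in", str(WORKDIR / "poses"),
                          "--out", str(WORKDIR / "ranked.csv")])

    # 4. Free energy: refine the top-ranked complexes with a more expensive method.
    run_stage("free_energy", ["free_energy_module", "--in", str(WORKDIR / "ranked.csv"),
                              "--top", "10", "--out", str(WORKDIR / "free_energy.csv")])

Running each stage as a separate process that communicates through files in common formats keeps the modules loosely coupled, which is also what makes it straightforward to hand the stages over to a workflow manager such as HyperQueue.
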

D3.2 – Data translators and code

Extensions to existing data translators where possible, and implementation of new ones as scripts, standalone programs or within the original applications, to allow the components of the solution to exchange data and interact.
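
As a toy illustration of the translator pattern, the following Python sketch converts a simplified XYZ-style coordinate listing into minimal PDB-style ATOM records so that a downstream tool could consume it. Both formats are deliberately simplified here; the project’s translators handle the real formats exchanged between LiGen, GROMACS and the other components.

    # Toy data translator: convert a simplified XYZ coordinate file into minimal
    # PDB-style ATOM records. Real translators in the project handle the actual
    # formats exchanged between components; this only illustrates the pattern.

    def xyz_to_pdb_lines(xyz_text, residue="LIG"):
        """Translate 'element x y z' lines into fixed-width PDB-style ATOM records."""
        lines = xyz_text.strip().splitlines()
        natoms = int(lines[0])  # XYZ header: atom count, then a comment line
        records = []
        for i, line in enumerate(lines[2:2 + natoms], start=1):
            element, x, y, z = line.split()[:4]
            records.append(
                f"ATOM  {i:5d} {element:<4s}{residue:>4s} A   1    "
                f"{float(x):8.3f}{float(y):8.3f}{float(z):8.3f}  1.00  0.00          {element:>2s}"
            )
        records.append("END")
        return records


    if __name__ == "__main__":
        example = """3
        water
        O 0.000 0.000 0.117
        H 0.000 0.757 -0.469
        H 0.000 -0.757 -0.469"""
        print("\n".join(xyz_to_pdb_lines(example, residue="HOH")))
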

This work has been highly successful. Additional effort will be needed in the second half of the project to fully integrate with the newly developed interfaces, in particular to better identify and handle broken, incorrect or simply unsupported molecules (for the particular parameter set the user has selected). Nevertheless, the achievements here have already allowed us to execute our first trial benchmark tests of large-scale free energy and docking workflows.

D4.1 – Initial analysis on the machine learning module – M18

The strategy for binding affinity prediction and pose selection was shared, and the requirements for successful integration with the other virtual screening modules were collected. A new data-generation phase has just finished for the pose selection task, producing significantly more poses with which to train the model; the model has been retrained with promising results. For the pose selector, different strategies are being explored, e.g. 3D convolutional neural networks, graph neural networks and mixture density networks. Three open-source software packages with excellent results on the CASF datasets are being tested and adapted to the use case at hand. Complex alignment is now handled within the LiGen workflow, i.e. there is no need to rely on specific methods (such as Frobenius norm evaluation) to ensure roto-translation and permutation invariance with respect to changes of the reference system.
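
As a sketch of the 3D convolutional direction mentioned above, the following PyTorch snippet defines a small network that takes a voxelised protein-ligand complex and predicts a single pose quality score. The grid size, atom-type channels and layer sizes are illustrative assumptions, not the architecture actually trained in the project.

    # Illustrative 3D CNN pose scorer: input is a voxelised protein-ligand complex
    # (channels encode atom-type occupancies), output is a single pose quality score.
    # Grid size, channels and layer sizes are placeholders, not the project's model.
    import torch
    import torch.nn as nn


    class PoseScorer3D(nn.Module):
        def __init__(self, in_channels=8):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool3d(2),
                nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool3d(2),
                nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),  # global pooling keeps the head grid-size agnostic
            )
            self.head = nn.Sequential(
                nn.Flatten(), nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1)
            )

        def forward(self, x):
            return self.head(self.features(x)).squeeze(-1)


    if __name__ == "__main__":
        # One batch of 4 voxelised complexes on a 24^3 grid with 8 atom-type channels.
        voxels = torch.randn(4, 8, 24, 24, 24)
        scores = PoseScorer3D()(voxels)
        print(scores.shape)  # torch.Size([4])
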

D4.3 – Intermediate solution for data management – M18

This deliverable reports the architecture of the first version of the I/O storage platform optimized for the LIGATE use cases. CINECA is testing an ETL step and a basic ML pipeline (PCA) using RAPIDS and NVTabular on three trajectories (.xtc binaries) of 1, 5 and 10 GB. Currently the framework does not appear stable; debugging is ongoing both on the CINECA side and together with NVIDIA developers.
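
For illustration only, the following Python sketch performs the same kind of ETL-plus-PCA step on a single trajectory using CPU tools (MDAnalysis and scikit-learn) as stand-ins for the GPU-based RAPIDS/NVTabular pipeline under test; the file names and number of components are placeholders.

    # CPU stand-in for the ETL + PCA pipeline being tested on GPUs with RAPIDS/NVTabular:
    # extract atom coordinates from an .xtc trajectory and project each frame onto a
    # few principal components. File names and component count are placeholders.
    import numpy as np
    import MDAnalysis as mda
    from sklearn.decomposition import PCA

    # Load the trajectory; an .xtc file needs a matching topology (e.g. a .gro or .pdb).
    universe = mda.Universe("system.gro", "trajectory.xtc")
    atoms = universe.select_atoms("protein")

    # ETL step: flatten the per-frame coordinates into a (n_frames, n_atoms * 3) matrix.
    frames = np.array([atoms.positions.copy().ravel() for _ in universe.trajectory])

    # PCA step: reduce each frame to a handful of components for downstream ML.
    pca = PCA(n_components=5)
    projected = pca.fit_transform(frames)
    print(projected.shape, pca.explained_variance_ratio_)
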

D4.5 – Workflow benchmark specification and initial results – M18

The HyperQueue framework for job management was presented, and efficient multi-GPU and multi-node execution of AI applications and frameworks on the GPU nodes of the Karolina supercomputer was demonstrated. An experimental API for the GROMACS pipeline is under development to make it more robust, maintainable and scalable.
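
As a minimal illustration of the kind of job farming HyperQueue enables, the following Python sketch splits a ligand library into batches and submits one task per batch through the hq command-line client. It assumes a HyperQueue server and workers are already running and that a process_batch.sh script exists; the file names and batch size are placeholders.

    # Minimal sketch of farming out ligand batches as HyperQueue tasks via the hq CLI.
    # Assumes an hq server and workers are already running, and that process_batch.sh
    # exists; file names and batch size are placeholders.
    import subprocess
    from pathlib import Path

    LIGANDS = Path("library.smi").read_text().splitlines()
    BATCH_SIZE = 1000

    for i in range(0, len(LIGANDS), BATCH_SIZE):
        batch_file = Path(f"batch_{i // BATCH_SIZE:04d}.smi")
        batch_file.write_text("\n".join(LIGANDS[i:i + BATCH_SIZE]) + "\n")

        # Each submission becomes an independent HyperQueue job scheduled on the workers.
        subprocess.run(["hq", "submit", "./process_batch.sh", str(batch_file)], check=True)
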

D6.2 – Initial Dissemination and Communication Plan

This document describes the coordination of the dissemination and communication activities (LIGATE webpage, workshops, conferences, articles and a book).
It details the steps, stages, messages and tools used to spread the progress and results of the project widely.

Milestones

MS1 - A SYCL version of LiGen running on GPUs is a critical achievement for MS1 and is necessary to compare the software's performance against its CUDA version. Since this result had not yet been attained, the partners agreed to postpone the MS1 release; MS1 is nonetheless an internal milestone, so its delayed release has no impact on the evaluation of the project by the EC. Two SYCL versions of LiGen modules (LiGen Dock and LiGen Score) for a single molecule have since been produced and run smoothly on NVIDIA GPUs. Since neither POLIMI nor UniSA has AMD GPUs available, no tests were carried out on that hardware; this second test is not a blocking issue, however, and MS1 has been released.

MS2 - First integrated version of the CADD platform and first version of the programming tools associated with the platform released.



This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 956137. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Italy, Sweden, Austria, Czech Republic, Switzerland.