Outputs

Deliverables

D1.1 - Data Management Plan (DMP) - M6 

The Data Management Plan (DMP) is required by the LIGATE project’s obligation to adhere to the European Commission’s Open Research Data Pilot, which enables open access to research data where possible. The DMP describes the data and program repositories that third parties can use for purposes such as data mining, exploitation or validation of the project’s results. It describes the data that will be acquired or produced during the project; how the data will be managed, annotated and stored; the standards to be used; and how the data will be handled and protected during and after the project. The Plan will be updated at regular intervals, in line with the project’s activities. Related documents include D1.2, “Requirements and Specifications and Integration Plan”. In general, the data and tools are or will be made freely available; the few exceptions involve licensed or proprietary software, and the restrictions on their use are described.

D1.2 – Requirements and Specifications and Integration Plan – M9 

D1.2 describes the requirements and specifications of the project’s software components, together with the constraints imposed by the available hardware architectures. Related documents include the Data Management Plan (D1.1). It lists the hardware resources provided by HPC centres and other partners, including computer systems that will become available in the coming months. It details the specifications and requirements for the lead users of the LIGATE solution, Dompé and TOFMOTION, followed by other components such as the LiGen docking engine, the GROMACS molecular modelling software and the HyperQueue workflow engine. It also describes the integration plan and configuration management.

D1.3 - Initial Validation Result - M18 


D2.1 – Application Code Accelerated with SYCL - M9

A document was produced describing the source code submitted with deliverable D2.1. It describes the contents of the two released codebases, the main challenges and adopted solutions in developing the SYCL-accelerated versions, and code portability across SYCL implementations and supported target architectures; it also introduces a strategy for future updates and maintenance of the codebases. In more detail, a SYCL implementation of LiGen is being developed, starting from ligen-geodock and ligen-score. Each CUDA kernel in the CUDA implementation is mapped to a SYCL kernel in the SYCL implementation.
The document contributes to the deliverable “Ligate Software Release”, released for the first milestone (MS1), and follows the indications provided by the Data Management Plan (D1.1) and the Requirements and Specifications and Integration Plan (D1.2).

D2.2 – Application Code Accelerated with Celerity - M18 

Three Celerity porting options have been identified, the most promising being to use Celerity to distribute the existing GPU workload per node. However, during the latest WP2 discussion it emerged that, based on POLIMI's most recent work, there may be a fourth hypothesis, which is under review by UniSA, POLIMI and UIBK.

D2.3 - Intermediate runtime and autotuning framework - M18 

D3.1 – Specification of data/ API requirements - M9 

The D3.1 report contains the results of the analysis of the API needs for the modules to be properly integrated. Most interfaces concern data formats and parameters. It describes the LiGen modules, with an overview of each module’s goal, input/output and command-line arguments. The interfaces of the LiGen modules have been updated to build flexible workflows that can also include external tools. Most of the existing LiGen modules have been refactored to use the newly defined interfaces. For GROMACS, work has focused on the requirements for automatic topology and parameter generation and for automatically deciding the simulation length needed to reach a target precision. The document also shows examples of module composition for more complex workflows, including virtual screening, pre-processing, docking, scoring and free energy calculations, and includes a preliminary analysis of the HyperQueue tool for managing workflow submission. In summary, API needs have been analyzed and reviewed, interfaces for module development have been defined, and common input/output standard formats have been agreed. Consistency between application concepts (e.g., data types) has been verified, and a plan for API evolution has been defined that will pave the way for the subsequent D3.2 and D3.3 deliverables.
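
The idea of choosing a simulation length automatically to reach a target precision can be illustrated with a minimal sketch. This is not the GROMACS implementation: the function `run_until_precise`, the `sample_block` callback and the stopping rule (standard error of the block estimates below a target) are assumptions made only for illustration.

```python
import math
import random
import statistics


def run_until_precise(sample_block, target_sem, min_blocks=5, max_blocks=1000):
    """Extend a (hypothetical) simulation block by block until the standard
    error of the mean of the block estimates drops to target_sem or below."""
    estimates = []
    while len(estimates) < max_blocks:
        estimates.append(sample_block())
        if len(estimates) >= min_blocks:
            sem = statistics.stdev(estimates) / math.sqrt(len(estimates))
            if sem <= target_sem:
                break
    # Report the final estimate, its standard error and the blocks used.
    sem = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return statistics.fmean(estimates), sem, len(estimates)


# Stand-in for one simulation block: a noisy estimate around an arbitrary value.
rng = random.Random(0)
mean, sem, n_blocks = run_until_precise(lambda: rng.gauss(-7.5, 1.0), target_sem=0.1)
```

In this toy setting the loop stops as soon as the estimated standard error reaches the target, so noisier quantities automatically receive more blocks; a real workflow would also need to account for correlation between blocks.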

D3.2 Data translators and code

D4.1 - Initial analysis on the machine learning module - M18

A strategy for binding affinity prediction and pose selection was shared, and the requirements for successful integration with the other virtual screening modules were collected. A new data-generation phase for the pose selection task has just finished and has produced significantly more poses to train the model. The model has been retrained with promising results. For the pose selector, different strategies are being explored, e.g. 3D Convolutional Neural Networks, Graph Neural Networks and Mixture Density Networks. Three open-source software packages with strong results on the CASF datasets are being tested and adapted to the use case at hand. Complex alignment is guaranteed within the LiGen workflow, i.e. there is no need to rely on specific methods (such as Frobenius norm evaluation) to ensure rototranslation and permutation invariance with respect to changes in the system of reference.
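
Why the reference frame matters for pose data can be seen in a small sketch. It is purely illustrative and not part of LiGen or the ML module: the coordinates are made up, all function names are hypothetical, and only rototranslation (not permutation) is covered. A Frobenius norm taken directly on raw coordinates changes when the same pose is rotated and translated, whereas internal distances do not.

```python
import math


def rotate_z(points, angle):
    """Rotate 3D points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def translate(points, shift):
    """Shift every point by the same 3D vector."""
    return [tuple(p + d for p, d in zip(point, shift)) for point in points]


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two N x 3 coordinate sets."""
    return math.sqrt(sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v)))


def pairwise_distances(points):
    """Interatomic distances, which do not depend on the reference frame."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Toy three-atom "pose" (made-up coordinates) and the same pose in another frame.
pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
moved = translate(rotate_z(pose, math.pi / 3), (2.0, -1.0, 0.5))

# Non-zero: the raw coordinates differ although the geometry is the same.
raw_gap = frobenius_distance(pose, moved)

# Zero up to rounding: the internal geometry is identical in both frames.
internal_gap = max(
    abs(d1 - d2)
    for d1, d2 in zip(pairwise_distances(pose), pairwise_distances(moved))
)
```

When the workflow already delivers complexes in a common frame, as described above, a model does not have to compensate for this effect itself.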

D4.3 Intermediate solution for data management - M18

This deliverable reports the architecture of the first version of the I/O storage platform optimized for LIGATE use cases. CINECA is testing ETL and a basic ML pipeline (PCA) with RAPIDS and NVTabular on three trajectories (.xtc binaries) of 1, 5 and 10 GB. Currently, the framework does not seem stable; debugging is ongoing both at CINECA and with NVIDIA developers.
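
For readers unfamiliar with PCA, the step can be sketched in plain Python. This does not use RAPIDS or NVTabular and is not the CINECA pipeline: the data are made up, the function name is hypothetical, and power iteration on the covariance matrix is swapped in here only to keep the example dependency-free.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the leading principal component,
    found by power iteration on the sample covariance matrix."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    vec = [1.0] * dim
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalise.
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    variance = sum(
        vec[i] * cov[i][j] * vec[j] for i in range(dim) for j in range(dim)
    )
    return vec, variance


# Made-up "frames": points spread mostly along the direction (1, 1, 0).
frames = [
    (t + 0.1 * (-1) ** k, t - 0.1 * (-1) ** k, 0.0)
    for k, t in enumerate(range(-5, 6))
]
direction, variance = first_principal_component(frames)
```

On trajectory data each row would instead hold the flattened coordinates of one frame, which is where GPU-accelerated libraries become relevant at the file sizes mentioned above.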

D4.5 Workflow benchmark specification and initial results - M18

The HyperQueue framework for job management was presented. Efficient multi-GPU and multi-node execution of AI applications and frameworks on the GPU nodes of the Karolina supercomputer was demonstrated. An experimental API for the GROMACS pipeline is under development to make it more robust, maintainable and scalable.

Milestones

MS1 - A SYCL version of LiGen running on GPUs is a critical achievement for MS1 and is necessary to compare the software's performance with that of its CUDA version. Since this result had not yet been attained, the partners agreed to postpone the MS1 release; MS1 is an internal milestone, so its delayed release has no impact on the project's evaluation by the EC. Two SYCL versions of LiGen modules (i.e. LiGen Dock and LiGen Score) that run smoothly on NVIDIA GPUs for a single molecule have been produced. Since neither POLIMI nor UniSA has AMD GPUs available, no tests were carried out on that hardware. Nonetheless, this second test is not a blocking issue and MS1 has been released.

ADDRESS

Ligate
c/o Dompé Farmaceutici
Via Pietro Castellino, 111
80131 Napoli, Italy

This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 956137. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Italy, Sweden, Austria, Czech Republic, Switzerland.