AlphaFold

AlphaFold 3 on Cluster_Sugon

GitHub: https://github.com/google-deepmind/alphafold3

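Program path: /public/apps/alphafold3/alphafold3 (updated to version 3.0.1)
Database path: /public/shared/alphafold3 (on shared storage, visible to all nodes; used as the fallback)
Model path: /public/shared/alphafold3/models
Local database path on compute nodes: /ssd/alphafold3 (only on nodes in the AMD/A40/RTX3090 partitions)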
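Both database copies are passed to AlphaFold 3 as two `--db_dir` flags in the commands in this section. As I understand `run_alphafold.py`, each database file is looked up independently: the `--db_dir` directories are tried in order and the first one that contains the file is used. The sketch below mimics that lookup so you can check, on a given node, which copy would be picked. The function name `af3_resolve_db` and the example filename `uniref90_2022_05.fa` are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper mirroring the assumed per-file --db_dir lookup:
# print the first <dir>/<db_name> that exists, trying directories in order.
# Usage: af3_resolve_db <db_name> <dir1> [dir2 ...]
af3_resolve_db() {
    local name="$1"
    shift
    local dir
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    # Not present in any candidate directory
    echo "af3_resolve_db: $name not found in: $*" >&2
    return 1
}

# Example on a compute node: prefer the local SSD, fall back to shared storage.
# af3_resolve_db uniref90_2022_05.fa /ssd/alphafold3 /public/shared/alphafold3
```

On nodes without `/ssd/alphafold3`, every lookup falls through to the shared copy, which is why the fallback directory is always passed second.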

Run with the Apptainer (Singularity) container:

cd /public/apps/alphafold3/alphafold3
apptainer exec \
    --nv \
    --bind <YOUR_INPUT_PATH>:/root/af_input \
    --bind <YOUR_OUTPUT_PATH>:/root/af_output \
    --bind <MODEL_PATH>:/root/models \
    --bind <SSD_DATABASE_PATH>:/root/public_databases \
    --bind <FALLBACK_DATABASE_PATH>:/root/public_databases_fallback \
    alphafold3.0.1.sif \
    python run_alphafold.py \
    --json_path=/root/af_input/<YOUR_INPUT_JSON_FILE> \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --db_dir=/root/public_databases_fallback \
    --output_dir=/root/af_output
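Long backslash-continued commands break easily (a space after a `\`, or a stray `\` on the last line). One alternative is to collect the arguments in a bash array. The wrapper below is a sketch, not part of AlphaFold 3: `af3_run`, `USE_GPU` and `DRY_RUN` are hypothetical names, and it assumes you run it from `/public/apps/alphafold3/alphafold3` so that `alphafold3.0.1.sif` is found.

```shell
#!/bin/bash
# Hypothetical wrapper around the apptainer command above.
# Arguments: <input_dir> <output_dir> <json_file> [extra run_alphafold.py flags...]
# USE_GPU=0 drops --nv (e.g. for the CPU-only MSA step); DRY_RUN=1 prints instead of running.
af3_run() {
    local input_dir="$1" output_dir="$2" json_file="$3"
    shift 3
    local -a nv=()
    if [ "${USE_GPU:-1}" = 1 ]; then
        nv=(--nv)
    fi
    local -a cmd=(
        apptainer exec "${nv[@]}"
        --bind "$input_dir:/root/af_input"
        --bind "$output_dir:/root/af_output"
        --bind /public/shared/alphafold3/models:/root/models
        --bind /ssd/alphafold3:/root/public_databases
        --bind /public/shared/alphafold3:/root/public_databases_fallback
        alphafold3.0.1.sif
        python run_alphafold.py
        --json_path="/root/af_input/$json_file"
        --model_dir=/root/models
        --db_dir=/root/public_databases
        --db_dir=/root/public_databases_fallback
        --output_dir=/root/af_output
        "$@"
    )
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print a shell-quoted version of the command for inspection
        printf '%q ' "${cmd[@]}"
        echo
    else
        "${cmd[@]}"
    fi
}

# Usage (search step only, no GPU):
# USE_GPU=0 af3_run "$HOME/alphafold3/test" "$HOME/alphafold3/test" input.json --run_inference=False
```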

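Because the MSA search does not use the GPU, it is recommended to run the two stages as separate jobs: first the search with --run_inference=False (data pipeline only, e.g. on the AMD partition), then inference with --run_data_pipeline=False (e.g. on the a40 partition). The inference run takes as input the <job_name>_data.json written by the search run, which already contains the MSAs and templates.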
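With the two stages in separate scripts, Slurm can hold the GPU job until the search job has finished successfully. `sbatch --parsable` and `--dependency=afterok:<jobid>` are standard Slurm options; the function name `submit_af3_pipeline`, the `SBATCH_CMD` override and the script names `af3_msa.sh` / `af3_infer.sh` are assumptions for this sketch.

```shell
#!/bin/bash
# Submit the CPU (search) job, then a GPU (inference) job that waits for it.
# SBATCH_CMD can be overridden (e.g. with a stub for a dry run); defaults to sbatch.
submit_af3_pipeline() {
    local msa_script="$1" infer_script="$2"
    local sbatch_cmd="${SBATCH_CMD:-sbatch}"
    local msa_id
    # --parsable prints only "jobid" (or "jobid;cluster" on multi-cluster setups)
    msa_id=$("$sbatch_cmd" --parsable "$msa_script") || return 1
    msa_id="${msa_id%%;*}"
    # afterok: start inference only if the search job exits with code 0
    "$sbatch_cmd" --parsable --dependency="afterok:$msa_id" "$infer_script"
}

# Usage:
# submit_af3_pipeline af3_msa.sh af3_infer.sh
```

If the search job fails, the inference job stays pending with an unsatisfiable dependency and should be cancelled with `scancel`.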

Example search (MSA) script:

#!/bin/bash
#SBATCH --job-name=af3msa
#SBATCH --output=af3_out_%j.txt
#SBATCH --error=af3_err_%j.txt
#SBATCH --nodes=1
#SBATCH --partition=AMD
#SBATCH --cpus-per-task=8

export projectDir=$HOME/alphafold3/test
export af3Bin=/public/apps/alphafold3/alphafold3
export af3Database=/public/shared/alphafold3
export af3Ssd=/ssd/alphafold3

cd $af3Bin
apptainer exec \
    --bind $projectDir:/root/af_input \
    --bind $projectDir:/root/af_output \
    --bind $af3Database/models:/root/models \
    --bind $af3Ssd:/root/public_databases \
    --bind $af3Database:/root/public_databases_fallback \
    alphafold3.0.1.sif \
    python run_alphafold.py \
    --json_path=/root/af_input/input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --db_dir=/root/public_databases_fallback \
    --output_dir=/root/af_output \
    --run_inference=False
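The inference step needs the JSON written by this search step rather than the original input. To my understanding, AlphaFold 3 writes it to `<output_dir>/<name>/<name>_data.json`, where `<name>` is the `name` field of the input JSON lower-cased, with spaces turned into underscores and every character other than `a-z`, `0-9`, `_`, `-`, `.` dropped. The sketch below approximates that rule in bash; the function names are hypothetical, and you should still check the actual output directory, since AlphaFold 3 may add a timestamp suffix when the directory already exists.

```shell
#!/bin/bash
# Approximation (assumption) of AlphaFold 3's job-name sanitisation:
# lower-case, spaces -> underscores, keep only [a-z0-9_.-].
af3_sanitise_name() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_' | tr -cd 'a-z0-9_.-'
}

# Path of the search-step output for a job name, relative to the output directory.
af3_data_json() {
    local name
    name=$(af3_sanitise_name "$1")
    printf '%s/%s_data.json\n' "$name" "$name"
}

# Example: a job named "My Protein 1"
# af3_data_json "My Protein 1"   # -> my_protein_1/my_protein_1_data.json
```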

Example inference script:

#!/bin/bash
#SBATCH --job-name=af3infer
#SBATCH --output=af3_out_%j.txt
#SBATCH --error=af3_err_%j.txt
#SBATCH --nodes=1
#SBATCH --partition=a40
#SBATCH --cpus-per-task=8

export projectDir=$HOME/alphafold3/test
export af3Bin=/public/apps/alphafold3/alphafold3
export af3Database=/public/shared/alphafold3
export af3Ssd=/ssd/alphafold3

# Inference reads the search step's output, <job_name>_data.json, which run_alphafold.py
# writes to $projectDir/<job_name>/ (<job_name> is derived from the "name" field of input.json)
cd $af3Bin
apptainer exec \
    --nv \
    --bind $projectDir:/root/af_input \
    --bind $projectDir:/root/af_output \
    --bind $af3Database/models:/root/models \
    --bind $af3Ssd:/root/public_databases \
    --bind $af3Database:/root/public_databases_fallback \
    alphafold3.0.1.sif \
    python run_alphafold.py \
    --json_path=/root/af_input/<job_name>/<job_name>_data.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --db_dir=/root/public_databases_fallback \
    --output_dir=/root/af_output \
    --run_data_pipeline=False
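Because `--run_data_pipeline=False` skips the search entirely, it can help to fail fast when the search output is missing instead of discovering it after the GPU job starts. The check below is a hypothetical addition to the inference script (the function name and the exact test are assumptions): it only verifies that the file exists and mentions the `unpairedMsa` key that AlphaFold 3 data JSONs use for protein chains.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the inference job.
# Returns non-zero if the search-step JSON is missing, empty, or has no MSA field.
af3_check_data_json() {
    local json="$1"
    if [ ! -s "$json" ]; then
        echo "missing or empty: $json (did the search job finish?)" >&2
        return 1
    fi
    if ! grep -q '"unpairedMsa"' "$json"; then
        echo "no \"unpairedMsa\" field in $json; was it produced by the search step?" >&2
        return 1
    fi
    return 0
}

# In the inference script, before `apptainer exec`:
# af3_check_data_json "$projectDir/<job_name>/<job_name>_data.json" || exit 1
```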

Notes

Benchmark
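A full-database MSA search with 8 cores takes about 1000 s; adding more cores does not noticeably speed up the search.
Inference on a single A40 GPU takes about 120 s.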
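These timings are what make splitting the stages worthwhile. Using the figures above (about 1000 s of CPU-only search and about 120 s of inference per input), a combined job would hold a GPU for roughly 1120 s, most of it idle. A small shell arithmetic check; the inputs are the rough benchmark values above, not guarantees, and `idle_gpu_pct` is a hypothetical name:

```shell
#!/bin/bash
# Percentage of a combined (search + inference) GPU job during which the GPU sits idle.
# Integer arithmetic, so the result is rounded down.
idle_gpu_pct() {
    local msa_s="$1" infer_s="$2"
    echo $(( msa_s * 100 / (msa_s + infer_s) ))
}

idle_gpu_pct 1000 120   # prints 89
```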

AlphaFold 2 on Cluster_IBM

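The complete runtime environment is deployed on the baode03 and ml01 nodes. The current version is AlphaFold-multimer (v2.3.2), and the PDB database was last updated on 2023-07-06. For the non-Docker source code, see: https://github.com/kalininalab/alphafold_non_docker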
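The `-t` value used in the examples below matches the date of the local PDB snapshot (2023-07-06); a later `max_template_date` cannot find templates the local database does not contain. The helper below is a hypothetical convenience, not part of alphafold_non_docker: it checks the ISO-8601 format required by `-t` and warns when the date is after the snapshot.

```shell
#!/bin/bash
# Snapshot date of the local PDB database (from the text above).
AF2_PDB_DATE=2023-07-06

# Hypothetical check for the -t <max_template_date> argument.
# YYYY-MM-DD dates compare correctly as plain strings.
check_template_date() {
    local d="$1"
    if [[ ! "$d" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        echo "invalid date '$d', expected YYYY-MM-DD" >&2
        return 1
    fi
    if [[ "$d" > "$AF2_PDB_DATE" ]]; then
        echo "note: $d is after the local PDB snapshot ($AF2_PDB_DATE); newer templates are not available" >&2
    fi
    return 0
}
```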

The baode03 node runs shared jobs submitted through the queue. Program and database paths:

# Program path
/data02/apps/alphafold-2.3.2

# Database path
/ssd/database/alphafold

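Submit jobs through the job scheduler. A reference script follows; comment out whichever run_alphafold.sh line does not apply (monomer or multimer):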

#!/bin/bash
#
#PBS -l nodes=baode03:ppn=20
#PBS -l walltime=24:00:00
#PBS -N alphafold2
#PBS -q gpuw
#PBS -k oe
#PBS -V

param_database=/ssd/database/alphafold

# Load the AlphaFold environment
cd /data02/apps/alphafold-2.3.2
export PATH=/data02/apps/miniconda3/envs/alphafold-2.3.2/bin:$PATH

# Predict a monomer protein structure
bash run_alphafold.sh -d $param_database -o /path/to/your/output -f /path/to/your/sequence.fasta -t 2023-07-06

# Predict a protein complex structure; sequences.fasta contains multiple chains
bash run_alphafold.sh -d $param_database -o /path/to/your/output -f /path/to/your/sequences.fasta -t 2023-07-06 -m multimer
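Which of the two lines to keep depends on how many sequences the FASTA file holds; per the usage notes, a file with several sequences is folded as a multimer. The helper below is a hypothetical addition to the PBS script (the name `af2_model_preset` is an assumption): it counts FASTA header lines and prints the matching `-m` value.

```shell
#!/bin/bash
# Hypothetical helper: print the model preset for run_alphafold.sh -m,
# based on the number of '>' header lines in a FASTA file.
af2_model_preset() {
    local n
    n=$(grep -c '^>' "$1") || true   # grep -c exits 1 when there are no matches
    if [ "${n:-0}" -gt 1 ]; then
        echo multimer
    else
        echo monomer
    fi
}

# Usage in the PBS script (placeholder paths, as above):
# fasta=/path/to/your/sequences.fasta
# bash run_alphafold.sh -d $param_database -o /path/to/your/output -f "$fasta" \
#     -t 2023-07-06 -m "$(af2_model_preset "$fasta")"
```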

run_alphafold.sh usage reference:

Usage: run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir>         Path to directory of supporting data
-o <output_dir>       Path to a directory that will store the results.
-f <fasta_paths>      Path to FASTA files containing sequences. If a FASTA file contains multiple sequences, then it will be folded as a multimer. To fold more sequences one after another, write the files separated by a comma
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu>          Enable NVIDIA runtime to run with GPUs (default: true)
-r <run_relax>        Whether to run the final relaxation step on the predicted models. Turning relax off might result in predictions with distracting stereochemical violations but might help in case you are having issues with the relaxation stage (default: true)
-e <enable_gpu_relax> Run relax on GPU if GPU is enabled (default: true)
-n <openmm_threads>   OpenMM threads (default: all available cores)
-a <gpu_devices>      Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset>     Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model (default: 'monomer')
-c <db_preset>        Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed (default: 'false')
-l <num_multimer_predictions_per_model> How many predictions (each with a different random seed) will be generated per model. E.g. if this is 2 and there are 5 models then there will be 10 predictions per input. Note: this FLAG only applies if model_preset=multimer (default: 5)
-b <benchmark>        Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')
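To fold several FASTA files one after another in one run, `-f` takes a comma-separated list. A small sketch for building that list from a bash array; `join_by_comma` is a hypothetical name and `/path/to/your/fastas` is a placeholder. File names that themselves contain commas would break this list format.

```shell
#!/bin/bash
# Join all arguments with commas, e.g. for run_alphafold.sh -f.
join_by_comma() {
    local IFS=,
    echo "$*"
}

# Usage (placeholder paths):
# fastas=(/path/to/your/fastas/*.fasta)
# bash run_alphafold.sh -d $param_database -o /path/to/your/output \
#     -f "$(join_by_comma "${fastas[@]}")" -t 2023-07-06
```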

The ml01 node is for running the script directly (not through the job scheduler). Environment, program, and database paths:

# Program path
/local-data/apps/alphafold-2.3.2

# Database path (SSD RAID 0 cache)
/public/database/alphafold-2.3.2

# Example run
ssh ml01
export PATH=/local-data/apps/miniconda3/envs/alphafold-2.3.2/bin:$PATH
cd /local-data/apps/alphafold-2.3.2
bash run_alphafold.sh -d alphafold_data/ -o dummy_test/ -f multimer_query.fasta -t 2023-07-06 -m multimer
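Because ml01 is used interactively, another user may already be on a GPU. Per the usage text, `-a` is passed to `CUDA_VISIBLE_DEVICES`, so one option is to pick the GPU with the least memory in use. The sketch parses the output of `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`; whether ml01 has more than one GPU is not stated here, so treat this as an assumption, and `least_used_gpu` is a hypothetical name.

```shell
#!/bin/bash
# Read "index, memory.used" CSV lines on stdin; print the index with the lowest memory use.
# Ties go to the first GPU listed; prints nothing on empty input.
least_used_gpu() {
    awk -F', *' 'NF >= 2 && (best == "" || $2 + 0 < min) { min = $2 + 0; best = $1 }
                 END { if (best != "") print best }'
}

# Usage on ml01:
# gpu=$(nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | least_used_gpu)
# bash run_alphafold.sh -d alphafold_data/ -o dummy_test/ -f multimer_query.fasta -t 2023-07-06 -m multimer -a "$gpu"
```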