Verifying MIG and the UGE Job Scheduler on the DGX A100

2020.10.07 Report

DGX A100 in UGE with MIG

At GDEP Advance, we used our in-house DGX A100 test-drive machine to verify the operation of MIG (Multi-Instance GPU) together with UGE (Univa Grid Engine), a major job scheduler for Docker-based GPU clusters, with the cooperation of ULGS Co., Ltd., which has an extensive track record of UGE deployments in Japan.


Let's get straight into verifying whether MIG-aware Docker jobs can be dispatched from UGE on an NVIDIA DGX A100. First, broadly speaking there are two ways to partition a GPU into MIG instances: splitting it into multiple GPU instances, or creating a single GPU instance and dividing it into multiple compute instances (see the MIG user guide):
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html

In this verification, we carve each GPU into the largest GPU instance, MIG 7g.40gb, and inside it create seven 1-slice (1c.7g.40gb) compute instances so that the SMs are separated into groups of 14 units each; applying this to all 8 GPUs yields 56 instances (a sketch of the commands involved follows below). Running nvidia-smi -L lists each MIG device UUID, so we register that UUID information with UGE. Specifically, we define an RSMAP complex named mig and register all 56 instances on the exec host.
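
For reference, this partitioning can be set up with the nvidia-smi mig subcommands along the following lines. This is a minimal sketch based on the MIG user guide linked above rather than a capture from the test itself, so exact option spellings may differ by driver version; the commands are shown for GPU 0 and repeated for GPUs 1-7.

# Enable MIG mode on GPU 0, then carve it up
sudo nvidia-smi -i 0 -mig 1
# One 7g.40gb GPU instance
sudo nvidia-smi mig -i 0 -cgi 7g.40gb
# Seven 1c.7g.40gb compute instances (14 SMs each) inside that GPU instance
sudo nvidia-smi mig -i 0 -gi 0 -cci 1c.7g.40gb,1c.7g.40gb,1c.7g.40gb,1c.7g.40gb,1c.7g.40gb,1c.7g.40gb,1c.7g.40gb
# List the resulting 56 MIG device UUIDs
nvidia-smi -L

Registering those UUIDs with UGE then amounts to roughly the following (again a sketch: the RSMAP complex attributes shown are assumptions and may need adjusting for your UGE version).

# Define a requestable, consumable RSMAP complex named "mig" via qconf -mc,
# e.g. a line of the form:   mig   mig   RSMAP   <=   YES   YES   0   0
qconf -mc
# Register the 56 UUIDs reported by nvidia-smi -L on the exec host
qconf -mattr exechost complex_values \
    'mig=56(MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/0 ... MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/6)' dgxa100-01

The resulting exec host configuration, as shown by qconf -se, looks like this: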

clouduser@dgxa100-01:~$ qconf -se dgxa100-01

hostname dgxa100-01

load_scaling NONE

complex_values gpu=8(0-7),m_mem_free=1031883.000000M, \

m_mem_free_n0=128810.625000M, \

m_mem_free_n1=129016.019531M, \

m_mem_free_n2=129016.019531M, \

m_mem_free_n3=128979.789062M, \

m_mem_free_n4=129016.019531M, \

m_mem_free_n5=129016.019531M, \

m_mem_free_n6=129016.019531M, \

m_mem_free_n7=129013.433594M, \

mig=56(MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/0 \

MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/1 \

MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/2 \

MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/3 \

MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/4 \

MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/5 \

MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/6 \

MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/0 \

MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/1 \

MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/2 \

MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/3 \

MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/4 \

MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/5 \

MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/6 \

MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/0 \

MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/1 \

MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/2 \

MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/3 \

MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/4 \

MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/5 \

MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/6 \

MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/0 \

MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/1 \

MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/2 \

MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/3 \

MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/4 \

MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/5 \

MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/6 \

MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/0 \

MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/1 \

MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/2 \

MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/3 \

MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/4 \

MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/5 \

MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/6 \

MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/0 \

MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/1 \

MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/2 \

MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/3 \

MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/4 \

MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/5 \

MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/6 \

MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/0 \

MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/1 \

MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/2 \

MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/3 \

MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/4 \

MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/5 \

MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/6 \

MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/0 \

MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/1 \

MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/2 \

MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/3 \

MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/4 \

MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/5 \

MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/6)

load_values arch=lx-amd64,cpu=0.000000,docker=1, \

docker_images=nvcr.io/nvidia/cuda:11.0-sample, \

tensorflow/tensorflow:latest-devel-gpu,gnmt_tf:latest, \

:,bert:latest,nvidia_ncf:latest, \

:,mlperf-inference-bert:latest, \

nvidia/cuda:11.0-base,nvcr.io/nvidia/cuda:10.2-devel, \

nvidia/cuda:11.0-devel,:, \

gdep-adv/test:11.0-1,nvcr.io/nvidia/cuda:11.0-devel, \

nvfw-dgxa100:20.05.12.3, \

nvcr.io/nvidia/tensorflow:20.07-tf1-py3, \

nvcr.io/nvidia/tritonserver:20.06-v1-py3, \

nvcr.io/nvidia/tensorflow:20.06-tf1-py3, \

nvcr.io/nvidia/tensorflow:19.10-py3, \

nvcr.io/nvidia/tensorrtserver:19.08-py3, \

nvcr.io/nvidia/k8s/cuda-sample:nbody,load_avg=4.940000, \

load_long=3.210000,load_medium=4.940000, \

load_short=1.150000,m_cache_l1=32.000000K, \

m_cache_l2=512.000000K,m_cache_l3=16384.000000K, \

m_core=128,m_gpu=1,m_mem_free=1020310.000000M, \

m_mem_total=1031883.000000M, \

m_mem_total_n0=128810.625000M, \

m_mem_total_n1=129016.019531M, \

m_mem_total_n2=129016.019531M, \

m_mem_total_n3=128979.789062M, \

m_mem_total_n4=129016.019531M, \

m_mem_total_n5=129016.019531M, \

m_mem_total_n6=129016.019531M, \

m_mem_total_n7=129013.433594M,m_mem_used=11573.000000M, \

m_numa_nodes=8,m_socket=2,m_thread=256, \

m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \

m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \

m_topology_numa=[SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT], \

mem_free=1023223.988281M,mem_total=1031883.945312M, \

mem_used=8659.957031M,np_load_avg=0.019297, \

np_load_long=0.012539,np_load_medium=0.019297, \

np_load_short=0.004492,num_proc=256,swap_free=0.000000M, \

swap_total=0.000000M,swap_used=0.000000M, \

virtual_free=1023223.988281M, \

virtual_total=1031883.945312M,virtual_used=8659.957031M

processors 256

user_lists NONE

xuser_lists NONE

projects NONE

xprojects NONE

usage_scaling NONE

report_variables NONE

license_constraints NONE

license_oversubscription NONE


Now, let's submit some jobs. As described in the documents below,
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html

when handling a MIG instance from a Docker container, you simply specify its UUID in NVIDIA_VISIBLE_DEVICES. Since the UUIDs are registered in the UGE complex, we let UGE pull that information out for each job so that no two jobs land on the same instance. Let's submit 56 jobs as follows.

for i in `seq 0 55`
do
    qsub -l mig=1,docker,docker_images="*nvcr.io/nvidia/cuda:11.0-sample*" \
         -xd '--runtime=nvidia,-e NVIDIA_VISIBLE_DEVICES=${mig(0)}' ./test-bench.sh
done
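
For reference, what each of these jobs effectively does on the Docker side is pin one container to one MIG compute instance via its UUID, along the lines of the command below; ${mig(0)} in the -xd option above is replaced by UGE with the UUID it granted to that job from the mig complex, and the UUID shown here is simply the first of the 56 registered earlier.

docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/0 \
    nvcr.io/nvidia/cuda:11.0-sample \
    /usr/local/cuda/samples/bin/x86_64/linux/release/nbody --benchmark -numbodies=409600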


The script itself looks like this; it simply runs the nbody benchmark.

# cat test-bench.sh
#!/bin/bash
#$ -S /bin/bash
/usr/local/cuda/samples/bin/x86_64/linux/release/nbody --benchmark -numbodies=409600


Note that the image nvcr.io/nvidia/cuda:11.0-sample is not something that exists publicly: it is an image we built from the devel image by installing and building the CUDA samples so that nbody can be run. Running nvidia-smi while these jobs are executing shows the following:

clouduser@dgxa100-01:~$ nvidia-smi

Fri Sep 4 02:11:22 2020

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |

|-------------------------------+----------------------+----------------------+

| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |

| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |

| | | MIG M. |

|===============================+======================+======================|

| 0 A100-SXM4-40GB On | 00000000:07:00.0 Off | On |

| N/A 47C P0 243W / 400W | 990MiB / 40537MiB | N/A Default |

| | | Enabled |

+-------------------------------+----------------------+----------------------+

| 1 A100-SXM4-40GB On | 00000000:0F:00.0 Off | On |

| N/A 45C P0 238W / 400W | 990MiB / 40537MiB | N/A Default |

| | | Enabled |

+-------------------------------+----------------------+----------------------+

| 2 A100-SXM4-40GB On | 00000000:47:00.0 Off | On |

| N/A 38C P0 191W / 400W | 732MiB / 40537MiB | N/A Default |

| | | Enabled |

+-------------------------------+----------------------+----------------------+

| 3 A100-SXM4-40GB On | 00000000:4E:00.0 Off | On |

| N/A 28C P0 43W / 400W | 87MiB / 40537MiB | N/A Default |

| | | Enabled |

+-------------------------------+----------------------+----------------------+

| 4 A100-SXM4-40GB On | 00000000:87:00.0 Off | On |

| N/A 32C P0 43W / 400W | 11MiB / 40537MiB | N/A Default |

| | | Enabled |

+-------------------------------+----------------------+----------------------+

| 5 A100-SXM4-40GB On | 00000000:90:00.0 Off | On |

| N/A 31C P0 45W / 400W | 0MiB / 40537MiB | N/A Default |

| | | Enabled |

+-------------------------------+----------------------+----------------------+

| 6 A100-SXM4-40GB On | 00000000:B7:00.0 Off | On |

| N/A 31C P0 42W / 400W | 0MiB / 40537MiB | N/A Default |

| | | Enabled |

+-------------------------------+----------------------+----------------------+

| 7 A100-SXM4-40GB On | 00000000:BD:00.0 Off | On |

| N/A 31C P0 45W / 400W | 0MiB / 40537MiB | N/A Default |

| | | Enabled |

+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

| MIG devices: |

+------------------+----------------------+-----------+-----------------------+

| GPU GI CI MIG | Memory-Usage | Vol| Shared |

| ID ID Dev | | SM Unc| CE ENC DEC OFA JPG|

| | | ECC| |

|==================+======================+===========+=======================|

| 0 0 0 0 | 990MiB / 40537MiB | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 0 0 1 1 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 0 0 2 2 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 0 0 3 3 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 0 0 4 4 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 0 0 5 5 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 0 0 6 6 | | 14 0 | 7 0 5 1 1 |

+------------------+----------------------+-----------+-----------------------+

| 1 0 0 0 | 990MiB / 40537MiB | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 1 0 1 1 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 1 0 2 2 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 1 0 3 3 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 1 0 4 4 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 1 0 5 5 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 1 0 6 6 | | 14 0 | 7 0 5 1 1 |

+------------------+----------------------+-----------+-----------------------+

| 2 0 0 0 | 990MiB / 40537MiB | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 2 0 1 1 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 2 0 2 2 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 2 0 3 3 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 2 0 4 4 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 2 0 5 5 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 2 0 6 6 | | 14 0 | 7 0 5 1 1 |

+------------------+----------------------+-----------+-----------------------+

| 3 0 0 0 | 990MiB / 40537MiB | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 3 0 1 1 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 3 0 2 2 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 3 0 3 3 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 3 0 4 4 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 3 0 5 5 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 3 0 6 6 | | 14 0 | 7 0 5 1 1 |

+------------------+----------------------+-----------+-----------------------+

| 4 0 0 0 | 522MiB / 40537MiB | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 4 0 1 1 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 4 0 2 2 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 4 0 3 3 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 4 0 4 4 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 4 0 5 5 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 4 0 6 6 | | 14 0 | 7 0 5 1 1 |

+------------------+----------------------+-----------+-----------------------+

| 5 0 0 0 | 399MiB / 40537MiB | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 5 0 1 1 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 5 0 2 2 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 5 0 3 3 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 5 0 4 4 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 5 0 5 5 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 5 0 6 6 | | 14 0 | 7 0 5 1 1 |

+------------------+----------------------+-----------+-----------------------+

| 6 0 0 0 | 39MiB / 40537MiB | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 6 0 1 1 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 6 0 2 2 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 6 0 3 3 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 6 0 4 4 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 6 0 5 5 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 6 0 6 6 | | 14 0 | 7 0 5 1 1 |

+------------------+----------------------+-----------+-----------------------+

| 7 0 0 0 | 73MiB / 40537MiB | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 7 0 1 1 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 7 0 2 2 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 7 0 3 3 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 7 0 4 4 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 7 0 5 5 | | 14 0 | 7 0 5 1 1 |

+------------------+ +-----------+-----------------------+

| 7 0 6 6 | | 14 0 | 7 0 5 1 1 |

+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+

| Processes: |

| GPU GI CI PID Type Process name GPU Memory |

| ID ID Usage |

|=============================================================================|

| 0 0 0 51971 C ...86_64/linux/release/nbody 141MiB |

| 0 0 1 52053 C ...86_64/linux/release/nbody 141MiB |

| 0 0 2 52050 C ...86_64/linux/release/nbody 141MiB |

| 0 0 3 52127 C ...86_64/linux/release/nbody 141MiB |

| 0 0 4 52742 C ...86_64/linux/release/nbody 141MiB |

| 0 0 5 52749 C ...86_64/linux/release/nbody 141MiB |

| 0 0 6 53150 C ...86_64/linux/release/nbody 141MiB |

| 1 0 0 52745 C ...86_64/linux/release/nbody 141MiB |

| 1 0 1 53147 C ...86_64/linux/release/nbody 141MiB |

| 1 0 2 54211 C ...86_64/linux/release/nbody 141MiB |

| 1 0 3 54214 C ...86_64/linux/release/nbody 141MiB |

| 1 0 4 55582 C ...86_64/linux/release/nbody 141MiB |

| 1 0 5 55575 C ...86_64/linux/release/nbody 141MiB |

| 1 0 6 55654 C ...86_64/linux/release/nbody 141MiB |

| 2 0 0 55661 C ...86_64/linux/release/nbody 141MiB |

| 2 0 1 55663 C ...86_64/linux/release/nbody 141MiB |

| 2 0 2 55813 C ...86_64/linux/release/nbody 141MiB |

| 2 0 3 55927 C ...86_64/linux/release/nbody 141MiB |

| 2 0 4 55894 C ...86_64/linux/release/nbody 141MiB |

| 2 0 5 55969 C ...86_64/linux/release/nbody 141MiB |

| 2 0 6 55972 C ...86_64/linux/release/nbody 141MiB |

| 3 0 0 55975 C ...86_64/linux/release/nbody 141MiB |

| 3 0 1 56048 C ...86_64/linux/release/nbody 141MiB |

| 3 0 2 56055 C ...86_64/linux/release/nbody 141MiB |

| 3 0 3 56061 C ...86_64/linux/release/nbody 141MiB |

| 3 0 4 56058 C ...86_64/linux/release/nbody 141MiB |

| 3 0 5 56064 C ...86_64/linux/release/nbody 141MiB |

| 3 0 6 56145 C ...86_64/linux/release/nbody 141MiB |

| 4 0 0 56172 C ...86_64/linux/release/nbody 141MiB |

| 4 0 1 56168 C ...86_64/linux/release/nbody 141MiB |

| 4 0 2 56197 C ...86_64/linux/release/nbody 141MiB |

| 4 0 3 56209 C ...86_64/linux/release/nbody 141MiB |

| 4 0 4 56352 C ...86_64/linux/release/nbody 141MiB |

| 4 0 5 56293 C ...86_64/linux/release/nbody 141MiB |

| 4 0 6 56359 C ...86_64/linux/release/nbody 141MiB |

| 5 0 0 56364 C ...86_64/linux/release/nbody 141MiB |

| 5 0 1 56355 C ...86_64/linux/release/nbody 141MiB |

| 5 0 2 56398 C ...86_64/linux/release/nbody 141MiB |

| 5 0 3 56373 C ...86_64/linux/release/nbody 141MiB |

| 5 0 4 56404 C ...86_64/linux/release/nbody 141MiB |

| 5 0 5 56401 C ...86_64/linux/release/nbody 141MiB |

| 5 0 6 56455 C ...86_64/linux/release/nbody 141MiB |

| 6 0 0 56661 C ...86_64/linux/release/nbody 141MiB |

| 6 0 1 56685 C ...86_64/linux/release/nbody 141MiB |

| 6 0 2 56605 C ...86_64/linux/release/nbody 141MiB |

| 6 0 3 56690 C ...86_64/linux/release/nbody 141MiB |

| 6 0 4 56707 C ...86_64/linux/release/nbody 99MiB |

| 6 0 5 56687 C ...86_64/linux/release/nbody 141MiB |

| 6 0 6 56665 C ...86_64/linux/release/nbody 141MiB |

| 7 0 0 56701 C ...86_64/linux/release/nbody 141MiB |

| 7 0 1 56679 C ...86_64/linux/release/nbody 141MiB |

| 7 0 2 56704 C ...86_64/linux/release/nbody 141MiB |

| 7 0 3 56693 C ...86_64/linux/release/nbody 141MiB |

| 7 0 4 56715 C ...86_64/linux/release/nbody 99MiB |

| 7 0 5 56722 C ...86_64/linux/release/nbody 99MiB |

| 7 0 6 56712 C ...86_64/linux/release/nbody 141MiB |

+-----------------------------------------------------------------------------+


You can see that nbody is running neatly on the 56 separate MIG instances. Looking at the job state with qstat, the 56 jobs are running independently of one another.

clouduser@dgxa100-01:~$ qstat

job-ID prior name user state submit/start at queue jclass slots ja-task-ID

------------------------------------------------------------------------------------------------------------------------------------------------

139 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

140 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

141 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

142 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

143 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

144 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

145 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

146 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

147 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

148 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

149 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

150 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

151 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

152 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

153 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

154 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

155 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

156 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

157 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

158 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

159 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

160 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

161 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

162 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

163 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

164 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

165 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

166 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

167 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

168 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

169 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

170 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

171 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

172 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

173 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

174 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

175 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

176 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

177 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

178 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

179 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

180 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

181 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

182 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

183 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

184 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

185 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

186 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

187 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

188 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

189 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

190 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1

191 0.55500 test-bench clouduser r 09/04/2020 02:11:07 all.q@dgxa100-01 1


For example, the result of job ID 190:

cat test-bench.sh.o190

Run "nbody -benchmark [-numbodies=]" to measure performance.

-fullscreen (run n-body simulation in fullscreen mode)

-fp64 (use double precision floating point values for simulation)

-hostmem (stores simulation data in host memory)

-benchmark (run benchmark to measure performance)

-numbodies= (number of bodies (>= 1) to run in simulation)

-device= (where d=0,1,2.... for the CUDA device to use)

-numdevices= (where i=(number of CUDA devices > 0) to use for simulation)

-compare (compares simulation results running once on the default GPU and once on the CPU)

-cpu (run n-body simulation on the CPU)

-tipsy= (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode

> Simulation data stored in video memory

> Single precision floating point simulation

> 1 Devices used for simulation

GPU Device 0: "Ampere" with compute capability 8.0

> Compute 8.0 CUDA device: [A100-SXM4-40GB MIG 1c.7g.40gb]

number of bodies = 409600

409600 bodies, total time for 10 iterations: 18523.258 ms

= 90.574 billion interactions per second

= 1811.476 single-precision GFLOP/s at 20 flops per interaction


Looking at the finished results, each instance delivers roughly 1/7 to 1/8 of a full GPU, which is consistent with each 1c.7g.40gb compute instance owning 14 of the A100's 108 SMs. For a workload on the order of nbody, then, the GPUs can be partitioned this way and the jobs run concurrently without any problem.

This verification was carried out with UGE 8.6.14, which does not yet support the docker --gpus option, so we specified --runtime=nvidia, i.e. implemented it on the nvidia-docker v2 basis. The nvidia-docker v2 style, that is, specifying --runtime=nvidia, was once said to be on its way out, but in the end it appears it will live on in order to support other OCI runtimes.
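
For reference, on a scheduler and Docker version that do support the --gpus option, the same container launch should be expressible roughly as follows. This is an untested sketch based on the NVIDIA Container Toolkit documentation: the assumption is that a MIG device identifier passed via device= is handled the same way as when it is passed via NVIDIA_VISIBLE_DEVICES above.

docker run --rm --gpus '"device=MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/0"' \
    nvcr.io/nvidia/cuda:11.0-sample \
    /usr/local/cuda/samples/bin/x86_64/linux/release/nbody --benchmark -numbodies=409600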

A few issues were identified along the way, but they have already been reported to Univa and will be resolved once a fixed release is available. Even as things stand, we were able to show that MIG instances can be used through Docker containers without jobs overlapping one another. For simplicity, this test used uniform compute instances, but we have also confirmed that everything works fine when different instance profiles are used for the GPU instances and compute instances.

This evaluation did not go as far as checking how MIG instances behave under the two nvidia-capabilities mechanisms (the /proc-based and the /dev-based one), so next time we would like to verify UGE's control of MIG instances further with those in mind.


This verification used the first DGX A100 test-drive machine in Japan. If you would like to check the performance of the DGX A100 for yourself, or run a verification with your own real data, please feel free to contact GDEP Advance.


NVIDIA DGX A100 product page
NVIDIA DGX A100 flyer
NVIDIA DGX A100 TRY & BUY campaign
NVIDIA DGX A100 special video "Unboxing Ceremony"
NVIDIA A100 summary page