Verifying MIG and the UGE Job Scheduler on DGX A100

2020.10.06 GDEP Labs

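At GDEP Advance, we used our in-house DGX A100 test-drive machine to verify the operation of MIG (Multi-Instance GPU) together with UGE (Univa Grid Engine), a job scheduler widely used on GPU clusters that run Docker. The verification was carried out with the cooperation of ULGS株式会社, which has an extensive track record of UGE deployments in Japan.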


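Let's get straight to checking whether MIG-aware Docker jobs can be submitted from UGE on an NVIDIA DGX A100. To begin with, there are broadly two ways to partition a GPU into MIG instances; see the MIG user guide for details: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html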

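In this verification, we carved out one large GPU instance using the biggest profile, MIG 7g.40gb, and then created seven Compute instances of the 7_1_slice profile inside it so that only the SMs are separated, 14 units each. Applying this to all 8 GPUs yields 56 instances. Running nvidia-smi -L displays their device UUIDs, which we register in UGE. Specifically, as shown below, we create an RSMAP complex named mig and register all 56 instances in it.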

clouduser@dgxa100-01:~$ qconf -se dgxa100-01
hostname              dgxa100-01
load_scaling          NONE
complex_values        gpu=8(0-7),m_mem_free=1031883.000000M, \
                      m_mem_free_n0=128810.625000M, \
                      m_mem_free_n1=129016.019531M, \
                      m_mem_free_n2=129016.019531M, \
                      m_mem_free_n3=128979.789062M, \
                      m_mem_free_n4=129016.019531M, \
                      m_mem_free_n5=129016.019531M, \
                      m_mem_free_n6=129016.019531M, \
                      m_mem_free_n7=129013.433594M, \
                      mig=56(MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/0 \
                      MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/1 \
                      MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/2 \
                      MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/3 \
                      MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/4 \
                      MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/5 \
                      MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/6 \
                      MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/0 \
                      MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/1 \
                      MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/2 \
                      MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/3 \
                      MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/4 \
                      MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/5 \
                      MIG-GPU-eb023a54-f73b-2b08-8ea4-21df1508ea57/0/6 \
                      MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/0 \
                      MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/1 \
                      MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/2 \
                      MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/3 \
                      MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/4 \
                      MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/5 \
                      MIG-GPU-06d5597b-7d40-bfa8-cb7a-989bb109133c/0/6 \
                      MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/0 \
                      MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/1 \
                      MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/2 \
                      MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/3 \
                      MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/4 \
                      MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/5 \
                      MIG-GPU-f3f5c871-abbb-1df5-47bc-c2ac3f1f7bbd/0/6 \
                      MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/0 \
                      MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/1 \
                      MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/2 \
                      MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/3 \
                      MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/4 \
                      MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/5 \
                      MIG-GPU-a14ff863-0177-ef0b-3465-c96e4a27f492/0/6 \
                      MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/0 \
                      MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/1 \
                      MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/2 \
                      MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/3 \
                      MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/4 \
                      MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/5 \
                      MIG-GPU-f99e54f2-4a96-17d1-b901-6826d11feb3e/0/6 \
                      MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/0 \
                      MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/1 \
                      MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/2 \
                      MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/3 \
                      MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/4 \
                      MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/5 \
                      MIG-GPU-068ae205-36e0-5747-8bdd-8bb6436faf48/0/6 \
                      MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/0 \
                      MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/1 \
                      MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/2 \
                      MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/3 \
                      MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/4 \
                      MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/5 \
                      MIG-GPU-6b00b997-2df2-a4be-2eb0-04a9e409a859/0/6)
load_values           arch=lx-amd64,cpu=0.000000,docker=1, \
                      docker_images=nvcr.io/nvidia/cuda:11.0-sample, \
                      tensorflow/tensorflow:latest-devel-gpu,gnmt_tf:latest, \
                      <none>:<none>,bert:latest,nvidia_ncf:latest, \
                      <none>:<none>,mlperf-inference-bert:latest, \
                      nvidia/cuda:11.0-base,nvcr.io/nvidia/cuda:10.2-devel, \
                      nvidia/cuda:11.0-devel,<none>:<none>, \
                      gdep-adv/test:11.0-1,nvcr.io/nvidia/cuda:11.0-devel, \
                      nvfw-dgxa100:20.05.12.3, \
                      nvcr.io/nvidia/tensorflow:20.07-tf1-py3, \
                      nvcr.io/nvidia/tritonserver:20.06-v1-py3, \
                      nvcr.io/nvidia/tensorflow:20.06-tf1-py3, \
                      nvcr.io/nvidia/tensorflow:19.10-py3, \
                      nvcr.io/nvidia/tensorrtserver:19.08-py3, \
                      nvcr.io/nvidia/k8s/cuda-sample:nbody,load_avg=4.940000, \
                      load_long=3.210000,load_medium=4.940000, \
                      load_short=1.150000,m_cache_l1=32.000000K, \
                      m_cache_l2=512.000000K,m_cache_l3=16384.000000K, \
                      m_core=128,m_gpu=1,m_mem_free=1020310.000000M, \
                      m_mem_total=1031883.000000M, \
                      m_mem_total_n0=128810.625000M, \
                      m_mem_total_n1=129016.019531M, \
                      m_mem_total_n2=129016.019531M, \
                      m_mem_total_n3=128979.789062M, \
                      m_mem_total_n4=129016.019531M, \
                      m_mem_total_n5=129016.019531M, \
                      m_mem_total_n6=129016.019531M, \
                      m_mem_total_n7=129013.433594M,m_mem_used=11573.000000M, \
                      m_numa_nodes=8,m_socket=2,m_thread=256, \
                      m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
                      m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT, \
                      m_topology_numa=[SCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT], \
                      mem_free=1023223.988281M,mem_total=1031883.945312M, \
                      mem_used=8659.957031M,np_load_avg=0.019297, \
                      np_load_long=0.012539,np_load_medium=0.019297, \
                      np_load_short=0.004492,num_proc=256,swap_free=0.000000M, \
                      swap_total=0.000000M,swap_used=0.000000M, \
                      virtual_free=1023223.988281M, \
                      virtual_total=1031883.945312M,virtual_used=8659.957031M
processors            256
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
license_constraints   NONE
license_oversubscription NONE
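
Typing 56 UUIDs by hand is error-prone, so the mig entry shown in complex_values above could be generated from the nvidia-smi -L output instead. The following is a minimal sketch, not the procedure used in this verification. The function names are hypothetical, and the sample input assumes that MIG device lines carry a UUID of the form MIG-GPU-<uuid>/<GI>/<CI>, as in the listing above.

```shell
#!/bin/bash
# Sketch: build a UGE RSMAP value from `nvidia-smi -L` style output.
# Assumption: MIG UUIDs look like MIG-GPU-<uuid>/<GI>/<CI> (driver 450.51.06 era).

# Read `nvidia-smi -L` output on stdin; print one MIG UUID per line.
list_mig_uuids() {
    grep -o 'MIG-GPU-[0-9a-f-]*/[0-9]*/[0-9]*'
}

# Read MIG UUIDs on stdin; print "mig=<count>(<uuid> <uuid> ...)".
build_mig_rsmap() {
    awk 'NF { list = list (n ? " " : "") $0; n++ }
         END { printf "mig=%d(%s)\n", n, list }'
}

# Hypothetical sample input: one GPU from this article with two of its instances.
sample_output='GPU 0: A100-SXM4-40GB (UUID: GPU-f575246e-2439-4413-8435-bd71f7135f55)
  MIG 1c.7g.40gb Device 0: (UUID: MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/0)
  MIG 1c.7g.40gb Device 1: (UUID: MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/1)'

printf '%s\n' "$sample_output" | list_mig_uuids | build_mig_rsmap
```

On the real host, the same pipeline would be fed from nvidia-smi -L, and the resulting string could then be pasted into complex_values when editing the execution host (for example with qconf -me dgxa100-01).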

Now let's submit some jobs. According to the following documents:

https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html

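when a MIG instance is used from a Docker container, it is enough to specify its UUID in NVIDIA_VISIBLE_DEVICES. We therefore register the UUIDs in a UGE complex and have UGE hand them out so that no two jobs receive the same one. Let's submit 56 jobs as follows.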

for i in `seq 0 55`
do 
qsub -l mig=1,docker,docker_images="*nvcr.io/nvidia/cuda:11.0-sample*" -xd '--runtime=nvidia,-e NVIDIA_VISIBLE_DEVICES=${mig(0)}' ./test-bench.sh
done
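
For reference, the -xd options above amount to starting a container with --runtime=nvidia and with NVIDIA_VISIBLE_DEVICES set to a single MIG UUID, where ${mig(0)} is the placeholder that UGE fills in with the mig ID granted to the job. The sketch below only prints a hand-written equivalent of such a command for one UUID. The function name is hypothetical, and how UGE actually launches the container internally is not shown in this article.

```shell
#!/bin/bash
# Sketch: print (do not run) a docker command that exposes one MIG device.
# $1 = MIG device UUID, $2 = container image.
print_docker_cmd() {
    echo "docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$1 $2 nvidia-smi -L"
}

# Example with the first UUID registered in the mig complex above.
print_docker_cmd "MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/0" \
                 "nvcr.io/nvidia/cuda:11.0-sample"
```

Running the printed command by hand on the DGX A100 is one way to confirm that a container sees only the intended MIG device before involving the scheduler.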

 

The script looks like this; it does nothing more than run the nbody benchmark.

# cat test-bench.sh
#!/bin/bash

#$ -S /bin/bash
/usr/local/cuda/samples/bin/x86_64/linux/release/nbody --benchmark -numbodies=409600
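
When checking that each job really received a different MIG instance, it can help to have the job log which device it was given. The following is a hypothetical variant of test-bench.sh, not the script used in this verification. It assumes that JOB_ID is set by the scheduler and that NVIDIA_VISIBLE_DEVICES is visible inside the container.

```shell
#!/bin/bash
#$ -S /bin/bash
# Hypothetical variant of test-bench.sh: record the assigned MIG device first.

NBODY=/usr/local/cuda/samples/bin/x86_64/linux/release/nbody

# Print the job ID and the device selection passed to the container.
# Falls back to "unknown" / empty when the variables are not set.
report_device() {
    echo "job=${JOB_ID:-unknown} NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-}"
}

report_device

# Run the benchmark only when the binary exists, so the script can also be
# tried outside the sample image without failing.
if [ -x "$NBODY" ]; then
    "$NBODY" --benchmark -numbodies=409600
fi
```

The extra line ends up in each job's output file, so duplicated UUIDs across jobs would be easy to spot.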

 

Note that the image nvcr.io/nvidia/cuda:11.0-sample is not publicly available. It is an image we built from the devel image by installing and building the CUDA samples so that nbody can be run. Running nvidia-smi while the jobs are executing gives:

clouduser@dgxa100-01:~$ nvidia-smi
Fri Sep  4 02:11:22 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:07:00.0 Off |                   On |
| N/A   47C    P0   243W / 400W |    990MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   1  A100-SXM4-40GB      On   | 00000000:0F:00.0 Off |                   On |
| N/A   45C    P0   238W / 400W |    990MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   2  A100-SXM4-40GB      On   | 00000000:47:00.0 Off |                   On |
| N/A   38C    P0   191W / 400W |    732MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   3  A100-SXM4-40GB      On   | 00000000:4E:00.0 Off |                   On |
| N/A   28C    P0    43W / 400W |     87MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   4  A100-SXM4-40GB      On   | 00000000:87:00.0 Off |                   On |
| N/A   32C    P0    43W / 400W |     11MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   5  A100-SXM4-40GB      On   | 00000000:90:00.0 Off |                   On |
| N/A   31C    P0    45W / 400W |      0MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   6  A100-SXM4-40GB      On   | 00000000:B7:00.0 Off |                   On |
| N/A   31C    P0    42W / 400W |      0MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   7  A100-SXM4-40GB      On   | 00000000:BD:00.0 Off |                   On |
| N/A   31C    P0    45W / 400W |      0MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |                      | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    0   0   0  |    990MiB / 40537MiB | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  0    0   1   1  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  0    0   2   2  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  0    0   3   3  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  0    0   4   4  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  0    0   5   5  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  0    0   6   6  |                      | 14      0 |  7   0    5    1    1 |
+------------------+----------------------+-----------+-----------------------+
|  1    0   0   0  |    990MiB / 40537MiB | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  1    0   1   1  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  1    0   2   2  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  1    0   3   3  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  1    0   4   4  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  1    0   5   5  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  1    0   6   6  |                      | 14      0 |  7   0    5    1    1 |
+------------------+----------------------+-----------+-----------------------+
|  2    0   0   0  |    990MiB / 40537MiB | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  2    0   1   1  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  2    0   2   2  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  2    0   3   3  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  2    0   4   4  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  2    0   5   5  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  2    0   6   6  |                      | 14      0 |  7   0    5    1    1 |
+------------------+----------------------+-----------+-----------------------+
|  3    0   0   0  |    990MiB / 40537MiB | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  3    0   1   1  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  3    0   2   2  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  3    0   3   3  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  3    0   4   4  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  3    0   5   5  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  3    0   6   6  |                      | 14      0 |  7   0    5    1    1 |
+------------------+----------------------+-----------+-----------------------+
|  4    0   0   0  |    522MiB / 40537MiB | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  4    0   1   1  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  4    0   2   2  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  4    0   3   3  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  4    0   4   4  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  4    0   5   5  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  4    0   6   6  |                      | 14      0 |  7   0    5    1    1 |
+------------------+----------------------+-----------+-----------------------+
|  5    0   0   0  |    399MiB / 40537MiB | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  5    0   1   1  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  5    0   2   2  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  5    0   3   3  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  5    0   4   4  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  5    0   5   5  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  5    0   6   6  |                      | 14      0 |  7   0    5    1    1 |
+------------------+----------------------+-----------+-----------------------+
|  6    0   0   0  |     39MiB / 40537MiB | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  6    0   1   1  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  6    0   2   2  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  6    0   3   3  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  6    0   4   4  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  6    0   5   5  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  6    0   6   6  |                      | 14      0 |  7   0    5    1    1 |
+------------------+----------------------+-----------+-----------------------+
|  7    0   0   0  |     73MiB / 40537MiB | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  7    0   1   1  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  7    0   2   2  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  7    0   3   3  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  7    0   4   4  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  7    0   5   5  |                      | 14      0 |  7   0    5    1    1 |
+------------------+                      +-----------+-----------------------+
|  7    0   6   6  |                      | 14      0 |  7   0    5    1    1 |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0    0    0      51971      C   ...86_64/linux/release/nbody      141MiB |
|    0    0    1      52053      C   ...86_64/linux/release/nbody      141MiB |
|    0    0    2      52050      C   ...86_64/linux/release/nbody      141MiB |
|    0    0    3      52127      C   ...86_64/linux/release/nbody      141MiB |
|    0    0    4      52742      C   ...86_64/linux/release/nbody      141MiB |
|    0    0    5      52749      C   ...86_64/linux/release/nbody      141MiB |
|    0    0    6      53150      C   ...86_64/linux/release/nbody      141MiB |
|    1    0    0      52745      C   ...86_64/linux/release/nbody      141MiB |
|    1    0    1      53147      C   ...86_64/linux/release/nbody      141MiB |
|    1    0    2      54211      C   ...86_64/linux/release/nbody      141MiB |
|    1    0    3      54214      C   ...86_64/linux/release/nbody      141MiB |
|    1    0    4      55582      C   ...86_64/linux/release/nbody      141MiB |
|    1    0    5      55575      C   ...86_64/linux/release/nbody      141MiB |
|    1    0    6      55654      C   ...86_64/linux/release/nbody      141MiB |
|    2    0    0      55661      C   ...86_64/linux/release/nbody      141MiB |
|    2    0    1      55663      C   ...86_64/linux/release/nbody      141MiB |
|    2    0    2      55813      C   ...86_64/linux/release/nbody      141MiB |
|    2    0    3      55927      C   ...86_64/linux/release/nbody      141MiB |
|    2    0    4      55894      C   ...86_64/linux/release/nbody      141MiB |
|    2    0    5      55969      C   ...86_64/linux/release/nbody      141MiB |
|    2    0    6      55972      C   ...86_64/linux/release/nbody      141MiB |
|    3    0    0      55975      C   ...86_64/linux/release/nbody      141MiB |
|    3    0    1      56048      C   ...86_64/linux/release/nbody      141MiB |
|    3    0    2      56055      C   ...86_64/linux/release/nbody      141MiB |
|    3    0    3      56061      C   ...86_64/linux/release/nbody      141MiB |
|    3    0    4      56058      C   ...86_64/linux/release/nbody      141MiB |
|    3    0    5      56064      C   ...86_64/linux/release/nbody      141MiB |
|    3    0    6      56145      C   ...86_64/linux/release/nbody      141MiB |
|    4    0    0      56172      C   ...86_64/linux/release/nbody      141MiB |
|    4    0    1      56168      C   ...86_64/linux/release/nbody      141MiB |
|    4    0    2      56197      C   ...86_64/linux/release/nbody      141MiB |
|    4    0    3      56209      C   ...86_64/linux/release/nbody      141MiB |
|    4    0    4      56352      C   ...86_64/linux/release/nbody      141MiB |
|    4    0    5      56293      C   ...86_64/linux/release/nbody      141MiB |
|    4    0    6      56359      C   ...86_64/linux/release/nbody      141MiB |
|    5    0    0      56364      C   ...86_64/linux/release/nbody      141MiB |
|    5    0    1      56355      C   ...86_64/linux/release/nbody      141MiB |
|    5    0    2      56398      C   ...86_64/linux/release/nbody      141MiB |
|    5    0    3      56373      C   ...86_64/linux/release/nbody      141MiB |
|    5    0    4      56404      C   ...86_64/linux/release/nbody      141MiB |
|    5    0    5      56401      C   ...86_64/linux/release/nbody      141MiB |
|    5    0    6      56455      C   ...86_64/linux/release/nbody      141MiB |
|    6    0    0      56661      C   ...86_64/linux/release/nbody      141MiB |
|    6    0    1      56685      C   ...86_64/linux/release/nbody      141MiB |
|    6    0    2      56605      C   ...86_64/linux/release/nbody      141MiB |
|    6    0    3      56690      C   ...86_64/linux/release/nbody      141MiB |
|    6    0    4      56707      C   ...86_64/linux/release/nbody       99MiB |
|    6    0    5      56687      C   ...86_64/linux/release/nbody      141MiB |
|    6    0    6      56665      C   ...86_64/linux/release/nbody      141MiB |
|    7    0    0      56701      C   ...86_64/linux/release/nbody      141MiB |
|    7    0    1      56679      C   ...86_64/linux/release/nbody      141MiB |
|    7    0    2      56704      C   ...86_64/linux/release/nbody      141MiB |
|    7    0    3      56693      C   ...86_64/linux/release/nbody      141MiB |
|    7    0    4      56715      C   ...86_64/linux/release/nbody       99MiB |
|    7    0    5      56722      C   ...86_64/linux/release/nbody       99MiB |
|    7    0    6      56712      C   ...86_64/linux/release/nbody      141MiB |
+-----------------------------------------------------------------------------+
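
The Processes table above can also be checked mechanically rather than by eye. The following sketch counts the distinct GPU / GI / CI combinations that are running nbody. The function name is hypothetical, and the parsing assumes the column layout of the nvidia-smi output shown above (driver 450.51.06), which may differ in other driver versions.

```shell
#!/bin/bash
# Sketch: count distinct MIG instances (GPU/GI/CI) that are running nbody.
# Reads nvidia-smi output on stdin; only "Processes" rows mention nbody.
count_busy_instances() {
    awk '$1 == "|" && $2 ~ /^[0-9]+$/ && /nbody/ { seen[$2 "/" $3 "/" $4] = 1 }
         END { n = 0; for (k in seen) n++; print n }'
}

# Three rows copied from the table above, used as sample input.
sample_rows='|    0    0    0      51971      C   ...86_64/linux/release/nbody      141MiB |
|    0    0    1      52053      C   ...86_64/linux/release/nbody      141MiB |
|    1    0    0      52745      C   ...86_64/linux/release/nbody      141MiB |'

printf '%s\n' "$sample_rows" | count_busy_instances
```

Piping the full nvidia-smi output into count_busy_instances should report 56 when every MIG instance is occupied, as in the table above.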

We can see that nbody is running on each of the 56 separate MIG instances. Checking the job state with qstat shows that the 56 jobs are running as individual jobs.

clouduser@dgxa100-01:~$ qstat
job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------
       139 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       140 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       141 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       142 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       143 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       144 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       145 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       146 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       147 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       148 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       149 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       150 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       151 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       152 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       153 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       154 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       155 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       156 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       157 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       158 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       159 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       160 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       161 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       162 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       163 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       164 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       165 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       166 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       167 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       168 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       169 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       170 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       171 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       172 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       173 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       174 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       175 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       176 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       177 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       178 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       179 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       180 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       181 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       182 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       183 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       184 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       185 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       186 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       187 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       188 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       189 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       190 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1
       191 0.55500 test-bench clouduser    r     09/04/2020 02:11:07 all.q@dgxa100-01                                                  1

 

For example, here is the output of job_id 190:

 

cat test-bench.sh.o190
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.0

> Compute 8.0 CUDA device: [A100-SXM4-40GB MIG 1c.7g.40gb]
number of bodies = 409600
409600 bodies, total time for 10 iterations: 18523.258 ms
= 90.574 billion interactions per second
= 1811.476 single-precision GFLOP/s at 20 flops per interaction

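Looking at the results of the completed jobs, each instance delivers roughly 1/7 to 1/8 of the throughput of a single full GPU. This shows that a workload on the scale of nbody can be split across MIG instances and run concurrently without problems.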
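As a sanity check on the log above, the two throughput figures can be reproduced from the body count, iteration count, and elapsed time that nbody prints. This is only a minimal sketch: it assumes nbody counts one interaction per ordered pair of bodies per iteration, and it takes the 20 flops per interaction constant directly from the log line.

```python
# Figures copied from the job 190 log (test-bench.sh.o190).
num_bodies = 409_600
iterations = 10
total_time_ms = 18523.258
flops_per_interaction = 20  # constant reported by nbody in its own output

# Assumption: every body interacts with every body once per iteration.
interactions = num_bodies ** 2 * iterations
seconds = total_time_ms / 1000.0

interactions_per_s = interactions / seconds
gflops = interactions_per_s * flops_per_interaction / 1e9

print(f"{interactions_per_s / 1e9:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

Under that assumption the script prints the same values as the log to three decimal places, so per-instance figures from the other job output files can be compared on the same basis.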

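This verification used UGE 8.6.14. That version does not yet support Docker's --gpus option, so we specified --runtime=nvidia and built the setup on nvidia-docker v2. It had been said that nvidia-docker v2, that is, the --runtime=nvidia option, would eventually be removed, but it now appears that it will be kept in order to support other OCI runtimes.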
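To make the difference between the two invocation styles concrete, the sketch below builds both docker command lines for a single MIG instance. It is an illustration under assumptions, not the command UGE generates: the image name and the nbody arguments are hypothetical, the device string reuses the format registered in the mig RSMAP complex, and passing that string to --gpus as device=... is assumed to behave the same way as setting NVIDIA_VISIBLE_DEVICES.

```python
import shlex

# Device string in the format registered in the mig RSMAP complex.
MIG_DEVICE = "MIG-GPU-f575246e-2439-4413-8435-bd71f7135f55/0/0"
IMAGE = "example/cuda-samples:latest"  # hypothetical image name
JOB_ARGS = ["nbody", "-benchmark", "-numbodies=409600"]  # hypothetical job


def runtime_nvidia_cmd(device, image, args):
    """nvidia-docker v2 style: select the device via NVIDIA_VISIBLE_DEVICES."""
    return ["docker", "run", "--rm", "--runtime=nvidia",
            "-e", f"NVIDIA_VISIBLE_DEVICES={device}", image, *args]


def gpus_flag_cmd(device, image, args):
    """Docker 19.03+ style: select the device via --gpus (assumed equivalent)."""
    return ["docker", "run", "--rm", "--gpus", f"device={device}", image, *args]


# Only print the command lines; nothing is executed here.
print(shlex.join(runtime_nvidia_cmd(MIG_DEVICE, IMAGE, JOB_ARGS)))
print(shlex.join(gpus_flag_cmd(MIG_DEVICE, IMAGE, JOB_ARGS)))
```

Building the argument list separately from running it keeps the device selection easy to inspect, which is useful when a scheduler hands each job a different instance.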

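We did identify several issues, all of which have already been reported to Univa; they should be resolved once a fixed release is available. Even as things stand, we were able to show that MIG instances can be used through Docker containers without any instance being assigned to more than one job. For simplicity, this test used uniform Compute Instances, but we also confirmed that the setup works correctly when different instance profiles are used for the GPU Instances and Compute Instances.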

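Because the evaluation period was short, we did not get as far as checking how MIG instances behave under the two kinds of nvidia-capability (proc and device). When we have another opportunity, we would like to follow up on UGE's control of MIG instances with that behavior in mind.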

 


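This verification was carried out on the first DGX A100 test-drive system in Japan. If you would like to check the performance of the DGX A100 for yourself, or run a verification with your own data, please feel free to contact GDEP Advance.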

 
