Re: [PATCH] dm-crypt: add the "high_priority" flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2023/3/7 9:30, yangerkun 写道:


在 2023/3/7 3:01, Mikulas Patocka 写道:


On Mon, 6 Mar 2023, Mikulas Patocka wrote:

Hi,

Thanks a lot for your review!

It's ok to fix the softlockup, but for async write encrypt,
kcryptd_crypt_write_io_submit will add bio to write_tree, and once we
call cond_resched before every kcryptd_io_write, the write performance
may be poor while we meet a high cpu usage scene.

Hi

To fix this problem, find the PID of the process "dmcrypt_write" and
change its priority to -20, for example "renice -n -20 -p 34748".

This is the proper way how to fix it; locking up the process for one
second is not.

We used to have high-priority workqueues by default, but it caused audio
playback skipping, so we had to revert it - see
f612b2132db529feac4f965f28a1b9258ea7c22b.

Perhaps we should add an option to have high-priority kernel threads?

Mikulas

Here I'm sending a patch that makes it optional to raise the priority
of dm-crypt workqueues and the write thread. Test it, if it helps for you.

Thanks, will test it!

Sorry for the delay, the scene that report the softlockup cannot been using(arm64 server), instead, we use a x86_64 server do the test.

Test step:
1. make every cpu busy, stress-ng -c 72 --timeout=360000
2. choose pmem as the disk, cryptsetup -c sm4-xts-plain64 --key-size=256 --hash sm3 luksFormat --sector-size 4096 --pbkdf=pbkdf2 /dev/pmem0 3. taskset -c 11 fio -filename=/dev/mapper/test -direct=1 -iodepth 64 -rw=randwrite -ioengine=libaio -bs=4k -runtime=60 -group_reporting -name=sqe_100write_4k -thread -numjobs=1

Results:

origin(without any patch):
[root@fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test -direct=1 -iodepth 64 -rw=randwrite -ioengine=libaio -bs=4k -runtime=60 -group_reporting -name=sqe_100write_4k -thread -numjobs=1 sqe_100write_4k: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=189MiB/s][w=48.3k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40205: Sat Mar 11 17:51:26 2023 write: IOPS=60.3k, BW=236MiB/s (247MB/s)(13.8GiB/60001msec); 0 zone resets
    slat (usec): min=2, max=14393, avg= 4.78, stdev= 9.83
    clat (usec): min=26, max=44200, avg=1054.97, stdev=705.63
     lat (usec): min=31, max=44204, avg=1059.84, stdev=706.58
    clat percentiles (usec):
     |  1.00th=[  188],  5.00th=[  359], 10.00th=[  404], 20.00th=[  469],
     | 30.00th=[  611], 40.00th=[  717], 50.00th=[  914], 60.00th=[ 1172],
     | 70.00th=[ 1303], 80.00th=[ 1467], 90.00th=[ 1827], 95.00th=[ 2180],
     | 99.00th=[ 3294], 99.50th=[ 4146], 99.90th=[ 5800], 99.95th=[ 8225],
     | 99.99th=[14091]
bw ( KiB/s): min=161560, max=486960, per=100.00%, avg=241961.65, stdev=65188.84, samples=119 iops : min=40390, max=121740, avg=60490.40, stdev=16297.21, samples=119
  lat (usec)   : 50=0.03%, 100=0.33%, 250=1.22%, 500=21.77%, 750=19.12%
  lat (usec)   : 1000=10.40%
  lat (msec)   : 2=40.23%, 4=6.34%, 10=0.53%, 20=0.03%, 50=0.01%
  cpu          : usr=16.16%, sys=29.07%, ctx=1459171, majf=0, minf=32412
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,3620430,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s), io=13.8GiB (14.8GB), run=60001-60001msec

origin + schedule everytime:

[root@fedora hulk-5.10]# git diff
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3d975db86434..17ddca293965 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1934,6 +1934,7 @@ static int dmcrypt_write(void *data)
                        io = crypt_io_from_node(rb_first(&write_tree));
                        rb_erase(&io->rb_node, &write_tree);
                        kcryptd_io_write(io);
+                       cond_resched();
                } while (!RB_EMPTY_ROOT(&write_tree));
                blk_finish_plug(&plug);
        }


[root@fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test -direct=1 -iodepth 64 -rw=randwrite -ioengine=libaio -bs=4k -runtime=60 -group_reporting -name=sqe_100write_4k -thread -numjobs=1 sqe_100write_4k: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=197MiB/s][w=50.4k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40287: Sat Mar 11 17:56:25 2023 write: IOPS=59.9k, BW=234MiB/s (245MB/s)(13.7GiB/60001msec); 0 zone resets
    slat (usec): min=2, max=13251, avg= 5.19, stdev= 9.59
    clat (usec): min=14, max=41490, avg=1062.91, stdev=655.31
     lat (usec): min=26, max=41495, avg=1068.20, stdev=656.78
    clat percentiles (usec):
     |  1.00th=[  269],  5.00th=[  363], 10.00th=[  412], 20.00th=[  494],
     | 30.00th=[  644], 40.00th=[  766], 50.00th=[  979], 60.00th=[ 1172],
     | 70.00th=[ 1303], 80.00th=[ 1450], 90.00th=[ 1827], 95.00th=[ 2147],
     | 99.00th=[ 3097], 99.50th=[ 3785], 99.90th=[ 5145], 99.95th=[ 5735],
     | 99.99th=[13698]
bw ( KiB/s): min=171976, max=562936, per=100.00%, avg=239748.22, stdev=61262.50, samples=119 iops : min=42994, max=140734, avg=59937.04, stdev=15315.64, samples=119
  lat (usec)   : 20=0.01%, 50=0.01%, 100=0.10%, 250=0.67%, 500=19.60%
  lat (usec)   : 750=18.69%, 1000=11.81%
  lat (msec)   : 2=42.65%, 4=6.05%, 10=0.40%, 20=0.02%, 50=0.01%
  cpu          : usr=15.94%, sys=28.99%, ctx=1564116, majf=0, minf=32437
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,3592061,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=234MiB/s (245MB/s), 234MiB/s-234MiB/s (245MB/s-245MB/s), io=13.7GiB (14.7GB), run=60001-60001msec

origin + schedule everytime + set nice:

[root@fedora hulk-5.10]# git diff
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3d975db86434..f2135c223944 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1934,6 +1934,7 @@ static int dmcrypt_write(void *data)
                        io = crypt_io_from_node(rb_first(&write_tree));
                        rb_erase(&io->rb_node, &write_tree);
                        kcryptd_io_write(io);
+                       cond_resched();
                } while (!RB_EMPTY_ROOT(&write_tree));
                blk_finish_plug(&plug);
        }
@@ -3298,18 +3299,18 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
        }

        ret = -ENOMEM;
- cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM, 1, devname); + cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM | WQ_HIGHPRI, 1, devname);
        if (!cc->io_queue) {
                ti->error = "Couldn't create kcryptd io queue";
                goto bad;
        }

        if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
- cc->crypt_queue = alloc_workqueue("kcryptd/%s", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, + cc->crypt_queue = alloc_workqueue("kcryptd/%s", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_HIGHPRI,
                                                  1, devname);
        else
                cc->crypt_queue = alloc_workqueue("kcryptd/%s",
- WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND, + WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND | WQ_HIGHPRI, num_online_cpus(), devname);
        if (!cc->crypt_queue) {
                ti->error = "Couldn't create kcryptd queue";
@@ -3326,6 +3327,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
                ti->error = "Couldn't spawn write thread";
                goto bad;
        }
+       set_user_nice(cc->write_thread, MIN_NICE);
        wake_up_process(cc->write_thread);

        ti->num_flush_bios = 1;


[root@fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test -direct=1 -iodepth 64 -rw=write -ioengine=libaio -bs=4k -runtime=60 -group_reporting -name=sqe_100write_4k -thread -numjobs=1 sqe_100write_4k: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [W(1)][100.0%][w=132MiB/s][w=33.7k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40483: Sat Mar 11 18:02:21 2023 write: IOPS=37.9k, BW=148MiB/s (155MB/s)(8884MiB/60001msec); 0 zone resets
    slat (usec): min=2, max=3105, avg=22.68, stdev=11.80
    clat (usec): min=18, max=46657, avg=1664.96, stdev=604.02
     lat (usec): min=39, max=46660, avg=1687.78, stdev=613.12
    clat percentiles (usec):
     |  1.00th=[  383],  5.00th=[  482], 10.00th=[  594], 20.00th=[  914],
     | 30.00th=[ 1696], 40.00th=[ 1860], 50.00th=[ 1909], 60.00th=[ 1958],
     | 70.00th=[ 1991], 80.00th=[ 2073], 90.00th=[ 2180], 95.00th=[ 2212],
     | 99.00th=[ 2376], 99.50th=[ 2442], 99.90th=[ 4359], 99.95th=[ 4883],
     | 99.99th=[12387]
bw ( KiB/s): min=115368, max=363976, per=100.00%, avg=151823.13, stdev=44148.16, samples=119 iops : min=28842, max=90994, avg=37955.78, stdev=11037.04, samples=119
  lat (usec)   : 20=0.01%, 50=0.01%, 100=0.06%, 250=0.23%, 500=7.38%
  lat (usec)   : 750=4.65%, 1000=9.11%
  lat (msec)   : 2=49.16%, 4=29.29%, 10=0.10%, 20=0.01%, 50=0.01%
  cpu          : usr=8.86%, sys=18.57%, ctx=1878960, majf=0, minf=401
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,2274350,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=148MiB/s (155MB/s), 148MiB/s-148MiB/s (155MB/s-155MB/s), io=8884MiB (9316MB), run=60001-60001msec


I have no idea why set nice will lead to a poor performance for fio... But it seems OK that cond_resched every time since it won't result performance descent. So, drop this patch and just take '[PATCH] dm-crypt: add cond_resched() to dmcrypt_write()' you send?



Mikulas



From: Mikulas Patocka <mpatocka@xxxxxxxxxx>

It was reported that dm-crypt performs badly when the system is loaded[1].
So I'm adding an option "high_priority" that sets the workqueues and the
write_thread to nice level -20. This used to be the default in the past,
but it caused audio skipping, so it had to be reverted - see the commit
f612b2132db529feac4f965f28a1b9258ea7c22b. This commit makes it optional,
so that normal users won't be harmed by it.

[1] https://listman.redhat.com/archives/dm-devel/2023-February/053410.html

Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>

---
  Documentation/admin-guide/device-mapper/dm-crypt.rst |    5 +++
  drivers/md/dm-crypt.c                                |   28 +++++++++++++------
  2 files changed, 25 insertions(+), 8 deletions(-)

Index: linux-2.6/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-crypt.c
+++ linux-2.6/drivers/md/dm-crypt.c
@@ -133,9 +133,9 @@ struct iv_elephant_private {
   * and encrypts / decrypts at the same time.
   */
  enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
-         DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD,
-         DM_CRYPT_NO_READ_WORKQUEUE, DM_CRYPT_NO_WRITE_WORKQUEUE,
-         DM_CRYPT_WRITE_INLINE };
+         DM_CRYPT_SAME_CPU, DM_CRYPT_HIGH_PRIORITY,
+         DM_CRYPT_NO_OFFLOAD, DM_CRYPT_NO_READ_WORKQUEUE,
+         DM_CRYPT_NO_WRITE_WORKQUEUE, DM_CRYPT_WRITE_INLINE };
  enum cipher_flags {
      CRYPT_MODE_INTEGRITY_AEAD,    /* Use authenticated mode for cipher */
@@ -3087,7 +3087,7 @@ static int crypt_ctr_optional(struct dm_
      struct crypt_config *cc = ti->private;
      struct dm_arg_set as;
      static const struct dm_arg _args[] = {
-        {0, 8, "Invalid number of feature args"},
+        {0, 9, "Invalid number of feature args"},
      };
      unsigned int opt_params, val;
      const char *opt_string, *sval;
@@ -3114,6 +3114,8 @@ static int crypt_ctr_optional(struct dm_
          else if (!strcasecmp(opt_string, "same_cpu_crypt"))
              set_bit(DM_CRYPT_SAME_CPU, &cc->flags);
+        else if (!strcasecmp(opt_string, "high_priority"))
+            set_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags);
          else if (!strcasecmp(opt_string, "submit_from_crypt_cpus"))
              set_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
@@ -3352,18 +3354,22 @@ static int crypt_ctr(struct dm_target *t
      }
      ret = -ENOMEM;
-    cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM, 1, devname);
+    cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM |
+                       test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags) * WQ_HIGHPRI,
+                       1, devname);
      if (!cc->io_queue) {
          ti->error = "Couldn't create kcryptd io queue";
          goto bad;
      }
      if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
-        cc->crypt_queue = alloc_workqueue("kcryptd/%s", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, +        cc->crypt_queue = alloc_workqueue("kcryptd/%s", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | +                          test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags) * WQ_HIGHPRI,
                            1, devname);
      else
          cc->crypt_queue = alloc_workqueue("kcryptd/%s",
-                          WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND, +                          WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND | +                          test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags) * WQ_HIGHPRI,
                            num_online_cpus(), devname);
      if (!cc->crypt_queue) {
          ti->error = "Couldn't create kcryptd queue";
@@ -3380,6 +3386,8 @@ static int crypt_ctr(struct dm_target *t
          ti->error = "Couldn't spawn write thread";
          goto bad;
      }
+    if (test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags))
+        set_user_nice(cc->write_thread, MIN_NICE);
      ti->num_flush_bios = 1;
      ti->limit_swap_bios = true;
@@ -3500,6 +3508,7 @@ static void crypt_status(struct dm_targe
          num_feature_args += !!ti->num_discard_bios;
          num_feature_args += test_bit(DM_CRYPT_SAME_CPU, &cc->flags);
+        num_feature_args += test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags);
          num_feature_args += test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
          num_feature_args += test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags);           num_feature_args += test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
@@ -3513,6 +3522,8 @@ static void crypt_status(struct dm_targe
                  DMEMIT(" allow_discards");
              if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
                  DMEMIT(" same_cpu_crypt");
+            if (test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags))
+                DMEMIT(" high_priority");
              if (test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags))
                  DMEMIT(" submit_from_crypt_cpus");
              if (test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags))
@@ -3532,6 +3543,7 @@ static void crypt_status(struct dm_targe
          DMEMIT_TARGET_NAME_VERSION(ti->type);
          DMEMIT(",allow_discards=%c", ti->num_discard_bios ? 'y' : 'n');
          DMEMIT(",same_cpu_crypt=%c", test_bit(DM_CRYPT_SAME_CPU, &cc->flags) ? 'y' : 'n'); +        DMEMIT(",high_priority=%c", test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags) ? 'y' : 'n');           DMEMIT(",submit_from_crypt_cpus=%c", test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags) ?
                 'y' : 'n');
          DMEMIT(",no_read_workqueue=%c", test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags) ?
@@ -3659,7 +3671,7 @@ static void crypt_io_hints(struct dm_tar
  static struct target_type crypt_target = {
      .name   = "crypt",
-    .version = {1, 24, 0},
+    .version = {1, 25, 0},
      .module = THIS_MODULE,
      .ctr    = crypt_ctr,
      .dtr    = crypt_dtr,
Index: linux-2.6/Documentation/admin-guide/device-mapper/dm-crypt.rst
===================================================================
--- linux-2.6.orig/Documentation/admin-guide/device-mapper/dm-crypt.rst
+++ linux-2.6/Documentation/admin-guide/device-mapper/dm-crypt.rst
@@ -113,6 +113,11 @@ same_cpu_crypt
      The default is to use an unbound workqueue so that encryption work
      is automatically balanced between available CPUs.
+high_priority
+    Set dm-crypt workqueues and the writer thread to high priority. This
+    improves throughput and latency of dm-crypt while degrading general
+    responsiveness of the system.
+
  submit_from_crypt_cpus
      Disable offloading writes to a separate thread after encryption.
      There are some situations where offloading write bios from the


--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel




[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux