Re: [PATCH v3 00/11] md/raid5: set STRIPE_SIZE as a configurable value

On 2020/5/29 6:07, Guoqing Jiang wrote:
On 5/27/20 3:19 PM, Yufen Yu wrote:
Hi, all

  For now, STRIPE_SIZE is equal to the value of PAGE_SIZE. That means RAID5
  will issue each bio to disk at a size of at least 64KB when PAGE_SIZE is
  64KB on arm64. However, filesystems usually issue bios in units of 4KB.
  RAID5 will therefore waste disk bandwidth.
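
  For context, in the current code the stripe unit is tied directly to the
  page size; the relevant definitions in drivers/md/raid5.h look roughly like:

      #define STRIPE_SIZE     PAGE_SIZE
      #define STRIPE_SHIFT    (PAGE_SHIFT - 9)
      #define STRIPE_SECTORS  (STRIPE_SIZE >> 9)

  so on a 64KB PAGE_SIZE kernel every stripe_head buffer covers 64KB per
  member device.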

Could you explain a little bit about "waste resource"? Does it mean the chance
for a full-stripe write is limited because of the mismatch between the fs (4KB
bios) and raid5 (64KB stripe unit)?

Applications may request only 4KB of data, but RAID5 will issue 64KB-sized
bios to the disks, which wastes disk bandwidth and burns more CPU time
computing xor. Detailed performance data can be seen in this previous email:
https://www.spinics.net/lists/raid/msg64261.html


  To solve the problem, this patchset provides a new config option,
  CONFIG_MD_RAID456_STRIPE_SHIFT, to let the user configure STRIPE_SIZE. The
  default value is 1, which corresponds to a STRIPE_SIZE of 4096 bytes.
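
  Purely as an illustration of the intent (this is not the exact code from
  patch #1; the formula below is a hypothetical example), the Kconfig value
  would turn STRIPE_SIZE into a build-time constant that no longer depends on
  PAGE_SIZE, e.g.:

      /* Hypothetical sketch only: the default CONFIG_MD_RAID456_STRIPE_SHIFT=1
       * would give a 4096-byte stripe unit, larger values a bigger one.
       */
      #define STRIPE_SIZE     (4096UL << (CONFIG_MD_RAID456_STRIPE_SHIFT - 1))
      #define STRIPE_SECTORS  (STRIPE_SIZE >> 9)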

  Normally, using the default STRIPE_SIZE gives better performance, and
  NeilBrown has suggested simply fixing STRIPE_SIZE at 4096. But our test
  results show that a bigger STRIPE_SIZE may give better performance when the
  issued IOs are mostly larger than 4096 bytes. Thus, in this patchset, we
  still want to make STRIPE_SIZE a configurable value.

I think it is better to define the stripe size as 4K if that fits the general
scenario, and it also aligns with the fs.

  In the current implementation, grow_buffers() uses alloc_page() to allocate
  the buffers for each stripe_head. With this change, that would mean
  allocating 64KB buffers but using only 4KB of them. To save memory, we try
  to 'compress' multiple stripe_head buffers onto a single real page. Details
  are in patch #2.
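
  As a minimal, purely illustrative sketch of that idea (the struct and helper
  below are made up for this mail and are not the code from patch #2): several
  4KB stripe buffers can share one PAGE_SIZE page by carrying an offset into
  it.

      /* Hypothetical example: a buffer records a shared page plus an offset,
       * so a single 64KB page can back up to sixteen separate 4KB buffers.
       */
      struct shared_stripe_buf {
              struct page     *page;      /* shared PAGE_SIZE page           */
              unsigned int    offset;     /* this buffer's slice in the page */
      };

      static inline void *shared_stripe_buf_addr(struct shared_stripe_buf *buf)
      {
              return page_address(buf->page) + buf->offset;
      }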

  To evaluate the new feature, we create a raid5 device '/dev/md5' with 4 SSD
  disks and test it on an arm64 machine with 64KB PAGE_SIZE.
  1) We format /dev/md5 with mkfs.ext4 and mount the ext4 filesystem with the
     default configuration on the /mnt directory. Then we test it with dbench
     using the command:
     dbench -D /mnt -t 1000 10. The results are:
     'STRIPE_SIZE = 64KB'
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    9805011     0.021    64.728
       Close        7202525     0.001     0.120
       Rename        415213     0.051    44.681
       Unlink       1980066     0.079    93.147
       Deltree          240     1.793     6.516
       Mkdir            120     0.004     0.007
       Qpathinfo    8887512     0.007    37.114
       Qfileinfo    1557262     0.001     0.030
       Qfsinfo      1629582     0.012     0.152
       Sfileinfo     798756     0.040    57.641
       Find         3436004     0.019    57.782
       WriteX       4887239     0.021    57.638
       ReadX        15370483     0.005    37.818
       LockX          31934     0.003     0.022
       UnlockX        31933     0.001     0.021
       Flush         687205    13.302   530.088
      Throughput 307.799 MB/sec  10 clients  10 procs  max_latency=530.091 ms
      -------------------------------------------------------
     'STRIPE_SIZE = 4KB'
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    11999166     0.021    36.380
       Close        8814128     0.001     0.122
       Rename        508113     0.051    29.169
       Unlink       2423242     0.070    38.141
       Deltree          300     1.885     7.155
       Mkdir            150     0.004     0.006
       Qpathinfo    10875921     0.007    35.485
       Qfileinfo    1905837     0.001     0.032
       Qfsinfo      1994304     0.012     0.125
       Sfileinfo     977450     0.029    26.489
       Find         4204952     0.019     9.361
       WriteX       5981890     0.019    27.804
       ReadX        18809742     0.004    33.491
       LockX          39074     0.003     0.025
       UnlockX        39074     0.001     0.014
       Flush         841022    10.712   458.848
      Throughput 376.777 MB/sec  10 clients  10 procs  max_latency=458.852 ms
      -------------------------------------------------------

What is the default io unit size of dbench?

Since dbench runs on an ext4 filesystem, I think most of the IO sizes are about 4KB.


   2) We evaluate IO throughput for /dev/md5 with fio using the following
      configuration:
      [4KB randwrite]
      direct=1
      numjobs=2
      iodepth=64
      ioengine=libaio
      filename=/dev/md5
      bs=4KB
      rw=randwrite
      [1MB write]
      direct=1
      numjobs=2
      iodepth=64
      ioengine=libaio
      filename=/dev/md5
      bs=1MB
      rw=write
     The fio test results are as follows:
                     | STRIPE_SIZE = 64KB | STRIPE_SIZE = 4KB
       --------------+--------------------+------------------
       4KB randwrite |        15MB/s      |      100MB/s
       1MB write     |      1000MB/s      |      700MB/s
     The results show that when the issued IO size is larger than 4KB (1MB in
     this test), the 64KB STRIPE_SIZE achieves much higher throughput. But for
     4KB random writes, where the IOs issued to the device are small, the 4KB
     STRIPE_SIZE performs better.

The 4k random write performance drops from 100MB/s to 15MB/s?! How about other
IO sizes, say 16k, 64k, 256k, etc.? It would be more convincing if the 64KB
stripe had better performance than the 4KB stripe overall.


Maybe I have not explained this clearly. Here, the fio test results show that a
4KB STRIPE_SIZE does not always give better performance. If the IO sizes that
applications request are mostly bigger than 4KB, like the 1MB writes in this
test, setting STRIPE_SIZE to a bigger value gives better performance.

So, we try to provide a configurable STRIPE_SIZE, rather than fixing STRIPE_SIZE at 4096.

Thanks,
Yufen


