[PATCH 0/3] mdraid sb and bitmap write alignment on 512e drives

Hello all,

While investigating some performance issues on mdraid 10 volumes
formed with "512e" disks (4k native/physical sector size but with 512
byte sector emulation), I've found two cases where mdraid will
needlessly issue writes that start on a 4k boundary but are
shorter than 4k:

1. writes of the raid superblock; and
2. writes of the last page of the write-intent bitmap.

The following is an excerpt of a block trace of one of the component
members of an mdraid 10 volume during a 4k write near the end of the
array:

  8,32  11        2     0.000001687   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  11        5     0.001454119   711  D  WS 2056 + 1 [kworker/11:1H]
* 8,32  11        8     0.002847204   711  D  WS 2080 + 7 [kworker/11:1H]
  8,32  11       11     0.003700545  3094  D  WS 11721043920 + 8 [md127_raid1]
  8,32  11       14     0.308785692   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  11       17     0.310201697   711  D  WS 2056 + 1 [kworker/11:1H]
  8,32  11       20     5.500799245   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  11       23    15.740923558   711  D  WS 2080 + 7 [kworker/11:1H]

Note the starred transactions, which each start on a 4k boundary but
are less than 4k in length, and so will use the 512-byte emulation.
Sector 2056 holds the superblock, and is written as a single 512-byte
write.  Sector 2086 holds the bitmap bit relevant to the written
sector.  When that bit is written, only the active bits of the last
page of the bitmap are written, starting at sector 2080 and padded out
to the end of the 512-byte logical sector as required.  This results in
a 7-sector (3.5 KiB) write, again using the 512-byte emulation.
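
The arithmetic behind the starred lines can be checked with a small
userspace sketch.  This is only an illustration, not code from the
patches: the function name and the fixed 512-byte logical sector size
are my assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* logical sector size exposed by a 512e drive */

/*
 * Hypothetical helper: returns true if a write of nr_sectors logical
 * sectors starting at start_sector covers only whole physical blocks,
 * so the drive does not need its 512-byte emulation to service it.
 */
bool covers_whole_physical_blocks(uint64_t start_sector,
                                  uint32_t nr_sectors,
                                  uint32_t phys_block_size)
{
    /* The physical block size must be a multiple of the logical one. */
    assert(phys_block_size >= SECTOR_SIZE &&
           phys_block_size % SECTOR_SIZE == 0);

    uint32_t per_block = phys_block_size / SECTOR_SIZE;  /* 8 for 4k */

    return start_sector % per_block == 0 &&
           nr_sectors % per_block == 0;
}
```

With a 4096-byte physical block, "2064 + 8" passes this check, while
"2056 + 1" and "2080 + 7" fail it on length alone: both start on a 4k
boundary (2056 = 8 * 257, 2080 = 8 * 260).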

Note that in some arrays the last page of the bitmap may be
sufficiently full that the array is not affected by the issue with
the bitmap write.

As there can be a substantial penalty to using the 512-byte sector
emulation (turning writes into read-modify-write cycles if the relevant
physical sector is not in the drive's cache) I believe it makes sense
to pad these writes out to a 4k boundary.  The writes are already
padded out for "4k native" drives, where the short access is illegal.

The following patch set changes the superblock and bitmap writes to
respect the physical block size (e.g. 4k for today's 512e drives) when
possible.  In each case there is already logic for padding out to the
underlying logical sector size.  I reuse or repeat the logic for
padding out to the physical sector size, but treat the padding out as
optional rather than mandatory.
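
The "optional rather than mandatory" rule can be sketched in userspace
C as follows.  This is not the code from the patches: the helper names
and the "room" parameter (bytes available from the start of the write
before the next on-disk structure) are hypothetical, and the sketch
assumes the write already starts on a physical block boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of step (step must be non-zero). */
uint32_t round_up_to(uint32_t x, uint32_t step)
{
    assert(step != 0);
    return (x + step - 1) / step * step;
}

/*
 * Hypothetical helper: choose the padded length of a metadata write.
 *   len          bytes that actually need to be written
 *   logical_bs   logical block size; padding to this is mandatory
 *   physical_bs  physical block size; padding to this is optional
 *   room         bytes available before the next on-disk structure
 */
uint32_t padded_write_len(uint32_t len, uint32_t logical_bs,
                          uint32_t physical_bs, uint32_t room)
{
    uint32_t phys_len = round_up_to(len, physical_bs);

    /* Prefer the physical block size when the padding fits. */
    if (phys_len <= room)
        return phys_len;

    /* Otherwise fall back to the mandatory logical padding. */
    return round_up_to(len, logical_bs);
}
```

For example, a 512-byte write with 4096 bytes of room is padded to 4096
bytes (the "+ 1" to "+ 8" change in the traces), while the same write
with only 1024 bytes of room stays at 512 bytes.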

The corresponding block trace with these patches is:

   8,32   1        2     0.000003410   694  D  WS 2064 + 8 [kworker/1:1H]
   8,32   1        5     0.001368788   694  D  WS 2056 + 8 [kworker/1:1H]
   8,32   1        8     0.002727981   694  D  WS 2080 + 8 [kworker/1:1H]
   8,32   1       11     0.003533831  3063  D  WS 11721043920 + 8 [md127_raid1]
   8,32   1       14     0.253952321   694  D  WS 2064 + 8 [kworker/1:1H]
   8,32   1       17     0.255354215   694  D  WS 2056 + 8 [kworker/1:1H]
   8,32   1       20     5.337938486   694  D  WS 2064 + 8 [kworker/1:1H]
   8,32   1       23    15.577963062   694  D  WS 2080 + 8 [kworker/1:1H]

I do notice that the code for bitmap writes has a more sophisticated
and thorough check for overlap than the code for superblock writes.
(Compare write_sb_page in md-bitmap.c vs. super_1_load in md.c.)  As
far as I know, since the starts of the various structures have always
been 4k aligned anyway, it is always safe to pad the superblock write
out to 4k (as occurs on 4k native drives) but not necessarily further.

Feedback appreciated.

  --Chris


Christopher Unkel (3):
  md: align superblock writes to physical blocks
  md: factor sb write alignment check into function
  md: pad writes to end of bitmap to physical blocks

 drivers/md/md-bitmap.c | 80 +++++++++++++++++++++++++-----------------
 drivers/md/md.c        | 15 ++++++++
 2 files changed, 63 insertions(+), 32 deletions(-)

-- 
2.17.1



