[PATCH v5 0/7] Partial Parity Log for MD RAID 5

This series of patches implements the Partial Parity Log (PPL) for RAID5
arrays. The purpose of this feature is to close the RAID 5 Write Hole. It is an
alternative to the existing raid5-cache, but the logging workflow and much of
the implementation are based on it.

The main difference compared to raid5-cache is that PPL is a distributed log:
it is stored on the array member drives in the metadata area and does not
require a dedicated journaling drive. Write performance is reduced by up to
30%-40%, but it scales with the number of drives in the array, and there is no
journaling drive to become a bottleneck or a single point of failure. PPL does
not protect against losing in-flight data, only against silent data corruption.
More details about how the log works can be found in patches 3 and 5.
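
As a rough illustration of the idea only (this is not the code from the
patches), partial parity can be thought of as the XOR of the data chunks in a
stripe that a given write does not modify. XORing it with the modified chunks,
as they are found on disk after a crash, yields parity that is consistent with
what is actually on disk. The sketch below assumes a toy stripe of 4 data
chunks of 8 bytes each; CHUNK, NDATA, xor_into(), partial_parity(),
parity_from_ppl() and ppl_demo() are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 8 /* illustrative chunk size in bytes, far smaller than a real one */
#define NDATA 4 /* illustrative number of data chunks per stripe */

/* XOR len bytes of src into dst. */
static void xor_into(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Partial parity: XOR of the data chunks NOT touched by the write.
 * Bit i of 'modified' is set if the write targets chunk i. */
static void partial_parity(uint8_t data[NDATA][CHUNK], unsigned modified,
			   uint8_t pp[CHUNK])
{
	memset(pp, 0, CHUNK);
	for (int i = 0; i < NDATA; i++)
		if (!(modified & (1u << i)))
			xor_into(pp, data[i], CHUNK);
}

/* Recovery: parity = partial parity XOR the modified chunks as found on disk. */
static void parity_from_ppl(uint8_t data[NDATA][CHUNK], unsigned modified,
			    const uint8_t pp[CHUNK], uint8_t parity[CHUNK])
{
	memcpy(parity, pp, CHUNK);
	for (int i = 0; i < NDATA; i++)
		if (modified & (1u << i))
			xor_into(parity, data[i], CHUNK);
}

/* Returns 1 if a chunk not touched by the interrupted write can still be
 * rebuilt correctly after the simulated crash, 0 otherwise. */
int ppl_demo(void)
{
	uint8_t data[NDATA][CHUNK];
	uint8_t pp[CHUNK], parity[CHUNK], rebuilt[CHUNK], lost[CHUNK];
	const unsigned modified = 0x3; /* the write targets chunks 0 and 1 */

	/* Arbitrary old stripe contents. */
	for (int i = 0; i < NDATA; i++)
		for (int j = 0; j < CHUNK; j++)
			data[i][j] = (uint8_t)(17 * i + 3 * j + 1);

	/* Before the write is issued: compute (and, in reality, log) the PPL. */
	partial_parity(data, modified, pp);

	/* Simulated crash: chunk 0 reached the disk, chunk 1 and parity did not. */
	memset(data[0], 0xAB, CHUNK);

	/* After restart: recompute parity from the log and the on-disk chunks. */
	parity_from_ppl(data, modified, pp, parity);

	/* Pretend the drive holding chunk 3 now fails; rebuild it from the rest. */
	memcpy(lost, data[3], CHUNK);
	memcpy(rebuilt, parity, CHUNK);
	for (int i = 0; i < NDATA - 1; i++)
		xor_into(rebuilt, data[i], CHUNK);

	return memcmp(rebuilt, lost, CHUNK) == 0;
}
```

Calling ppl_demo() from a small main() exercises the scenario. Note that the
in-flight data for chunk 1 is lost in this scenario; only the chunks not
touched by the write are guaranteed to be rebuilt correctly.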

This feature originated from Intel RSTe, which uses IMSM metadata. PPL for IMSM
is going to be included in RSTe implementations starting with upcoming Xeon
platforms and Intel will continue supporting and maintaining it. This patchset
implements PPL for external metadata (specifically IMSM) as well as native MD
v1.x metadata.

Changes in mdadm are also required to make this fully usable. Patches for mdadm
will be sent later.

v5:
- Added a common raid5-cache and ppl interface in raid5-log.h.
- Moved ops_run_partial_parity() to raid5-ppl.c.
- Use an inline bio in struct ppl_io_unit, simplify ppl_submit_iounit() and fix
  a potential bio allocation issue.
- Simplified condition for appending a stripe_head to ppl entry in
  ppl_log_stripe().
- Flush disk cache after ppl recovery, write with FUA in
  ppl_write_empty_header().
- Removed order > 0 page allocation in ppl_recover_entry().
- Put r5l_io_unit and ppl_io_unit in a union in struct stripe_head.
- struct ppl_conf *ppl in struct r5conf replaced with void *log_private.
- Improved comments and descriptions.

v4:
- Separated raid5-cache and ppl structures.
- Removed the policy logic from raid5-cache, ppl calls moved to raid5 core.
- Check for wrong configuration when validating the superblock.
- Moved documentation to separate file.
- More checks for ppl sector/size.
- Some small fixes and improvements.

v3:
- Fixed alignment issues in the metadata structures.
- Removed reading IMSM signature from superblock.
- Removed 'rwh_policy' and per-device JournalPpl flags, added
  'consistency_policy', 'ppl_sector' and 'ppl_size' sysfs attributes.
- Reworked and simplified disk removal logic.
- Debug messages in raid5-ppl.c converted to pr_debug().
- Fixed some bugs in logging and recovery code.
- Improved descriptions and documentation.

v2:
- Fixed wrong PPL size calculation for IMSM.
- Simplified full stripe write case.
- Removed direct access to bi_io_vec.
- Handle failed bio_add_page().

Artur Paszkiewicz (7):
  md: superblock changes for PPL
  raid5: separate header for log functions
  raid5-ppl: Partial Parity Log write logging implementation
  md: add sysfs entries for PPL
  raid5-ppl: load and recover the log
  raid5-ppl: support disk hot add/remove with PPL
  raid5-ppl: runtime PPL enabling or disabling

 Documentation/admin-guide/md.rst |   32 +-
 Documentation/md/raid5-ppl.txt   |   44 ++
 drivers/md/Makefile              |    2 +-
 drivers/md/md.c                  |  140 +++++
 drivers/md/md.h                  |   10 +
 drivers/md/raid0.c               |    3 +-
 drivers/md/raid1.c               |    3 +-
 drivers/md/raid5-cache.c         |   22 +-
 drivers/md/raid5-log.h           |  114 ++++
 drivers/md/raid5-ppl.c           | 1247 ++++++++++++++++++++++++++++++++++++++
 drivers/md/raid5.c               |  182 ++++--
 drivers/md/raid5.h               |   40 +-
 include/uapi/linux/raid/md_p.h   |   45 +-
 13 files changed, 1799 insertions(+), 85 deletions(-)
 create mode 100644 Documentation/md/raid5-ppl.txt
 create mode 100644 drivers/md/raid5-log.h
 create mode 100644 drivers/md/raid5-ppl.c

-- 
2.11.0
