On 02/18/2018 06:59 PM, Shaohua Li wrote:
On Wed, Feb 14, 2018 at 02:23:29PM +0100, Mariusz Dabrowski wrote:
This patchset adds support for write hints to the MD driver. Write hints are a
new feature of NVMe drives compliant with the 1.3 specification, and support
for them was introduced in Linux kernel 4.13. The write hint has to be copied
from the bio containing user data to the bios sent to the RAID members.
Additionally, a write hint can be set for internal data such as parity and PPL
in RAID 5.
Setting the write hint for parity is done with a simple classification
algorithm which works for sequential IO workloads. It tries to predict which
parity requests are going to be overwritten in a moment and sets the write hint
for them. The algorithm uses the stripe cache to count updates of each parity
chunk. A parity request is predicted as "soon-overwritten" if the number of
parity updates is smaller than the number of data chunks in the stripe.
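The counting rule above can be sketched as a small userspace model. This is
only an illustration, not the code from the patches: struct stripe_model and
parity_soon_overwritten() are hypothetical names, and the sketch assumes that
the counter includes the current update and is reset once the stripe has been
fully written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-stripe state; the patches keep their counter in the stripe cache. */
struct stripe_model {
    unsigned int data_chunks;    /* data chunks per stripe, e.g. 3 for a 4-disk RAID 5 */
    unsigned int parity_updates; /* parity writes counted since the last reset */
};

/* Record one parity write; return true if it is predicted to be overwritten soon. */
static bool parity_soon_overwritten(struct stripe_model *s)
{
    assert(s->data_chunks > 0);

    s->parity_updates++;
    if (s->parity_updates < s->data_chunks)
        return true; /* more data chunks of this stripe are still expected */

    /* Assumption: the stripe is now complete, so start counting afresh. */
    s->parity_updates = 0;
    return false;
}
```

With 3 data chunks written sequentially, the first two parity writes would be
classified as "soon-overwritten" and the third would not.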
For PPL there is no special algorithm. It is updated very frequently, so we
can set the write hint for each PPL write.
Our internal tests show that setting the write hint for parity and PPL can
significantly reduce write amplification.
I can apply the first 2 patches first.
For the other patches, I'm not confident. A write hint just means a write
stream, or a stream ID. Userspace doesn't need to assign short-lived data to
RWH_WRITE_LIFE_SHORT. It could assign long-lived data to RWH_WRITE_LIFE_SHORT
and short-lived data to RWH_WRITE_LIFE_LONG. Nothing prevents userspace from
doing this. A fixed policy like the one in the patches isn't flexible and can
sometimes hurt performance, depending on the specific application.
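To illustrate the point, here is a minimal userspace sketch using the
F_SET_RW_HINT fcntl that arrived with the write hint support in 4.13. The
helper names app_hint_for() and set_file_write_hint() are made up for this
example, and the "inverted" mapping is deliberately contrived; the fallback
defines mirror the values in <linux/fcntl.h> for libcs that do not export them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/* Fallback values from the Linux UAPI header <linux/fcntl.h>, for older libcs. */
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif
#ifndef RWH_WRITE_LIFE_LONG
#define RWH_WRITE_LIFE_LONG 4
#endif
#ifndef RWH_WRITE_LIFE_EXTREME
#define RWH_WRITE_LIFE_EXTREME 5
#endif

/* Hypothetical application policy: which hint value to use for which data. */
static uint64_t app_hint_for(int data_is_long_lived)
{
    /* Deliberately inverted: nothing stops an application from doing this. */
    return data_is_long_lived ? RWH_WRITE_LIFE_SHORT : RWH_WRITE_LIFE_LONG;
}

/* Tag an open file with a write hint; returns 0 on success, -1 on error. */
static int set_file_write_hint(int fd, uint64_t hint)
{
    /* The kernel only accepts the RWH_* values, so catch typos early. */
    assert(hint <= RWH_WRITE_LIFE_EXTREME);
    return fcntl(fd, F_SET_RW_HINT, &hint);
}
```

An application using such a mapping still separates its data into two streams
correctly; only the labels differ from what a fixed in-kernel policy assumes.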
Thanks,
Shaohua
I agree that this fixed policy is not the best we can do. I can change it to
allow setting which hint will be used for parity/PPL. I can think of 2 approaches:
1) setting the hint ID at the same time as the policy, for example:
    echo parity=2 > /sys/block/md126/md/write_hint_policy
2) new sysfs attributes for setting the hint ID:
    echo parity > /sys/block/md126/md/write_hint_policy
    echo ppl > /sys/block/md126/md/write_hint_policy
    echo 1 > /sys/block/md126/md/parity_write_hint
    echo 2 > /sys/block/md126/md/ppl_write_hint
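For approach 1, the string written to write_hint_policy would have to be split
into a target and a hint ID. A rough userspace sketch of that parsing follows;
struct hint_policy and parse_hint_policy() are hypothetical names, and a real
sysfs store handler would use the kernel's own string helpers instead of
sscanf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing one write to write_hint_policy. */
struct hint_policy {
    char target[16];   /* "parity" or "ppl" */
    unsigned int hint; /* hint ID to use for that target */
};

/* Parse "parity=2" or "ppl=1"; returns 0 on success, -1 on malformed input. */
static int parse_hint_policy(const char *buf, struct hint_policy *out)
{
    char tail;

    assert(buf != NULL && out != NULL);

    /*
     * %15[a-z] reads the target name and %u the hint ID. The space skips the
     * newline that echo appends; %c only matches if unexpected text follows.
     */
    if (sscanf(buf, "%15[a-z]=%u %c", out->target, &out->hint, &tail) != 2)
        return -1;

    if (strcmp(out->target, "parity") != 0 && strcmp(out->target, "ppl") != 0)
        return -1;

    return 0;
}
```

Approach 2 would not need this parsing, at the cost of two extra attributes.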
What are your thoughts on this? Is either of these proposals acceptable to you?
Or maybe you have a better idea for how to make this more flexible?
Thanks,
Mariusz