On 02/21/2018 06:54 PM, Shaohua Li wrote:
On Tue, Feb 20, 2018 at 02:59:55PM +0100, Mariusz Dabrowski wrote:
On 02/18/2018 06:59 PM, Shaohua Li wrote:
On Wed, Feb 14, 2018 at 02:23:29PM +0100, Mariusz Dabrowski wrote:
This patchset adds support for write hints to the MD driver. Write hints are
a new feature of NVMe drives compliant with the 1.3 specification, and support
for them was introduced in Linux kernel 4.13. The write hint has to be copied
from the bio containing user data to the bios sent to the RAID members.
Additionally, a write hint can be set for internal data such as parity and
PPL in RAID 5.
Setting the write hint for parity is done with a simple classification
algorithm which works for sequential IO workloads. It tries to predict which
parity requests are going to be overwritten in a moment and sets the write
hint for them. The algorithm uses the stripe cache to count updates of each
parity chunk. A parity request is predicted as "soon-overwritten" if the
number of parity updates is smaller than the number of data chunks in the
stripe.
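The classification rule above can be illustrated with a small userspace model.
This is only a sketch, not the patch code: the class name, the dictionary
standing in for the stripe cache, and the counter reset after a full stripe
are assumptions made for illustration. It also assumes the update counter
includes the write being classified.

```python
from collections import defaultdict


class ParityHintPredictor:
    """Toy model of the parity classification rule (hypothetical names)."""

    def __init__(self, data_chunks):
        # Number of data chunks per stripe, e.g. 3 for a 4-drive RAID 5.
        self.data_chunks = data_chunks
        # Stands in for the stripe cache: parity updates seen per stripe.
        self.updates = defaultdict(int)

    def on_parity_write(self, stripe):
        """Return True if this parity write is predicted "soon-overwritten"."""
        # Assumption: the counter includes the write being classified.
        self.updates[stripe] += 1
        soon_overwritten = self.updates[stripe] < self.data_chunks
        if not soon_overwritten:
            # Assumption: the stripe is now complete, so counting starts over.
            del self.updates[stripe]
        return soon_overwritten


predictor = ParityHintPredictor(data_chunks=3)
# Sequential workload: three chunk-sized writes land in stripe 0.
predictions = [predictor.on_parity_write(0) for _ in range(3)]
print(predictions)  # [True, True, False]
```

Under these assumptions, only the last parity update of a sequentially
written stripe is left without the hint; the earlier ones are expected to be
overwritten shortly.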
For PPL there is no special algorithm. It is updated very frequently, so we
can set the write hint for each PPL write.
Our internal tests show that setting the write hint for parity and PPL can
significantly reduce write amplification.
I can apply the first 2 patches first.
For the other patches, I'm not confident. A write hint just means a write
stream, or a stream ID. Userspace doesn't need to assign short-lived data to
RWH_WRITE_LIFE_SHORT. It could assign long-lived data to RWH_WRITE_LIFE_SHORT
and short-lived data to RWH_WRITE_LIFE_LONG. Nothing prevents userspace from
doing this. A fixed policy like the one in these patches isn't flexible and
can sometimes hurt performance, depending on the application.
Thanks,
Shaohua
I agree that this fixed policy is not the best we can do. I can change it
to allow setting which hint will be used for parity/PPL. I can think of two
approaches:
1) setting hint ID at the same time as policy, for example:
echo parity=2 > /sys/block/md126/md/write_hint_policy
2) new sysfs attributes for setting hint ID:
echo parity > /sys/block/md126/md/write_hint_policy
echo ppl > /sys/block/md126/md/write_hint_policy
echo 1 > /sys/block/md126/md/parity_write_hint
echo 2 > /sys/block/md126/md/ppl_write_hint
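To make the difference between the two proposed formats concrete, here is a
sketch of how a policy string could be interpreted. It is purely illustrative:
the function name, the returned dictionary, and the acceptance of several
whitespace-separated entries in one string are assumptions, and the real
parsing would live in the kernel's sysfs store handler.

```python
def parse_policy(text):
    """Parse a hypothetical write_hint_policy string.

    "parity=2" (approach 1) maps the target to a hint ID.
    "parity"   (approach 2) only enables the target; the ID would come
               from a separate attribute such as parity_write_hint.
    """
    known = {"parity", "ppl"}
    policy = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        if name not in known:
            raise ValueError(f"unknown policy target: {name!r}")
        # None means "enabled, hint ID configured elsewhere".
        policy[name] = int(value) if sep else None
    return policy


print(parse_policy("parity=2"))
print(parse_policy("ppl"))
```

With approach 1 a single write carries both the target and its hint ID, while
approach 2 needs a second write to the per-target attribute before the hint
ID is known.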
What are your thoughts on this? Is either of these proposals acceptable to
you? Or maybe you have a better idea for making this more flexible?
Frankly, I have no idea how this should be done. Adding an interface before
we think the problem through is blind too: we can't be confident that the new
interface (part of the ABI) will not have to change in the future. And the
interface is system-wide, so how does it work if we have two workloads running
with different write hint policies?
Could you explain in more detail what your concerns about the interface are?
And by "two workloads running with different write hint policy", do you mean
two separate workloads with different data write hints? As long as we don't
touch the data write hints from those workloads and pass them down to the
drives, we can group PPL and parity writes under a short-lifetime hint.
Thanks,
Mariusz