On Tue, Feb 27, 2018 at 08:32:34AM +0100, Mariusz Dabrowski wrote:
> On 02/21/2018 06:54 PM, Shaohua Li wrote:
> > On Tue, Feb 20, 2018 at 02:59:55PM +0100, Mariusz Dabrowski wrote:
> > > On 02/18/2018 06:59 PM, Shaohua Li wrote:
> > > > On Wed, Feb 14, 2018 at 02:23:29PM +0100, Mariusz Dabrowski wrote:
> > > > > This patchset adds support for write hints in the MD driver. This is a
> > > > > new feature for NVMe drives compliant with the 1.3 specification,
> > > > > introduced to Linux in kernel 4.13. The write hint has to be copied
> > > > > from the bio containing user data to the bios sent to RAID members.
> > > > > Additionally, a write hint can be set for internal data such as parity
> > > > > and PPL in RAID 5.
> > > > >
> > > > > Setting the write hint for parity is done with a simple classification
> > > > > algorithm which works for sequential IO workloads. It tries to predict
> > > > > which parity requests are going to be overwritten shortly and sets the
> > > > > write hint for them. The algorithm uses the stripe cache to count
> > > > > updates of each parity chunk. A parity request is predicted as
> > > > > "soon-overwritten" if the number of parity updates is smaller than the
> > > > > number of data chunks in the stripe.
> > > > >
> > > > > For PPL there is no special algorithm. It is updated very frequently,
> > > > > so we can set the write hint for every PPL write.
> > > > >
> > > > > Our internal tests show that setting write hints for parity and PPL
> > > > > can significantly reduce write amplification.
> > > >
> > > > I can apply the first 2 patches first.
> > > >
> > > > For the other patches, I'm not confident. A write hint just means a
> > > > write stream, or a stream ID. Userspace isn't required to assign
> > > > short-lived data to RWH_WRITE_LIFE_SHORT. It could assign long-lived
> > > > data to RWH_WRITE_LIFE_SHORT and short-lived data to
> > > > RWH_WRITE_LIFE_LONG; nothing prevents userspace from doing this. A
> > > > fixed policy like the one in these patches isn't flexible, and it can
> > > > even hurt performance, depending on the application.
> > > >
> > > > Thanks,
> > > > Shaohua
> > > >
> > >
> > > I agree that this fixed policy is not the best we can do. I can change
> > > it to allow setting which hint will be used for parity/PPL. I can think
> > > of two approaches:
> > > 1) setting the hint ID together with the policy, for example:
> > >    echo parity=2 > /sys/block/md126/md/write_hint_policy
> > > 2) new sysfs attributes for setting the hint IDs:
> > >    echo parity > /sys/block/md126/md/write_hint_policy
> > >    echo ppl > /sys/block/md126/md/write_hint_policy
> > >    echo 1 > /sys/block/md126/md/parity_write_hint
> > >    echo 2 > /sys/block/md126/md/ppl_write_hint
> > >
> > > What are your thoughts on this? Is either of these proposals acceptable
> > > to you? Maybe you have a better idea of how to make this more flexible?
> >
> > Frankly, I have no idea how this should be done. Adding an interface
> > before we have thought the problem through is a shot in the dark too: we
> > can't be confident that the new interface (part of the ABI) won't have to
> > change in the future. And the interface is system-wide; how would it work
> > with two workloads running with different write hint policies?
>
> Could you explain in more detail what your concerns about the interface
> are?

As I said, I don't know what the interface should look like, so I don't
want to add an experimental interface.

> And by "two workloads running with different write hint policies" you mean
> two separate workloads with different data write hints? As long as we
> don't touch the data write hints from those workloads and pass them down
> to the drives, we can group PPL and parity writes as short-lived.

How can you prevent this? It's legal for two applications running on the
same system to use and interpret the hints in different ways.

BTW, can you send me the first two patches with the fix?
Thanks,
Shaohua
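The parity classification heuristic from the quoted cover letter can be sketched in C as follows. This is a minimal, hypothetical illustration, not the code from the patchset. `enum parity_hint`, `struct stripe_counter` and `classify_parity_write()` are invented names. The sketch assumes that the per-stripe counter includes the parity write currently being classified. A real implementation would keep the counter in the RAID 5 stripe cache and would map `PARITY_HINT_SHORT` either to a kernel write-life hint such as `RWH_WRITE_LIFE_SHORT` or to an administrator-chosen hint ID, as the sysfs proposals in the thread suggest.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the parity write-hint heuristic from the cover
 * letter. All names are illustrative; none come from drivers/md.
 */
enum parity_hint {
	PARITY_HINT_NONE,	/* leave the bio's write hint unset */
	PARITY_HINT_SHORT,	/* parity expected to be overwritten soon */
};

/* Assumed per-stripe bookkeeping, kept next to the stripe cache entry. */
struct stripe_counter {
	unsigned int parity_updates;	/* parity writes seen for this stripe */
	unsigned int data_disks;	/* data chunks per stripe */
};

/*
 * Count the parity write being issued, then classify it. While the count
 * is below the number of data chunks, a sequential writer has not yet
 * filled the stripe, so this parity will be rewritten by a later update.
 * The counter is never reset here; a real implementation would drop it
 * when the stripe leaves the cache.
 */
enum parity_hint classify_parity_write(struct stripe_counter *sc)
{
	assert(sc->data_disks > 0);
	sc->parity_updates++;
	return sc->parity_updates < sc->data_disks ? PARITY_HINT_SHORT
						   : PARITY_HINT_NONE;
}
```

With three data chunks per stripe, a writer that fills the stripe sequentially triggers three parity updates. The first two are tagged short-lived because later writes to the same stripe overwrite them. The third completes the stripe and is left without a hint. For non-sequential writes, the counter is a much weaker predictor, which is consistent with the cover letter limiting the heuristic to sequential workloads.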