On 11/28/2016 01:08 PM, Hannes Reinecke wrote:
> On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
>> On 28.11.2016 at 11:42, Hannes Reinecke wrote:
>>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
>>>> On 28.11.2016 at 03:19, tang.junhui@xxxxxxxxxx wrote:
>>>>> Hello Christophe, Ben, Hannes, Martin, Bart,
>>>>> I am a member of the host-side software development team of the
>>>>> ZXUSP storage project in ZTE Corporation. In response to market
>>>>> demand, our team has decided to write code next month to improve
>>>>> multipath efficiency. The whole idea is in the mail below. We hope
>>>>> to participate in and make progress with the open source community,
>>>>> so any suggestions and comments would be welcome.
>>>>>
>>>>
>>>>
>>>> Hi
>>>>
>>>> First - we are aware of these issues.
>>>>
>>>> The solution proposed in this mail would surely help - but there is
>>>> likely a bigger issue to be solved first.
>>>>
>>>> The core trouble is avoiding execution of 'blkid' disk
>>>> identification.
>>>> Recent versions of multipath already mark a plain table 'RELOAD'
>>>> operation (which should not change disk content) with an extra DM
>>>> bit, so the udev rules ATM skip 'pvscan'. We would also like to
>>>> extend the functionality to skip more rules and reimport existing
>>>> 'symlinks' from the udev database (so they do not get deleted).
>>>>
>>>> I believe the processing of udev rules is 'relatively' quick as long
>>>> as it does not need to read/write ANYTHING from real disks.
>>>>
>>> Hmm. Are you sure this is an issue?
>>> We definitely need to skip uevent handling when a path goes down (but I
>>> think we do that already), but for 'add' events we absolutely need to
>>> call blkid to figure out if the device has changed.
>>> There are storage arrays out there that use a 'path down/path up' cycle
>>> to inform initiators about any device layout change.
>>> So we wouldn't be able to handle those properly if we didn't call blkid
>>> here.
>>
>> The core trouble is -
>>
>>
>> With a multipath device, you ONLY want to 'scan' the device (with blkid)
>> when the initial first member device of the multipath gets in.
>>
>> So you start multipath (resume -> CHANGE) - it should be the ONLY place
>> to run the 'blkid' test (which really goes through over 3/4 MB of disk
>> reads, to check if there is not ZFS somewhere).
>>
>> Then any next disk that is a member of the multipath (recognized by
>> 'multipath -c') should NOT be scanned - as far as I can tell the current
>> order is the opposite: first there is 'blkid' (60) and then rule (62)
>> recognizes an mpath_member.
>>
>> Thus every added disk fires a very lengthy blkid scan.
>>
>> Of course I'm not an expert on the dm multipath rules here, so I'm
>> passing this on to prajnoha@ - but I'd guess this is the primary source
>> of slowdowns.
>>
>> There should be exactly ONE blkid for a single multipath device - as
>> long as 'RELOAD' only adds/removes paths (there is no reason to scan
>> component devices).
>>
> ATM 'multipath -c' is just a simple test of whether the device is
> supposed to be handled by multipath.
>
> And the number of bytes read by blkid shouldn't be _that_ large; a simple
> 'blkid' on my device caused it to read 35k ...
>
> Also udev will become very unhappy if we're not calling blkid for every
> device; you'd have a hard time reconstructing the event for those
> devices.

What do you mean by event reconstruction?

I don't think we really need to call blkid for every device. If we have
configured that a certain device is surely an mpath component (based on
the WWN in the mpath config), I think we don't need to call blkid at all -
it's an mpath component and the top-level device should simply be used
for any scanning.

I mean, I still don't see why we need to call blkid and then overwrite
the ID_FS_TYPE variable right away based on the result of 'multipath -c'.
If we reverse this order, we could save the extra blkid that's not
actually needed.
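To make the counting argument concrete, here is a minimal Python sketch. It assumes a deliberately simplified model, not the actual udev rule files: membership is decided purely by a configured WWN set, each path 'add' event either runs blkid first or is recognized as an mpath component first, and the multipath map itself is scanned once when it is first activated (resume -> CHANGE). The names PathEvent and count_blkid_scans, the device names and the WWN value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathEvent:
    """A hypothetical 'add' uevent for one path device."""
    devname: str
    wwn: str


def count_blkid_scans(events, mpath_wwns, member_check_first):
    """Count blkid runs for a sequence of path 'add' events.

    Simplified model (an assumption, not the real rules):
    - member_check_first=False: blkid runs on every path, and only then is
      the path recognized as an mpath component (blkid at 60, check at 62).
    - member_check_first=True: paths whose WWN is configured for multipath
      skip blkid entirely.
    In both orders the multipath map itself is scanned once, on the first
    activation for a given WWN.
    """
    scans = 0
    activated_maps = set()
    for event in events:
        is_component = event.wwn in mpath_wwns
        if not (member_check_first and is_component):
            scans += 1  # blkid on the path device itself
        if is_component and event.wwn not in activated_maps:
            activated_maps.add(event.wwn)
            scans += 1  # blkid on the newly activated multipath map
    return scans


if __name__ == "__main__":
    # Four paths to one LUN (hypothetical device names and WWN).
    wwn = "3600a0b80000000000000000000000001"
    paths = [PathEvent(f"sd{letter}", wwn) for letter in "abcd"]
    configured = {wwn}
    print("current order :", count_blkid_scans(paths, configured, False))
    print("reversed order:", count_blkid_scans(paths, configured, True))
```

With four paths to one LUN this reports 5 scans for the current order and 1 for the reversed order, i.e. the single blkid per multipath device argued for above. The model ignores the case Hannes raises, where an array signals a layout change through a path down/path up cycle; whether that still needs a per-path blkid is the open question in this thread.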
--
Peter

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel