On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
> Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a):
>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
>>> Dne 28.11.2016 v 03:19 tang.junhui@xxxxxxxxxx napsal(a):
>>>> Hello Christophe, Ben, Hannes, Martin, Bart,
>>>> I am a member of the host-side software development team of the ZXUSP
>>>> storage project at ZTE Corporation. Facing market demand, our team has
>>>> decided to write code next month to improve multipath efficiency. The
>>>> whole idea is in the mail below. We hope to participate in and make
>>>> progress with the open source community, so any suggestions and
>>>> comments would be welcome.
>>>>
>>>
>>>
>>> Hi
>>>
>>> First - we are aware of these issues.
>>>
>>> The solution proposed in this mail would surely help - but there is
>>> likely a bigger issue to be solved first.
>>>
>>> The core trouble is to avoid the 'blkid' disk identification being
>>> executed at all.
>>> Recent versions of multipath already mark a plain 'RELOAD' of the table
>>> (which should not change disk content) with an extra DM bit, so the
>>> udev rules currently skip 'pvscan' - we would also like to extend this
>>> to skip more rules and to reimport the existing 'symlinks' from the
>>> udev database (so they would not get deleted).
>>>
>>> I believe the processing of udev rules is 'relatively' quick as long
>>> as it does not need to read/write ANYTHING from real disks.
>>>
>> Hmm. Are you sure this is an issue?
>> We definitely need to skip uevent handling when a path goes down (but I
>> think we do that already), but for 'add' events we absolutely need to
>> call blkid to figure out whether the device has changed.
>> There are storage arrays out there which use a 'path down/path up' cycle
>> to inform initiators about any device layout change.
>> So we wouldn't be able to handle those properly if we didn't call blkid
>> here.
>
> The core trouble is -
>
> With a multipath device you ONLY want to 'scan' the device (with blkid)
> when the initial first member device of the multipath map comes in.
>
> So when you start multipath (resume -> CHANGE) - that should be the ONLY
> place to run the 'blkid' test (which really reads over 3/4 MB from the
> disk, to check that there is no ZFS somewhere).
>
> Any further disk that is a member of the multipath map (recognized by
> 'multipath -c') should NOT be scanned - yet as far as I can tell the
> current order is the opposite: first comes 'blkid' (60) and only then
> does rule (62) recognize a mpath_member.
>
> Thus every added disk fires a very lengthy blkid scan.
>
> Of course I'm not an expert on dm multipath rules, so I'm passing this
> on to prajnoha@ - but I'd guess this is the primary source of slowdowns.
>
> There should be exactly ONE blkid scan for a single multipath device -
> as long as 'RELOAD' only adds/removes paths there is no reason to scan
> the component devices.
>

ATM 'multipath -c' is just a simple test of whether the device is supposed
to be handled by multipath.
And the number of bytes read by blkid shouldn't be _that_ large; a simple
'blkid' on my device caused it to read 35k ...

Also udev will become very unhappy if we're not calling blkid for every
device; you'd have a hard time reconstructing the event for those devices.
While it's trivial to import variables from parent devices, it's impossible
to do that from unrelated devices; you'd need a dedicated daemon for that.
So we cannot skip blkid without additional tooling.
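For illustration only: the reordering described above (recognize a multipath
path member before the blkid rule at 60 runs) might look roughly like the
sketch below. The file name 59-mpath-member.rules is hypothetical, and
60-persistent-storage.rules would still need an explicit guard to actually
skip its blkid import, so this is a sketch of the idea rather than a
drop-in fix.

    # Hypothetical /etc/udev/rules.d/59-mpath-member.rules - illustration only.
    # Mark SCSI path devices as multipath members before the blkid rule
    # in 60-persistent-storage.rules gets a chance to run.
    ACTION!="add|change", GOTO="mpath_member_end"
    SUBSYSTEM!="block", GOTO="mpath_member_end"
    KERNEL!="sd*", GOTO="mpath_member_end"

    # 'multipath -c' exits 0 when the device is supposed to be a path of a
    # multipath map; set the mpath_member marker mentioned above.
    PROGRAM=="/sbin/multipath -c $devnode", ENV{ID_FS_TYPE}="mpath_member"

    LABEL="mpath_member_end"

    # 60-persistent-storage.rules would then need something like
    #   ENV{ID_FS_TYPE}=="mpath_member", GOTO="persistent_storage_end"
    # before its IMPORT{builtin}="blkid" line, otherwise the blkid
    # builtin still runs and overwrites ID_FS_TYPE.

Note that this only moves the 'multipath -c' check earlier; spawning a
helper for every add/change event on every SCSI disk has a cost of its own.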
>>
>>> So while aggregation of 'uevents' in multipath would 'shorten' the
>>> queue processing of events - it would still not speed up the scanning
>>> itself.
>>>
>>> We need to drastically shorten unnecessary disk re-scanning.
>>>
>>> Also note - if you have a lot of disks - it might be worth checking
>>> whether udev picks the 'right' number of udev workers.
>>> There is heuristic logic to avoid system overload - but it might be
>>> worth checking whether the computed number is the best for scaling on
>>> your system with your amount of CPU/RAM/disks - i.e. if you double the
>>> number of workers, do you get any better performance?
>>>
>> That doesn't help, as we only have one queue (within multipath) to
>> handle all uevents.
>
> This was meant for systems with many different multipath devices.
> Obviously it would not help with a single multipath device.
>

I'm talking about the multipath daemon. There will be exactly _one_
instance of the multipath daemon running for the entire system, and it
handles _all_ udev events with a single queue, independent of the number
of attached devices.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare@xxxxxxx                                      +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
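As an aside on the udev worker-count question raised above: a minimal way
to experiment with the limit is sketched below. The numbers are examples
only, and retriggering events is disruptive enough that this belongs on a
test machine rather than a production host.

    # Raise the udev worker limit at runtime (affects newly queued events):
    udevadm control --children-max=64

    # Replay 'add' events for block devices and time how long it takes
    # for the event queue to drain:
    time ( udevadm trigger --action=add --subsystem-match=block && udevadm settle )

    # A persistent setting can be passed on the kernel command line as
    # udev.children_max=<N> (see systemd-udevd(8)).

Comparing the timing at the default limit against a doubled limit gives a
rough answer to the "do you get any better performance?" question quoted
above.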