On 28.11.2016 at 11:42, Hannes Reinecke wrote:
On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
On 28.11.2016 at 03:19, tang.junhui@xxxxxxxxxx wrote:
Hello Christophe, Ben, Hannes, Martin, Bart,
I am a member of the host-side software development team of the ZXUSP
storage project at ZTE Corporation. To meet market demand, our team has
decided to start writing code next month to improve multipath efficiency.
The whole idea is in the mail below. We hope to participate in and make
progress with the open source community, so any suggestions and comments
are welcome.
Hi
First - we are aware of these issues.
The solution proposed in this mail would surely help - but there is
likely a bigger issue to be solved first.
The core problem is avoiding the execution of the 'blkid' disk identification.
Recent versions of multipath already mark a plain table 'RELOAD' operation
(which should not be changing disk content) with an extra DM bit, so the
udev rules ATM skip 'pvscan'. We would also like to extend this to skip
more rules and re-import the existing 'symlinks' from the udev database
(so they would not get deleted).
I believe the processing of udev rules is 'relatively' quick as long as
it does not need to read or write ANYTHING on the real disks.
Hmm. Are you sure this is an issue?
We definitely need to skip uevent handling when a path goes down (but I
think we do that already), but for 'add' events we absolutely need to
call blkid to figure out whether the device has changed.
There are storage arrays out there that use a 'path down/path up' cycle
to inform initiators about device layout changes.
So we wouldn't be able to handle those properly if we didn't call blkid here.
The core trouble is this:
with a multipath device, you ONLY want to 'scan' the device (with blkid)
when the initial, first member device of the multipath map appears.
So when you start the multipath device (resume -> CHANGE uevent), that
should be the ONLY place where the 'blkid' test runs (a test which really
reads over 3/4 MB from the disk, just to check that there is no ZFS
signature somewhere).
Any further disk that becomes a member of the multipath map (recognized
by 'multipath -c') should then NOT be scanned. As far as I can tell, the
current ordering is the opposite: first 'blkid' runs (rule file 60), and
only then does rule file 62 recognize the device as an mpath_member.
Thus every added disk fires a very lengthy blkid scan.
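To illustrate the ordering I have in mind, here is a minimal sketch of a
hypothetical rules file (file name, label, and property name are made up)
that would run before the blkid rule (60) and mark component devices, so
the expensive probe could be skipped for them:

  # 58-mpath-member.rules (hypothetical name - anything sorting before 60)
  ACTION!="add|change", GOTO="mpath_member_end"
  SUBSYSTEM!="block", GOTO="mpath_member_end"
  # 'multipath -c' exits 0 when the device is a multipath component
  PROGRAM=="/sbin/multipath -c $devnode", ENV{DM_MULTIPATH_DEVICE_PATH}="1"
  LABEL="mpath_member_end"

  # and in the blkid rule file (60), before IMPORT{builtin}="blkid":
  ENV{DM_MULTIPATH_DEVICE_PATH}=="1", GOTO="persistent_storage_end"

(Check the actual rule files on your distribution for the real label and
property names.)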
Of course, I'm not an expert on the dm multipath rules here, so I'm
passing this on to prajnoha@ - but I'd guess this is the primary source
of the slowdowns.
There should be exactly ONE blkid run for a single multipath device - as
long as a 'RELOAD' only adds or removes paths, there is no reason to
scan the component devices.
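As a sketch of the RELOAD handling I mean (the flag name below is only
illustrative - it stands for the extra DM bit mentioned above): on such a
CHANGE event, the rules could re-import the previously probed values from
the udev database instead of touching the disk, so the existing symlinks
survive:

  # sketch only - not the shipped dm rules
  ACTION=="change", ENV{DM_SUBSYSTEM_UDEV_FLAG0}=="1", GOTO="dm_reload_noscan"
  GOTO="dm_reload_end"
  LABEL="dm_reload_noscan"
  # re-use what blkid found on the initial activation...
  IMPORT{db}="ID_FS_TYPE"
  IMPORT{db}="ID_FS_UUID_ENC"
  IMPORT{db}="ID_FS_LABEL_ENC"
  # ...and recreate the symlinks from the saved values, with no disk I/O
  ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
  LABEL="dm_reload_end"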
So while the aggregation of uevents in multipath would shorten the queue
processing of events, it would still not speed up the scanning itself.
We need to drastically cut down the unnecessary disk re-scanning.
Also note: if you have a lot of disks, it might be worth checking
whether udev picks the 'right' number of udev workers.
There is heuristic logic to avoid overloading the system, but it may be
worth checking whether the computed number is the best for scaling on
your system with your amount of CPU/RAM/disks - i.e. if you double the
number of workers, do you get any better performance? (A sketch of how
to try this follows at the end of this mail.)
That doesn't help, as we only have one queue (within multipath) to
handle all uevents.
This was meant for systems with many different multipath devices.
Obviously it would not help with a single multipath device.
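For anyone who wants to experiment with the worker count anyway, a quick
sketch (the option names are those of current systemd-udevd - please
double-check them on your version):

  # raise the limit at runtime and repeat the measurement
  udevadm control --children-max=64

  # or persistently in /etc/udev/udev.conf:
  #   children_max=64

  # or for a single boot, on the kernel command line:
  #   udev.children_max=64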
Regards
Zdenek
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel