Re: Discussion: performance issue on event activation mode

"heming.zhao@xxxxxxxx" <heming.zhao@xxxxxxxx> · Mon, 13 Sep 2021 00:51:46 +0800

On 9/11/21 1:38 AM, Martin Wilck wrote:
On Thu, 2021-09-09 at 14:44 -0500, David Teigland wrote:
On Tue, Jun 08, 2021 at 01:23:33PM +0000, Martin Wilck wrote:
On Tue, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
On Mon 07 Jun 2021 16:48, David Teigland wrote:

If there are say 1000 PVs already present on the system, there could be
real savings in having one lvm command process all 1000, and then switch
over to processing uevents for any further devices afterward. The switch
over would be delicate because of the obvious races involved with new
devs appearing, but probably feasible.

Maybe to avoid the race, we could possibly write the proposed
"/run/lvm2/boot-finished" right before we initiate scanning in
"vgchange -aay" that is a part of the lvm2-activation-net.service (the
last service to do the direct activation).

A few event-based pvscans could fire during the window between
"scan initiated phase" in lvm2-activation-net.service's
"ExecStart=vgchange -aay..." and the originally proposed
"ExecStartPost=/bin/touch /run/lvm2/boot-finished", but I think still
better than missing important uevents completely in this window.

That sounds reasonable. I was thinking along similar lines. Note that in
the case where we had problems lately, all actual activation (and
slowness) happened in lvm2-activation-early.service.

I've implemented a solution like this and would like any thoughts,
improvements, or testing to verify it can help:
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-activation-switch-1

I've taken some direction from the lvm activation generator, but there
are details of that I'm not too familiar with, so I may be missing
something (in particular it has three activation points but I'm showing
two below.) This new method would probably let us drop the
activation-generator, since we could easily configure an equivalent
using this new method.

Here's how it works:

uevents for PVs run pvscan with the new option --eventactivation check.
This makes pvscan check if the /run/lvm/event-activation-on file exists.
If not, pvscan does nothing.
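
The gate described above can be sketched as follows. This is only an
illustration in Python, not the pvscan implementation: the function
names are hypothetical, and the only detail taken from this thread is
the flag file path /run/lvm/event-activation-on.

```python
import os

# Flag path as given in the description above. Pass another path to
# experiment without touching /run.
EVENT_ACTIVATION_FLAG = "/run/lvm/event-activation-on"


def pvscan_should_activate(flag_path=EVENT_ACTIVATION_FLAG):
    """Return True only once the switch-over flag file exists."""
    return os.path.exists(flag_path)


def handle_pv_uevent(device, flag_path=EVENT_ACTIVATION_FLAG):
    """Hypothetical uevent handler: a no-op until the flag appears."""
    if not pvscan_should_activate(flag_path):
        return "skipped"          # direct activation is still in charge
    return "activate:" + device   # event-based activation is enabled
```

Before the flag exists every uevent is cheap, which is where the boot
time saving would come from.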

lvm-activate-vgs-main.service
. always runs (not generated)
. does not wait for other virtual block device systems to start
. runs vgchange -aay to activate any VGs already present

lvm-activate-vgs-last.service
. always runs (not generated)
. runs after other systems, like multipathd, have started (we want it
   to find as many VGs to activate as possible)
. runs vgchange -aay --eventactivation enable
. the --eventactivation enable creates /run/lvm/event-activation-on,
   which enables the traditional pvscan activations from uevents.
. this vgchange also creates pv online files for existing PVs.
   (Future pvscans will need the online files to know when VGs are
   completed, i.e. for VGs that are partially complete at the point
   of switching to event based activation.)
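
To illustrate why the online files matter at the switch-over, here is a
rough Python sketch. The directory layout (one file per PV, named after
a PV id, containing the VG name) and both helper names are assumptions
made for the example; the real lvm file format may differ.

```python
import os


def record_pv_online(online_dir, pvid, vgname):
    """Create one online file for a PV; the body records its VG name."""
    os.makedirs(online_dir, exist_ok=True)
    with open(os.path.join(online_dir, pvid), "w") as f:
        f.write(vgname + "\n")


def vg_is_complete(online_dir, expected_pvids):
    """A VG counts as complete once every one of its PVs is online."""
    return all(
        os.path.exists(os.path.join(online_dir, pvid))
        for pvid in expected_pvids
    )
```

If vgchange records the PVs it already saw, a later pvscan for the one
missing PV can tell that the VG has just become complete, instead of
seeing only a single PV out of several.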

uevents for PVs continue to run pvscan with the new option
--eventactivation check, but the check now sees the event-activation-on
temp file, so they will do activation as they have before.

Notes:

- To avoid missing VGs during the transition to event-based, the
vgchange in lvm-activate-vgs-last will create event-activation-on before
doing anything else.  This means for a period of time both vgchange and
pvscan may attempt to activate the same VG.  These commits use the
existing mechanism to resolve this (the --vgonline option and
/run/lvm/vgs_online).

- We could use the new lvm-activate-* services to replace the activation
generator when lvm.conf event_activation=0.  This would be done by
simply not creating the event-activation-on file when
event_activation=0.

- To do the reverse, and use only event based activation without any
lvm-activate-vgs services, a new lvm.conf setting could be used, e.g.
event_activation_switch=0 and disabling lvm-activate-vgs services.
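
As an illustration of the first note above (vgchange and pvscan racing
to activate the same VG), one common way to let exactly one command win
is an exclusive file create. The sketch below assumes that approach; the
thread only names the --vgonline option and /run/lvm/vgs_online, so the
helper name and the exact semantics here are hypothetical.

```python
import os


def claim_vg(vgs_online_dir, vgname):
    """Atomically claim a VG; True for the single caller that wins."""
    os.makedirs(vgs_online_dir, exist_ok=True)
    path = os.path.join(vgs_online_dir, vgname)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only
        # one of several concurrent commands can create it.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # someone else is already activating this VG
    os.close(fd)
    return True
```

Whichever command loses the claim simply skips the VG, so the overlap
window costs a duplicate scan at most, not a duplicate activation.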

This last idea sounds awkward to me. But the rest is very nice.
Heming, do you agree we should give it a try?

The last note is about keeping compatibility. We can't imagine, let
alone test, every use case, so creating a switch is a good idea.
But believe me, nobody except the lvm2 developers understands the
event/direct activation story. A new config item
(event_activation_switch) that interacts with another item
(event_activation) will confuse users.

We should help users make the best choice for performance. So we could
reuse the existing item "event_activation": its current values are 0
and 1, and we can add a new value '2', i.e.:
0 - disable event_activation (use direct activation)
1 - new behavior
2 - old/legacy behavior

The default value stays 1, but the lvm behavior behind it changes.
Anyone who wants the old behavior back can set this item to '2'.
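
The proposed meaning of the three values could be expressed like this.
It is only a sketch of the proposal: the value 2 does not exist in
lvm.conf today, and the mode names and the function are made up for the
example.

```python
def activation_mode(event_activation):
    """Map the proposed event_activation values to a mode name."""
    modes = {
        0: "direct",             # direct activation only
        1: "direct-then-event",  # new behavior: switch after boot
        2: "legacy-event",       # old behavior: event-based from start
    }
    if event_activation not in modes:
        raise ValueError("unsupported event_activation value: %r"
                         % (event_activation,))
    return modes[event_activation]
```

With this scheme users keep a single knob, and an existing config with
event_activation = 1 picks up the faster behavior without any change.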

-------
I have verified this new feature in my environment. It is a big improvement.

new feature with lvm config:
obtain_device_list_from_udev = 1
event_activation = 1
udev_sync = 1

systemd-analyze blame: (top 9 items)
         20.809s lvm2-pvscan@134:544.service
         20.808s lvm2-pvscan@134:656.service
         20.808s lvm2-pvscan@134:528.service
         20.807s lvm2-pvscan@133:640.service
         20.806s lvm2-pvscan@133:672.service
         20.785s lvm2-pvscan@134:672.service
         20.784s lvm2-pvscan@134:624.service
         20.784s lvm2-pvscan@128:1008.service
         20.783s lvm2-pvscan@128:832.service

Previously, the same lvm config took 2min 6.736s (this result is in my
previous mail).

The shortest time in my previous mail was 17.552s, with this config:
obtain_device_list_from_udev=1, event_activation=0, udev_sync=1

The new result (20.809s) is very close to direct activation, which is
reasonable, since lvm first uses direct mode and then switches to event
mode.
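
For reference, the three timings quoted above can be compared with a few
lines of Python. The numbers are the ones from this mail; the variable
names are just labels for the example.

```python
# Longest lvm2-pvscan@ unit time from systemd-analyze blame, in seconds.
old_event = 2 * 60 + 6.736   # same config before the change
new_switch = 20.809          # same config with the new feature
direct = 17.552              # best earlier result, event_activation=0

speedup = old_event / new_switch        # how many times faster
overhead = new_switch - direct          # seconds slower than direct
overhead_pct = 100 * overhead / direct  # same gap as a percentage

print(f"speedup vs old event mode: {speedup:.2f}x")
print(f"gap to direct activation:  {overhead:.3f}s ({overhead_pct:.1f}%)")
```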

Thanks,
Heming

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/