On Apr 8, 2009, at 3:31 AM, Neil Brown wrote:
On Monday April 6, dledford@xxxxxxxxxx wrote:I'm not attaching a patch for this because it's so simple. Long story short, watching both add and change events in udev rules is bad for mddevices. Specifically, the kernel will generate a change event on things like array stop, and on things like fdisk close. In the case of array stop, it can result in the array being assembled again immediately. In the case of fdisk close, the situation is worse.Let's say you stop all the md devices on some block device in order torepartition. You run fdisk, change the partition table, then issue a write of the table. The write of the table triggers the change event *before* the kernel updates the partition table in memory for the block device, causing udev to rerun the incremental rules on the old partition table and restart all the arrays you just stopped with the old partition table layout, at which point the kernel is unable to reread the partition table. So, once you've enable incremental assembly, it becomes apparent that what we really want is to only start devices on add, not on add|change.So you aren't really talking about events on md devices, but rather on events that might become components of md devices - correct? So in udev-md-raid.rules we currently have .... ACTION!="add|change", GOTO="md_end" # import data from a raid member and activate it#ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm --examine --export $tempnode", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"# import data from a raid set KERNEL!="md*", GOTO="md_end" ...... And you are saying that if we uncomment the "#ENV{...." line, then we only want it to fire on add devices. So something like: .... ACTION!="add|change", GOTO="md_end" +ACTION=="change", GOTO="no_incr" # import data from a raid member and activate it#ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm --examine --export $tempnode", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"# import data from a raid set +LABEL="no_incr" KERNEL!="md*", GOTO="md_end" ...... Is that correct? Your explanation certainly seems reasonable.
As I was reading through this, I certainly thought the above implementation was reasonable, and then I got this new bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=495034So, mdadm-3.0-0.devel3.5 leaves the default udev rules file in place and only defines a little 70-mdadm-incremental.rules that only does the incremental assembly. That file only watches for add events. The earlier mdadm that the user says doesn't have the bug replaces the existing rules file entirely and uncomments the incremental part. Long story short, it would appear the add|change in the normal rules file is possible responsible for a long string of these:
# udevadm monitor .... UDEV [1239266575.473953] change /devices/virtual/block/md5 (block) UEVENT[1239266575.510739] change /devices/virtual/block/md5 (block) UDEV [1239266575.516114] change /devices/virtual/block/md5 (block) UEVENT[1239266575.552984] change /devices/virtual/block/md5 (block) ....I would do some investigation on your own machines, but given udev change semantics, I think the proper course of action is simply to not watch changes.
-- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: CFBFF194 http://people.redhat.com/dledford InfiniBand Specific RPMS http://people.redhat.com/dledford/Infiniband
Attachment:
PGP.sig
Description: This is a digitally signed message part