On Tue, Feb 8, 2011 at 2:07 PM, Lennart Poettering <lennart@xxxxxxxxxxxxxx> wrote:
> On Tue, 08.02.11 13:52, Andrey Borzenkov (arvidjaar@xxxxxxx) wrote:
>
>> I am probably the wrong one to ask, but here is what happens when
>> array is started (from udev perspective)
>
> [...]
>
>> After this event device goes "plugged" and SYSTEMD_WANTS (if any) are
>> triggered. But at this point we have zero information about array to
>> decide anything.
>
> [...]
>
>> At this point we know it is container, know that it has external
>> metadata and know that we need external metadata handler (mdmon). But
>> it is too late for systemd.
>
> Kay, do you know why this "change" event is used here? Any chance we can
> get rid of it?
>
>>
>> >
>> >> Actually it can be implemented even without mdadm patches; apparently
>> >> it is possible to suppress normal starting of mdmon by setting
>> >> MDADM_NO_MDMON=1
>> >
>> > At this point mdmon is simply broken: if glibc or mdmon itself (or any
>> > lib it is using) is upgraded, then mdmon will keep referencing the old
>> > .so or binary as long as it is running. This means that the fs these
>> > files are on cannot be remounted r/o. However mdmon insists on being
>> > shutdown only after all fs got remounted ro. So you have a cyclic
>> > ordering loop here: mdmon wants to be shut down after the remount, but
>> > we need to shut it down before the remount.
>> >
>>
>> Ehh ...
>>
>> a) mdmon is perfectly capable of restarting, it is already used to
>> take over mdmon launched in initrd. The problem is to know when to
>> restart - i.e. when respective libraries are changed. This is a job
>> for package management in distribution. It is already employed for
>> glibc, systemd and some others and can just as well be employed for
>> mdmon. And this is totally unrelated to systemd :)
>
> Really, you are saying there is a synchronous way to make mdmon reexec
> itself? How does that work?

I am not sure whether it qualifies as synchronous, but "mdmon
--takeover" will kill any existing mdmon for this array and take over
monitoring itself.

>> b) having binary launched off some fs should not prevent this fs to be
>> remounted ro - binaries are not opened rw
>
> If you run a binary and then the package manager replaces it then the
> running instance will still refer to the old copy and this will have the
> effect that the file isn't actually deleted until the process
> exits/execs. And because that is the way it is the kernel will refuse
> unmounting of the fs until you terminated/reexeced your process.
>
>> > This is unfixable unless a) mdmon learns reexecution of itself without
>> > losing state (like most init systems do), or b) mdmon would stop
>> > insisting on being shutdown only after the remount.
>>
>> As far as I can tell, both are true today; but remounting is not
>> enough, unfortunately.
>
> So, you are saying we can shut down mdmon without ill effects early?
>

At least that's what I see. You can shut down mdmon and continue to work
with the file system, even if it is mounted rw. Under some conditions
mount will hang; i.e.

start array
kill mdmon
try to mount

mount will hang. If you then start mdmon, the mount completes. But if
you now

umount
kill mdmon
mount

it is mounted just fine.

>> > In my eyes b) is very much preferable: It should be possible to shut
>> > down mdmon like any other service. And if then some md related code
>> > still needs to be run on late shutdown this should be done from a new
>> > process. I would be willing to add some hooks for this, so that we can
>> > execute arbitrary drop-in processes as part of the final shutdown loop.
>>
>> mdmon is needed to ensure metadata were correctly updated. So it needs
>> to exist as long as metadata *may* be updated. For practical purposes
>> it means - until file system is unmounted and flushed to disks. I am
>> not sure that remounting ro stops all activity (at least, mounting ro
>> definitely *writes* to device using some filesystems).
>
> Well, the root file systems cannot be unmounted, only remounted.
>
> So, is there a way to invoke mdmon so that it flushes all metadata
> changes to disk and immediately terminates then this should be all we
> need for a clean solution. We'd then shutdown the normal instances of
> mdmon down like any other daemon and simply invoke this metadata
> flushing command as part of late shutdown.

Hmm ... it looks like you just need to

start mdmon
do mdadm --wait-clean

After this you can kill mdmon again (assuming the device is no longer
in use).
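
For illustration only, the late-shutdown sequence suggested above (start mdmon, run mdadm --wait-clean, then kill mdmon) could be sketched roughly like this. It is not tested against a real array. The device names in the usage note are hypothetical, Python is used only for the sake of the example, and the sketch assumes mdmon puts itself in the background after being started. The final "kill mdmon" step is left to the caller, since how to locate the running mdmon is not covered in this thread.

```python
import subprocess


def flush_plan(container, array):
    """Build the commands for the metadata flush, as argv lists.

    container: the external-metadata container that mdmon should monitor
    array:     the member array to be marked clean
    Both values are supplied by the caller; nothing is probed here.
    """
    return [
        # Step 1: start mdmon for the container (assumed to background itself).
        ["mdmon", container],
        # Step 2: wait until the array has been marked clean.
        ["mdadm", "--wait-clean", array],
    ]


def render(plan):
    """Render the plan as plain command lines, for logging or a dry run."""
    return [" ".join(argv) for argv in plan]


def execute(plan):
    """Run each command in order and collect the exit codes.

    Exit codes are returned rather than raised on, because this sketch
    makes no assumption about which codes the tools return.
    """
    return [subprocess.run(argv).returncode for argv in plan]
```

A dry run such as render(flush_plan("md127", "/dev/md126")) only prints the two command lines, which is a safe way to check the ordering before anything is hooked into the shutdown path.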