31.01.2017 01:19, NeilBrown wrote:
> On Mon, Jan 30 2017, Andrei Borzenkov wrote:
>
>> On Mon, Jan 30, 2017 at 9:36 AM, NeilBrown <neilb@xxxxxxxx> wrote:
>> ...
>>>>>>>
>>>>>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>>>>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>>>>>> systemd[1]: Starting Activate md array even though degraded...
>>>>>>> systemd[1]: Stopped target Local File Systems.
>>>>>>> systemd[1]: Stopping Local File Systems.
>>>>>>> systemd[1]: Unmounting /share...
>>>>>>> systemd[1]: Stopped (with error) /dev/md0.
>>>>>
>> ...
>>>
>>> The race is, I think, the one I mentioned.  If the md device is started
>>> before udev tells systemd to start the timer, the Conflicts dependency
>>> goes the "wrong" way and stops the wrong thing.
>>>
>>
>> From the logs provided it is unclear whether it is *timer* or
>> *service*. If it is timer - I do not understand why it is started
>> exactly 30 seconds after device apparently appears. This would match
>> starting service.
>
> My guess is that the timer is triggered immediately after the device is
> started, but before it is mounted.
> The Conflicts directive tries to stop the device, but it cannot stop the
> device and there are no dependencies yet, so nothing happens.
> After the timer fires (30 seconds later) the .service starts.  It also
> has a Conflicts directive, so systemd tries to stop the device again.
> Now that it has been mounted, there is a dependency that can be
> stopped, and the device gets unmounted.
>
>>
>> Yet another case where system logging is hopelessly unfriendly for
>> troubleshooting :(
>>
>>> It would be nice to be able to reliably stop the timer when the device
>>> starts, without risking having the device get stopped when the timer
>>> starts, but I don't think we can reliably do that.
>>>
>>
>> Well, let's wait until we can get some more information about what happens.
>>

Not much more, but we have at least confirmed that it was indeed the
last-resort service that was fired off by the last-resort timer.
Unfortunately there is no trace of the timer itself.

>>> Changing the
>>>   Conflicts=sys-devices-virtual-block-%i.device
>>> lines to
>>>   ConditionPathExists=/sys/devices/virtual/block/%i
>>> might make the problem go away, without any negative consequences.
>>>
>>
>> Ugly, but yes, maybe this is the only way using current systemd.
>>

This won't work. The sysfs node appears as soon as the very first array
member is found, while the array is still inactive; what we need is the
condition "array is active". The Conflicts line works because the array
is not announced to systemd (SYSTEMD_READY) until it is active, which in
turn is derived from the content of md/array_state.

>>> The primary purpose of having the 'Conflicts' directives was so that
>>> systemd wouldn't log
>>>   Starting Activate md array even though degraded
>>> after the array was successfully started.
>>
>> Yes, I understand it.
>>
>> This looks like cosmetic problem. What will happen if last resort
>> service is started when array is fully assembled? Will it do any harm?
>
> Yes, it could be seen as cosmetic, but cosmetic issues can be important
> too.  Confusing messages in logs can be harmful.
>
> In all likely cases, running the last-resort service won't cause any
> harm.
> If, during the 30 seconds, the array is started, then deliberately
> stopped, then partially assembled again, then when the last-resort
> service finally starts it might do the wrong thing.
> So it would be cleanest if the timer was killed as soon as the device
> is started.  But I don't think there is a practical concern.
>
> I guess I could make a udev rule that fires when the array started, and
> that runs "systemctl stop mdadm-last-resort@md0.timer"
>

Well ... what we really need is a unidirectional dependency.
Actually, the way Conflicts is used *is* unidirectional anyway: nobody
seriously expects that starting foo.service will stop a currently
running shutdown.target. But those are the semantics we currently have.

Still, the udev rule will probably do to mitigate this issue until
something more generic can be implemented.
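
To make the difference between "the sysfs node exists" and "the array is
active" concrete, here is a minimal Python sketch. It is only an
illustration, not what mdadm or udev actually run: the set of
"not started" states, the function names, and the fake directory layout
built by make_fake_array() are all assumptions made for the example.

```python
import os
import tempfile

# Assumption: states in which the array is not considered started.
NOT_STARTED = {"", "clear", "inactive"}


def sysfs_node_exists(block_dir):
    """Roughly what ConditionPathExists= would test."""
    return os.path.isdir(block_dir)


def array_is_active(block_dir):
    """Roughly the condition we need: md/array_state shows a started array."""
    try:
        with open(os.path.join(block_dir, "md", "array_state")) as f:
            state = f.read().strip()
    except OSError:
        # No readable md/array_state attribute: treat as not active.
        return False
    return state not in NOT_STARTED


def make_fake_array(root, name, state):
    """Hypothetical helper: build a fake sysfs-like directory for the demo."""
    block_dir = os.path.join(root, name)
    os.makedirs(os.path.join(block_dir, "md"))
    with open(os.path.join(block_dir, "md", "array_state"), "w") as f:
        f.write(state + "\n")
    return block_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for state in ("inactive", "clean"):
            d = make_fake_array(root, "md_" + state, state)
            print(state, sysfs_node_exists(d), array_is_active(d))
```

In both cases the path test succeeds, but only the second array counts
as active, which is why a plain path check cannot replace the Conflicts
line.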