On Thu, Dec 17, 2015 at 3:10 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
> Hi Sage,
>
> On 17/12/2015 14:31, Sage Weil wrote:
>> On Thu, 17 Dec 2015, Loic Dachary wrote:
>>> Hi Ilya,
>>>
>>> This is another puzzling behavior (the log of all commands is at
>>> http://tracker.ceph.com/issues/14094#note-4). In a nutshell, after a
>>> series of sgdisk -i commands to examine various devices including
>>> /dev/sdc1, the /dev/sdc1 file disappears (and I think it will show up
>>> again although I don't have a definitive proof of this).
>>>
>>> It looks like a side effect of a previous partprobe command, the only
>>> command I can think of that removes / re-adds devices. I thought calling
>>> udevadm settle after running partprobe would be enough to ensure
>>> partprobe completed (and since it takes as much as 2mn30 to return, I
>>> would be shocked if it does not ;-).

Yeah, IIRC partprobe goes through every slot in the partition table,
trying to first remove and then add the partition back. But, I don't
see any mention of partprobe in the log you referred to.

Should udevadm settle for a few vd* devices be taking that much time?
I'd investigate that regardless of the issue at hand.

>>>
>>> Any idea? I desperately try to find a consistent behavior, something
>>> reliable that we could use to say: "wait for the partition table to be
>>> up to date in the kernel and all udev events generated by the partition
>>> table update to complete".
>>
>> I wonder if the underlying issue is that we shouldn't be calling udevadm
>> settle from something running from udev. Instead, if a udev-triggered
>> run of ceph-disk does something that changes the partitions, it
>> should just exit and let udevadm run ceph-disk again on the new
>> devices...?
>
> Unless I missed something this is on CentOS 7 and ceph-disk is only
> called from udev as ceph-disk trigger which does nothing else but
> asynchronously delegate the work to systemd. Therefore there is no
> udevadm settle from within udev (which would deadlock and time out
> every time... I hope ;-).

That's a sure lockup, until one of them times out.

How are you delegating to systemd? Is it to avoid long-running udev
events? I'm probably missing something - udevadm settle wouldn't block
on anything other than udev, so if you are shipping work off to
somewhere else, udev can't be relied upon for waiting.

Thanks,

                Ilya
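
To make the "wait for the partition table to be up to date" sequence
concrete, here is a minimal sketch, not what ceph-disk actually does.
The helper names (wait_for_path, reread_and_wait), the injectable run
parameter and the default timeouts are all made up for illustration.
It assumes partprobe and udevadm are in PATH. Since udevadm settle only
drains the udev event queue, the sketch also polls for the device node
afterwards instead of trusting settle alone.

```python
import os
import subprocess
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists. Return True if it appeared before the
    timeout expired, False otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def reread_and_wait(disk, partition, run=subprocess.check_call,
                    settle_timeout=600):
    """Hypothetical helper: ask the kernel to re-read the partition
    table of `disk`, wait for the udev queue to drain, then poll for
    the `partition` device node.

    `run` is injectable so the command sequence can be checked without
    touching real devices.
    """
    # partprobe removes and re-adds partitions, which generates udev events.
    run(["partprobe", disk])
    # settle waits for the udev event queue only; anything handed off to
    # another service (e.g. systemd units) is not covered by this wait.
    run(["udevadm", "settle", "--timeout=%d" % settle_timeout])
    # Assumption: the node may still be missing after settle, so poll for it.
    return wait_for_path(partition)
```

Something like reread_and_wait("/dev/sdc", "/dev/sdc1") would return
False if the node has not shown up by the time the polling timeout
expires. It still says nothing about work that was shipped off to
systemd, which would need its own way of being waited on.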