The problem is this:

Creating the first snapshot:
----------------------------
- preloads the -cow and -real devices and the origin and snapshot targets
- suspends the underlying LV (a mirror in this case) without
  DM_SUSPEND_NOFLUSH_FLAG and with DM_SUSPEND_LOCKFS_FLAG. This waits for
  all bios to drain and calls the filesystem driver to bring the
  filesystem to a consistent state.
- swaps the table with the origin targets
- resumes the underlying LV, the snapshot target and the origin target

Handling a mirror failure:
--------------------------
- preloads the new table with a linear volume, or a mirror with a reduced
  number of legs, or a mirror with new legs allocated according to the
  allocation policy
- suspends the mirror with the "noflush" flag; "noflush" causes failing
  bios to be queued in device mapper
- swaps the table with the new one
- resumes the mirror; the queued bios are dequeued and passed to the new
  device

Now, the problem:
-----------------
1. If you say that these two operations are independent, two processes
   will race with suspend and resume on the same device. Bad.

2. If you put a lock around them, the race turns into a possible deadlock:
   if dm-raid1 suffers a failure during bio draining or filesystem
   cleanup, the failure can't be recovered.

3. If you are suspending without DM_SUSPEND_NOFLUSH_FLAG, DM_ENDIO_REQUEUE
   is not allowed, and requests returned with DM_ENDIO_REQUEUE are
   completed with -EIO (see function dec_pending). So if a mirror leg or
   log failure happens, dm-raid1 returns DM_ENDIO_REQUEUE and the I/O is
   incorrectly finished with -EIO. If you remove this
   DM_ENDIO_REQUEUE -> -EIO logic from dec_pending, go to case 2 above
   (deadlock).

As for the objection that "it is very improbable": I think there is one
case where the probability may be more than minimal. If the user has a
mounted filesystem and doesn't use it for a long time, the disk may have
failed (or been unplugged) without the system noticing, because the disk
isn't used.
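
To make point 3 concrete, here is a minimal userspace sketch of the decision described there. It is not the kernel's dec_pending code: MODEL_ENDIO_REQUEUE, struct model_result and model_dec_pending are hypothetical names invented for illustration, and the numeric value of the requeue status is arbitrary. The only thing it models is "requeue is honoured during a noflush suspend, otherwise it becomes -EIO".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for DM_ENDIO_REQUEUE; the value is illustrative. */
#define MODEL_ENDIO_REQUEUE 1

enum model_outcome {
    MODEL_COMPLETED,  /* the bio is finished and 'error' goes to the caller */
    MODEL_QUEUED      /* the bio is held until the device is resumed */
};

struct model_result {
    enum model_outcome outcome;
    int error;
};

/*
 * Model of the behaviour described in point 3 (an assumption-based sketch,
 * not kernel code): a target asks for a requeue, and the outcome depends
 * on whether the suspend in progress was requested with "noflush".
 */
struct model_result model_dec_pending(int target_status, bool noflush_suspending)
{
    struct model_result r = { MODEL_COMPLETED, target_status };

    if (target_status == MODEL_ENDIO_REQUEUE) {
        if (noflush_suspending) {
            /* Mirror-repair path: keep the bio for the new table. */
            r.outcome = MODEL_QUEUED;
            r.error = 0;
        } else {
            /* Snapshot-creation path: requeue not allowed, I/O fails. */
            r.error = -EIO;
        }
    }
    return r;
}

/* Small self-check showing both paths side by side. */
void model_self_check(void)
{
    struct model_result repair = model_dec_pending(MODEL_ENDIO_REQUEUE, true);
    struct model_result snap = model_dec_pending(MODEL_ENDIO_REQUEUE, false);

    assert(repair.outcome == MODEL_QUEUED);
    assert(snap.outcome == MODEL_COMPLETED && snap.error == -EIO);
}
```

In this model, a leg failure that hits while the snapshot path holds a flushing suspend takes the second branch, which is the incorrect -EIO described in point 3.
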
Now, if the user creates a snapshot of the mirror and it starts cleaning
up the filesystem journal, that may be the point where the disk error is
detected. But it can't be repaired.

I think it isn't easy to fix (see the 3 points above). The only possible
ways to fix it would be:
- make the mirror self-sufficient (integrate md), or
- attach a dummy dm-linear (or snapshot-origin) passthrough target on top
  of each mirror. If we do that, snapshot creation could suspend this
  dummy passthrough target while dmeventd simultaneously suspends the
  underlying mirror, and there would be no race or deadlock.

Mikulas

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel