Sage, it looks logical that if the user issues "umount -l", the code should give up syncing and clear the state. Or possibly there should be a /proc/...whatever or /sys/...whatever setting to define a default timeout after which it gives up syncing.

2010/7/24 Sébastien Paolacci <sebastien.paolacci@xxxxxxxxx>:
> Hello Sage,
>
> I was just trying to revive an old thread, but I definitely agree that
> I didn't make my point clear enough, sorry for that.
>
> The global idea is that whatever happens server-side, the client should
> be able to be left in a clean state. By clean I mean that, except for
> data explicitly pushed to (or pulled from) the tested ceph share, no
> other side effect from the test session should be visible.
>
> The real issue with hung unmounts is obviously not the console being
> frozen, but all the subsequent syncs that are going to follow the same
> path (and syncs do happen in real-life scenarios, e.g. when cleanly
> halting/restarting a box).
>
> Explicitly aborting the sync (whichever way) is indeed a seductive
> option that would almost settle the point without straying too far from
> a decent, safe sync behaviour.
>
> As a matter of convenience, should I just have a few hundred nodes to
> restart, I would however expect the sync to abort automatically once a
> delay I take responsibility for has expired while the kclient still has
> full confidence in a ceph cluster that is tragically dead.
>
> So let's go back to a concrete failure case that can bother a client box ;):
> - a fresh, just-formatted ceph instance is started.
> - the share is mounted on a separate box and a single file is
>   created (touch /mnt/test).
> - the ceph daemons are hard-killed (pkill -9 on cosd, cmds, cmon) and
>   the share is unmounted.
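For reference, the client-side half of that reproduction is only a handful of commands. The mon address is the one from the log below; the /mnt mount point and the single-node layout are my assumptions:

  # on the client, against the freshly created test cluster
  mount -t ceph 192.168.0.3:6789:/ /mnt
  touch /mnt/test

  # on the server: kill the daemons hard, exactly as in the scenario above
  pkill -9 cosd; pkill -9 cmds; pkill -9 cmon

  # back on the client: this is the call that never returns
  umount /mnt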
> The umount hangs "as expected", but if I wait long enough I'll eventually get:
>
> Jul 24 09:31:16: [ 1163.642060] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
> Jul 24 09:31:16: [ 1163.646098] ceph: client4099 fsid b003239e-a249-7c47-f7ca-a9b75da2a445
> Jul 24 09:31:16: [ 1163.646353] ceph: mon0 192.168.0.3:6789 session established
> Jul 24 09:32:05: [ 1213.290150] ceph: mon0 192.168.0.3:6789 session lost, hunting for new mon
> Jul 24 09:33:01: [ 1269.227827] ceph: mds0 caps stale
> Jul 24 09:33:16: [ 1284.219034] ceph: mds0 caps stale
> Jul 24 09:35:52: [ 1439.844419] umount        D 0000000000000000     0  2819   2788 0x00000000
> Jul 24 09:35:52: [ 1439.844425] ffff880127a5b880 0000000000000086 0000000000000000 0000000000015640
> Jul 24 09:35:52: [ 1439.844430] 0000000000015640 0000000000015640 000000000000f8a0 ffff880124ef1fd8
> Jul 24 09:35:52: [ 1439.844435] 0000000000015640 0000000000015640 ffff880086c8b170 ffff880086c8b468
> Jul 24 09:35:52: [ 1439.844439] Call Trace:
> Jul 24 09:35:52: [ 1439.844455] [<ffffffffa051b740>] ? ceph_mdsc_sync+0x1be/0x1da [ceph]
> Jul 24 09:35:52: [ 1439.844462] [<ffffffff81064afa>] ? autoremove_wake_function+0x0/0x2e
> Jul 24 09:35:52: [ 1439.844473] [<ffffffffa05210ac>] ? ceph_osdc_sync+0x1d/0xc1 [ceph]
> Jul 24 09:35:52: [ 1439.844479] [<ffffffffa050931f>] ? ceph_syncfs+0x2a/0x2e [ceph]
> Jul 24 09:35:52: [ 1439.844485] [<ffffffff8110b065>] ? __sync_filesystem+0x5f/0x70
> Jul 24 09:35:52: [ 1439.844489] [<ffffffff8110b1de>] ? sync_filesystem+0x2e/0x44
> Jul 24 09:35:52: [ 1439.844494] [<ffffffff810efdfa>] ? generic_shutdown_super+0x21/0xfa
> Jul 24 09:35:52: [ 1439.844498] [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
> Jul 24 09:35:52: [ 1439.844505] [<ffffffffa05082ab>] ? ceph_kill_sb+0x24/0x47 [ceph]
> Jul 24 09:35:52: [ 1439.844509] [<ffffffff810f05c5>] ? deactivate_super+0x60/0x77
> Jul 24 09:35:52: [ 1439.844514] [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
> Jul 24 09:35:52: [ 1439.844521] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
> Jul 24 09:37:06: [ 1514.085107] ceph: mds0 hung
> Jul 24 09:37:52: [ 1559.774508] umount        D 0000000000000000     0  2819   2788 0x00000000
> Jul 24 09:37:52: [ 1559.774514] ffff880127a5b880 0000000000000086 0000000000000000 0000000000015640
> Jul 24 09:37:52: [ 1559.774519] 0000000000015640 0000000000015640 000000000000f8a0 ffff880124ef1fd8
> Jul 24 09:37:52: [ 1559.774524] 0000000000015640 0000000000015640 ffff880086c8b170 ffff880086c8b468
> Jul 24 09:37:52: [ 1559.774528] Call Trace:
> Jul 24 09:37:52: [ 1559.774545] [<ffffffffa051b740>] ? ceph_mdsc_sync+0x1be/0x1da [ceph]
> Jul 24 09:37:52: [ 1559.774552] [<ffffffff81064afa>] ? autoremove_wake_function+0x0/0x2e
> Jul 24 09:37:52: [ 1559.774562] [<ffffffffa05210ac>] ? ceph_osdc_sync+0x1d/0xc1 [ceph]
> Jul 24 09:37:52: [ 1559.774569] [<ffffffffa050931f>] ? ceph_syncfs+0x2a/0x2e [ceph]
> Jul 24 09:37:52: [ 1559.774574] [<ffffffff8110b065>] ? __sync_filesystem+0x5f/0x70
> Jul 24 09:37:52: [ 1559.774578] [<ffffffff8110b1de>] ? sync_filesystem+0x2e/0x44
> Jul 24 09:37:52: [ 1559.774584] [<ffffffff810efdfa>] ? generic_shutdown_super+0x21/0xfa
> Jul 24 09:37:52: [ 1559.774589] [<ffffffff810eff16>] ? kill_anon_super+0x9/0x40
> Jul 24 09:37:52: [ 1559.774595] [<ffffffffa05082ab>] ? ceph_kill_sb+0x24/0x47 [ceph]
> Jul 24 09:37:52: [ 1559.774600] [<ffffffff810f05c5>] ? deactivate_super+0x60/0x77
> Jul 24 09:37:52: [ 1559.774604] [<ffffffff81102da3>] ? sys_umount+0x2c3/0x2f2
> Jul 24 09:37:52: [ 1559.774612] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
> (... repeating forever ...)
>
> The box now has to be hard powered off, and an fsck will possibly
> follow the restart...
>
> I'm not saying that this situation is unexpected when testing a system
> that isn't production-ready; I'm just trying to emphasize that client
> safety may actually be the blocking point that keeps more people from
> giving it a try.
>
> Hope this clarifies,
> Sebastien
>
>
> 2010/7/23 Sage Weil <sage@xxxxxxxxxxxx>:
>> On Fri, 23 Jul 2010, Sébastien Paolacci wrote:
>>> Hello Sage,
>>>
>>> I would like to emphasize that this issue is somewhat annoying, even
>>> for experimentation purposes: I definitely expect my test server to not
>>> behave safely, to crash, burn or whatever, but having a client-side
>>> impact as deep as needing a (hard) reboot to resolve a hung ceph
>>> really prevents me from testing with real-life payloads.
>>
>> Maybe you can clarify for me exactly where the problem is. 'umount -f'
>> should work. 'umount -l' should do a lazy unmount (detach from the
>> namespace), but the actual unmount code may currently hang. It's
>> debatable how that can/should be solved, since it's the 'sync' stage that
>> hangs, and it's not clear we should ever 'give up' on that without an
>> administrator telling us to (*).
>>
>> What problem do you actually see, though? Why does it matter, or why do
>> you care, if the 'umount -l' leaves some kernel threads trying to umount?
>> Is it just annoying because it Shouldn't Do That, or does it actually
>> cause a problem for you?
>>
>> It may be that if you try to remount the same fs, the old superblock gets
>> reused, and the mount fails somehow... I haven't tried that. That would
>> be an easy fix, though.
>>
>> Any clarification would be helpful!
>> Thanks-
>> sage
>>
>>
>> * Maybe a hook like /sys/kernel/debug/ceph/.../abort_sync that you can
>> echo 1 to would be sufficient to make it give up on a sync (in the umount
>> -l case, the sync prior to the actual unmount).
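To make that footnote concrete, the sequence on a dead cluster would presumably look something like the following. The exact debugfs directory isn't specified in the proposal, so the "<client>" component below is only a placeholder:

  # the lazy unmount detaches the mount point right away, but the final
  # sync keeps hanging in the kernel while the cluster is unreachable
  umount -l /mnt

  # proposed escape hatch: tell the client to give up on the pending sync
  # ("<client>" is a placeholder -- the real per-client directory name is
  #  whatever the hook ends up using under /sys/kernel/debug/ceph/)
  echo 1 > /sys/kernel/debug/ceph/<client>/abort_sync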
>>>
>>> I understand that it's not an easy point, but a lot of my colleagues
>>> are not really willing to sacrifice even their dev workstation to play
>>> with it during their spare time... sad world ;)
>>>
>>> Sebastien
>>>
>>> On Wed, 16 Jun 2010, Peter Niemayer wrote:
>>> > Hi,
>>> >
>>> > trying to "umount" a formerly mounted ceph filesystem that has become
>>> > unavailable (osd crashed, then mds/mon were shut down using
>>> > /etc/init.d/ceph stop) results in "umount" hanging forever in
>>> > "D" state.
>>> >
>>> > Strangely, "umount -f" started from another terminal reports
>>> > the ceph filesystem as not being mounted anymore, which is consistent
>>> > with what the mount table says.
>>> >
>>> > The kernel keeps emitting the following messages from time to time:
>>> > > Jun 16 17:25:29 gitega kernel: ceph: tid 211912 timed out on osd0, will reset osd
>>> > > Jun 16 17:25:35 gitega kernel: ceph: mon0 10.166.166.1:6789 connection failed
>>> > > Jun 16 17:26:15 gitega last message repeated 4 times
>>> >
>>> > I would have expected the "umount" to terminate at least after some
>>> > generous timeout.
>>> >
>>> > Ceph should probably support something like the "soft,intr" options
>>> > of NFS, because if the only supported way of mounting is one where
>>> > a client is more or less stuck-until-reboot when the service fails,
>>> > many potential test configurations involving Ceph are way too dangerous
>>> > to try...
>>>
>>> Yeah, being able to force it to shut down when servers are unresponsive is
>>> definitely the intent. 'umount -f' should work. It sounds like the
>>> problem is related to the initial 'umount' (which doesn't time out)
>>> followed by 'umount -f'.
>>>
>>> I'm hesitant to add a blanket umount timeout, as that could prevent proper
>>> writeout of cached data/metadata in some cases. So I think the goal
>>> should be that if a normal umount hangs for some reason, you should be
>>> able to intervene to add the 'force' if things don't go well.
>>>
>>> sage
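For completeness, the manual intervention described above boils down to the following on the client (/mnt and server:/export are placeholders; whether the forced call really unblocks the stuck one is exactly what this thread is about):

  # a plain umount blocks in the sync path while the cluster is down...
  umount /mnt &

  # ...so the administrator adds the force from another shell (or, here,
  # after backgrounding the first call)
  umount -f /mnt

  # for comparison, an NFS client mounted with the options Peter mentions
  # gives up on its own after a timeout instead of waiting forever:
  #   mount -t nfs -o soft,intr server:/export /mnt/nfs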