On Tue, Nov 29, 2016 at 11:29 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote: > On Tue, Nov 29, 2016 at 4:03 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote: >> >>> On 29 Nov 2016, at 22:58, Ilya Dryomov <idryomov@xxxxxxxxx> wrote: >>> >>> On Tue, Nov 29, 2016 at 2:59 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote: >>>> >>>>> On 29 Nov 2016, at 21:54, Ilya Dryomov <idryomov@xxxxxxxxx> wrote: >>>>> >>>>> On Tue, Nov 29, 2016 at 12:49 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote: >>>>>> Current code does not wake up sync waiter if osd replies error >>>>>> or the request does not want ondisk ack. >>>>> >>>>> Hi Zheng, >>>>> >>>>> Did you mean to write "and" instead of "or" - "if osd replies error >>>>> *and* the request does not want ondisk ack”? >>>> >>>> I mean ‘or’. The code does not wake up sync waiter in either of these two cases. >>> >>> It does wake the sync waiter (r_safe_completion) on error, but only >>> if ondisk ack was requested. If ondisk ack wasn't requested, it's not >>> woken up, success or error. >>> >>> By not setting CEPH_OSD_FLAG_ONDISK, the submitter is indicating that >>> *any* reply is fine. Blocking on the safe reply instead of any reply >>> in the !CEPH_OSD_FLAG_ONDISK case sounds wrong. >>> >>> What is the motivation for this patch? >> >> I saw fsstress hang at ceph_osdc_sync several times. >> >> [52310.533483] sysrq: SysRq : Show Blocked State >> [52310.534057] task PC stack pid father >> [52310.534653] fsstress D14656 2836 2834 0x00000000 >> [52310.535568] ffff880036cff200 ffff88003e2dac80 0000000000002fa3 ffff88003e3142c0 >> [52310.536529] ffff88003fd18318 ffffc900038b7d48 ffffffff819354cc ffffffff00000000 >> [52310.537464] 000000013e314ae0 ffff88003e314890 ffff88003fd18318 ffff88003fd18300 >> [52310.538396] Call Trace: >> [52310.538762] [<ffffffff819354cc>] ? __schedule+0x6ac/0x880 >> [52310.539326] [<ffffffff81935713>] schedule+0x73/0x90 >> [52310.539840] [<ffffffff81939d21>] schedule_timeout+0x31/0x5d0 >> [52310.540425] [<ffffffff810ac4a6>] ? mark_held_locks+0x76/0x90 >> [52310.540980] [<ffffffff8193b447>] ? _raw_spin_unlock_irq+0x27/0x50 >> [52310.541582] [<ffffffff810ac64d>] ? trace_hardirqs_on_caller+0x18d/0x1c0 >> [52310.542217] [<ffffffff81936a1b>] wait_for_completion+0x8b/0x100 >> [52310.542788] [<ffffffff8108dc10>] ? wake_up_q+0x70/0x70 >> [52310.543380] [<ffffffffa00116b8>] ceph_osdc_sync+0xc8/0x150 [libceph] >> [52310.543990] [<ffffffff811eaed0>] ? SyS_tee+0x3b0/0x3b0 >> [52310.544569] [<ffffffffa005c3ea>] ceph_sync_fs+0x3a/0xb0 [ceph] >> [52310.545158] [<ffffffff811eaeeb>] sync_fs_one_sb+0x1b/0x20 >> [52310.545699] [<ffffffff811b931e>] iterate_supers+0x7e/0xe0 >> [52310.546258] [<ffffffff811eb1e0>] sys_sync+0x50/0x80 >> [52310.546849] [<ffffffff8193bb6a>] entry_SYSCALL_64_fastpath+0x18/0xad > > Did you happen to capture /osdc? It's probably the pool perm check > thing osdc is complete empty. > > wr_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ACK; > osd_req_op_init(wr_req, 0, CEPH_OSD_OP_CREATE, CEPH_OSD_OP_FLAG_EXCL); > > but it would be good to confirm. > > Thanks, > > Ilya > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html