Re: _delete_some new onodes has appeared since PG removal started

Here's a tracker: https://tracker.ceph.com/issues/50466

bluefs_buffered_io is indeed enabled on this cluster, but I suspect it
doesn't help with this particular issue because the collection is no
longer fully listed repeatedly.
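
To find the affected OSDs and PGs, a log scan along these lines may help. It
is only a rough sketch: the patterns are modeled on the 14.2.x log excerpts
quoted further down in this thread and may not match other releases, and the
function name scan_osd_log and its min_latency parameter are hypothetical
helpers, not part of Ceph. The 15-second default simply mirrors the
"timed out after 15" heartbeat message in the quoted log.

```python
import re

# Patterns modeled on the log lines quoted in this thread (assumption:
# the wording is the same in the release you are running).
SLOW_LIST = re.compile(
    r"slow operation observed for _collection_list, latency = (?P<lat>[\d.]+)s"
)
UNEXPECTED_ONODES = re.compile(
    r"pg\[(?P<pgid>\d+\.[0-9a-f]+(?:s\d+)?)\("
    r".*_delete_some additional unexpected onode list"
)


def scan_osd_log(lines, min_latency=15.0):
    """Return (slow _collection_list latencies, PG ids with the warning).

    Hypothetical helper: expects unwrapped log lines, one event per line.
    """
    slow = []
    pgs = set()
    for line in lines:
        m = SLOW_LIST.search(line)
        if m and float(m.group("lat")) >= min_latency:
            slow.append(float(m.group("lat")))
        m = UNEXPECTED_ONODES.search(line)
        if m:
            pgs.add(m.group("pgid"))
    return slow, sorted(pgs)


# Unwrapped copies of three lines from the osd.941 excerpt quoted below.
SAMPLE = [
    "2021-04-21 15:23:41.595 7f51a3e81700  0 "
    "bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation "
    "observed for _collection_list, latency = 67.7234s, lat = 67s cid "
    "=10.14aes4_head start GHMAX end GHMAX max 30",
    "2021-04-21 15:23:41.595 7f51a3e81700  0 osd.941 pg_epoch: 43004 "
    "pg[10.14aes4( v 42754'296580 (40999'293500,42754'296580] lb MIN "
    "(bitwise) local-lis/les=41331/41332 n=159058 ec=4951/4937 lis/c "
    "41331/41331 les/c/f 41332/42758/0 41330/42759/33461) "
    "[171,903,106,27,395,773]p171(0) r=-1 lpr=42759 DELETING "
    "pi=[41331,42759)/1 crt=42754'296580 unknown NOTIFY mbc={}] "
    "_delete_some additional unexpected onode list (new onodes has appeared "
    "since PG removal started[4#10:75280000::::head#]",
    "2021-04-21 15:23:50.061 7f51a3e81700  0 "
    "bluestore(/var/lib/ceph/osd/ceph-941) log_latency slow operation "
    "observed for submit_transact, latency = 8.46584s",
]

if __name__ == "__main__":
    latencies, pg_ids = scan_osd_log(SAMPLE)
    print("slow _collection_list latencies (s):", latencies)
    print("PGs with unexpected onodes:", pg_ids)
```

On a real OSD you would pass an open log file instead of SAMPLE, e.g.
scan_osd_log(open("/var/log/ceph/ceph-osd.941.log")).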

-- dan

On Wed, Apr 21, 2021 at 4:22 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
>
> Hi Dan,
>
> I don't recall a relevant tracker; feel free to create one.
>
> Curious whether you had bluefs_buffered_io set to true when you hit that?
>
>
> Thanks,
>
> Igor
>
> On 4/21/2021 4:37 PM, Dan van der Ster wrote:
> > Do we have a tracker for this?
> >
> > We should ideally be able to remove that final collection_list from
> > the optimized pg removal.
> > It can take a really long time and lead to osd flapping:
> >
> > 2021-04-21 15:23:37.003 7f51c273c700  1 heartbeat_map is_healthy
> > 'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
> > 2021-04-21 15:23:41.595 7f51a3e81700  0
> > bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
> > observed for _collection_list, latency = 67.7234s, lat = 67s cid
> > =10.14aes4_head start GHMAX end GHMAX max 30
> > 2021-04-21 15:23:41.595 7f51a3e81700  0 osd.941 pg_epoch: 43004
> > pg[10.14aes4( v 42754'296580 (40999'293500,42754'296580] lb MIN
> > (bitwise) local-lis/les=41331/41332 n=159058 ec=4951/4937 lis/c
> > 41331/41331 les/c/f 41332/42758/0 41330/42759/33461)
> > [171,903,106,27,395,773]p171(0) r=-1 lpr=42759 DELETING
> > pi=[41331,42759)/1 crt=42754'296580 unknown NOTIFY mbc={}]
> > _delete_some additional unexpected onode list (new onodes has appeared
> > since PG removal started[4#10:75280000::::head#]
> > 2021-04-21 15:23:50.061 7f51a3e81700  0
> > bluestore(/var/lib/ceph/osd/ceph-941) log_latency slow operation
> > observed for submit_transact, latency = 8.46584s
> > 2021-04-21 15:23:50.062 7f51a3e81700  1 heartbeat_map reset_timeout
> > 'OSD::osd_op_tp thread 0x7f51a3e81700' had timed out after 15
> > 2021-04-21 15:23:50.115 7f51b6ca1700  0
> > bluestore(/var/lib/ceph/osd/ceph-941) log_latency_fn slow operation
> > observed for _txc_committed_kv, latency = 8.51916s, txc =
> > 0x5573928a7340
> > 2021-04-21 15:23:50.473 7f51b2498700  0 log_channel(cluster) log [WRN]
> > : Monitor daemon marked osd.941 down, but it is still running
> >
> > -- dan
> >
> > On Thu, Apr 15, 2021 at 10:32 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >> Thanks Igor and Neha for the quick responses.
> >>
> >> I posted an osd log with debug_osd 10 and debug_bluestore 20:
> >> ceph-post-file: 09094430-abdb-4248-812c-47b7babae06c
> >>
> >> Hope that helps,
> >>
> >> Dan
> >>
> >> On Thu, Apr 15, 2021 at 1:27 AM Neha Ojha <nojha@xxxxxxxxxx> wrote:
> >>> We saw this warning once in testing
> >>> (https://tracker.ceph.com/issues/49900#note-1), but there, the problem
> >>> was different, which also led to a crash. That issue has been fixed
> >>> but if you can provide osd logs with verbose logging, we might be able
> >>> to investigate further.
> >>>
> >>> Neha
> >>>
> >>> On Wed, Apr 14, 2021 at 4:14 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >>>> Hi Dan,
> >>>>
> >>>> I've seen that once before and haven't investigated it thoroughly yet, but I
> >>>> think the new PG removal code just revealed this "issue". In fact, it
> >>>> had been in the code before the patch.
> >>>>
> >>>> The warning means that new object(s) (given the object names, these are
> >>>> apparently system objects; I don't remember exactly what they are) have been
> >>>> written to a PG after it was staged for removal.
> >>>>
> >>>> The new PG removal handles that case properly - that was just a paranoid
> >>>> check for an unexpected situation, which has actually been triggered. Hence
> >>>> IMO there is no need to worry at this point, but developers might want to
> >>>> investigate why this is happening.
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Igor
> >>>>
> >>>> On 4/14/2021 10:26 PM, Dan van der Ster wrote:
> >>>>> Hi Igor,
> >>>>>
> >>>>> After updating to 14.2.19 and then moving some PGs around we have a
> >>>>> few warnings related to the new efficient PG removal code, e.g. [1].
> >>>>> Is that something to worry about?
> >>>>>
> >>>>> Best Regards,
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>> /var/log/ceph/ceph-osd.792.log:2021-04-14 20:34:34.353 7fb2439d4700  0
> >>>>> osd.792 pg_epoch: 40906 pg[10.14b2s0( v 40734'290069
> >>>>> (33782'287000,40734'290069] lb MIN (bitwise) local-lis/les=33990/33991
> >>>>> n=36272 ec=4951/4937 lis/c 33990/33716 les/c/f 33991/33747/0
> >>>>> 40813/40813/37166) [933,626,260,804,503,491]p933(0) r=-1 lpr=40813
> >>>>> DELETING pi=[33716,40813)/4 crt=40734'290069 unknown NOTIFY mbc={}]
> >>>>> _delete_some additional unexpected onode list (new onodes has appeared
> >>>>> since PG removal started[0#10:4d280000::::head#]
> >>>>>
> >>>>> /var/log/ceph/ceph-osd.851.log:2021-04-14 18:40:13.312 7fd87bded700  0
> >>>>> osd.851 pg_epoch: 40671 pg[10.133fs5( v 40662'288967
> >>>>> (33782'285900,40662'288967] lb MIN (bitwise) local-lis/les=33786/33787
> >>>>> n=13 ec=4947/4937 lis/c 40498/33714 les/c/f 40499/33747/0
> >>>>> 40670/40670/33432) [859,199,913,329,439,79]p859(0) r=-1 lpr=40670
> >>>>> DELETING pi=[33714,40670)/4 crt=40662'288967 unknown NOTIFY mbc={}]
> >>>>> _delete_some additional unexpected onode list (new onodes has appeared
> >>>>> since PG removal started[5#10:fcc80000::::head#]
> >>>>>
> >>>>> /var/log/ceph/ceph-osd.851.log:2021-04-14 20:58:14.393 7fd87adeb700  0
> >>>>> osd.851 pg_epoch: 40906 pg[10.2e8s3( v 40610'288991
> >>>>> (33782'285900,40610'288991] lb MIN (bitwise) local-lis/les=33786/33787
> >>>>> n=161220 ec=4937/4937 lis/c 39826/33716 les/c/f 39827/33747/0
> >>>>> 40617/40617/39225) [717,933,727,792,607,129]p717(0) r=-1 lpr=40617
> >>>>> DELETING pi=[33716,40617)/3 crt=40610'288991 unknown NOTIFY mbc={}]
> >>>>> _delete_some additional unexpected onode list (new onodes has appeared
> >>>>> since PG removal started[3#10:17400000::::head#]
> >>>>>
> >>>>> /var/log/ceph/ceph-osd.883.log:2021-04-14 18:55:16.822 7f78c485d700  0
> >>>>> osd.883 pg_epoch: 40857 pg[7.d4( v 40804'9911289
> >>>>> (35835'9908201,40804'9911289] lb MIN (bitwise)
> >>>>> local-lis/les=40782/40783 n=195 ec=2063/1989 lis/c 40782/40782 les/c/f
> >>>>> 40783/40844/0 40781/40845/40845) [877,870,894] r=-1 lpr=40845 DELETING
> >>>>> pi=[40782,40845)/1 crt=40804'9911289 lcod 40804'9911288 unknown NOTIFY
> >>>>> mbc={}] _delete_some additional unexpected onode list (new onodes has
> >>>>> appeared since PG removal started[#7:2b000000::::head#]
> >>>> _______________________________________________
> >>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx