Re: snaptrim blocks IO on ceph nautilus

Hi,

The problem still exists and I don't know what the reason is or how to
fix it.

I figured out that only about 20 OSDs were affected. After I ran a
"ceph daemon osd.<id> compact" on those OSDs, the problem was gone.

I compacted all OSDs in the hope that this would fix my issue, but
over the weekend I ran into the same problem again.
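
My current guess, and it is only a guess: the slow call I describe further
down (SnapMapper::get_next_objects_to_trim ending up in driver->get_next)
iterates the OSD's omap, which on BlueStore lives in RocksDB, and after the
trim has deleted a lot of keys the iterator has to skip over all the
not-yet-compacted delete markers (tombstones). That would explain why a
manual compact helps for a while and why the problem comes back after the
next snaptrim. Below is a small self-contained toy model of that effect,
explicitly not Ceph or RocksDB code:

  // Toy model: deleted keys stay behind as tombstones that get_next()
  // has to skip until a compaction physically drops them.
  #include <iostream>
  #include <iterator>
  #include <map>
  #include <string>
  #include <utility>

  struct TombstoneStore {
      // an empty value marks a tombstone (deleted, but not yet compacted)
      std::map<std::string, std::string> data;

      // return the first live key strictly greater than 'after'
      int get_next(const std::string& after,
                   std::pair<std::string, std::string>* out) {
          auto it = data.upper_bound(after);
          long skipped = 0;
          while (it != data.end() && it->second.empty()) {  // skip tombstones
              ++it;
              ++skipped;
          }
          std::cout << "skipped " << skipped << " tombstones" << std::endl;
          if (it == data.end())
              return -1;   // nothing left
          *out = *it;
          return 0;
      }

      // compaction physically removes the tombstones
      void compact() {
          for (auto it = data.begin(); it != data.end();)
              it = it->second.empty() ? data.erase(it) : std::next(it);
      }
  };

  int main() {
      TombstoneStore store;
      for (int i = 0; i < 1000000; ++i)   // keys removed by earlier snaptrims
          store.data["snap_001d_obj_" + std::to_string(i)] = "";
      store.data["snap_001e_obj_0"] = "still mapped";

      std::pair<std::string, std::string> next;
      store.get_next("snap_001d", &next); // slow: skips 1000000 tombstones
      store.compact();                    // roughly what the manual compact does
      store.get_next("snap_001d", &next); // fast again: skips 0
      return 0;
  }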

Has anyone had the same issue and found a solution for it?

Regards
Manuel



On Mon, 27 Jul 2020 15:59:01 +0200
Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

> Hi,
> 
> for some days now I have been trying to debug a problem with snaptrimming
> under Nautilus.
> 
> I have a cluster with Nautilus (v14.2.10): 44 nodes, each with 24 OSDs of
> 14 TB. Every day I create a snapshot, and snapshots are kept for 7 days.
> 
> Every time an old snapshot is deleted I get bad IO performance and blocked
> requests for several seconds until the snaptrim is done. Settings like
> osd_snap_trim_sleep and osd_pg_max_concurrent_snap_trims don't affect this
> behavior.
> 
> In the debug_osd 10/10 log I see the following:
> 
> 2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 prio 196 cost 0 latency 0.019545 osd_repop_reply(client.22731418.0:615257 3.636 e22457/22372) v2 pg pg[3.636( v 22457'100855 (21737'97756,22457'100855] local-lis/les=22372/22374 n=27762 ec=2842/2839 lis/c 22372/22372 les/c/f 22374/22374/0 22372/22372/22343) [411,36,956,763] r=0 lpr=22372 luod=22457'100854 crt=22457'100855 lcod 22457'100853 mlcod 22457'100853 active+clean+snaptrim_wait trimq=[1d~1]]
> 2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 finish
> 2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 prio 127 cost 0 latency 0.043165 MOSDScrubReserve(2.2645 RELEASE e22457) v1 pg pg[2.2645( empty local-lis/les=22359/22364 n=0 ec=2403/2403 lis/c 22359/22359 les/c/f 22364/22367/0 22359/22359/22359) [379,411,884,975] r=1 lpr=22359 crt=0'0 active mbc={}]
> 2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 finish
> 2020-07-27 11:45:50.039 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 22457'99491 (21594'96426,22457'99491] local-lis/les=22359/22362 n=27669 ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) [411,379,848,924] r=0 lpr=22359 crt=22457'99491 lcod 22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer posting
> 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 22457'99493 (21594'96426,22457'99493] local-lis/les=22359/22362 n=27669 ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) [411,379,848,924] r=0 lpr=22359 luod=22457'99491 crt=22457'99493 lcod 22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer complete
> 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 prio 127 cost 663 latency 7.761823 osd_repop(osd.217.0:3025 3.1ca5 e22457/22378) v2 pg pg[3.1ca5( v 22457'100370 (21716'97357,22457'100370] local-lis/les=22378/22379 n=27532 ec=2855/2839 lis/c 22378/22378 les/c/f 22379/22379/0 22378/22378/22378) [217,411,551,1055] r=1 lpr=22378 luod=0'0 lua=22294'100006 crt=22457'100370 lcod 22457'100369 active mbc={}]
> 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 finish
> 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 prio 127 cost 0 latency 7.494296 MOSDScrubReserve(2.37e2 REQUEST e22457) v1 pg pg[2.37e2( empty local-lis/les=22355/22356 n=0 ec=2412/2412 lis/c 22355/22355 les/c/f 22356/22356/0 22355/22355/22355) [245,411,834,768] r=1 lpr=22355 crt=0'0 active mbc={}]
> 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 finish
> 
> The dequeueing of ops works without pauses until the "snap_trimmer posting"
> and "snap_trimmer complete" log lines. In this example the trim takes about
> 7 seconds (11:45:50.039 to 11:45:57.801), and the ops dequeued right after
> it report a latency of roughly that duration (e.g. 7.761823 s for the
> osd_repop above).
> 
> I tried to drill down into the code (developers, your input is welcome
> here). It seems that the PG is locked for every operation. The
> "snap_trimmer posting" and "snap_trimmer complete" messages come from
> osd/PrimaryLogPG.cc around line 4700. This tells me that the process of
> deleting the snapshot objects sometimes takes quite some time.
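>
> As far as I understand it (and I may be wrong here), the snap trim event is
> processed on the op worker thread while the PG lock is held, so every other
> op for that PG simply waits behind it. A small self-contained toy model of
> that effect, explicitly not Ceph code:
>
>   // Toy model, not Ceph code: one "PG lock" shared by the snap trim and
>   // by ordinary ops, showing why a 7 s trim turns into ~7 s op latency.
>   #include <chrono>
>   #include <iostream>
>   #include <mutex>
>   #include <thread>
>
>   std::mutex pg_lock;   // stands in for the per-PG lock
>
>   void snap_trim() {    // stands in for the snap_trimmer work
>       std::lock_guard<std::mutex> g(pg_lock);
>       std::this_thread::sleep_for(std::chrono::seconds(7));  // slow trim
>   }
>
>   void client_op() {    // stands in for dequeue_op of a client/repop message
>       auto queued = std::chrono::steady_clock::now();
>       std::lock_guard<std::mutex> g(pg_lock);   // waits for the trim
>       std::chrono::duration<double> latency =
>           std::chrono::steady_clock::now() - queued;
>       std::cout << "latency " << latency.count() << std::endl;  // ~7
>   }
>
>   int main() {
>       std::thread trim(snap_trim);
>       std::this_thread::sleep_for(std::chrono::milliseconds(100));
>       std::thread op(client_op);
>       trim.join();
>       op.join();
>       return 0;
>   }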
> 
> After further poking around I see that the method
> SnapMapper::get_next_objects_to_trim in osd/SnapMapper.cc is what takes
> several seconds to finish. I followed this further into
> common/map_cacher.hpp, to line 94: "int r = driver->get_next(key, &store);".
> From there I lost the trail.
> 
> The slowness does not hit all OSDs at the same time. Sometimes a few OSDs
> are affected, sometimes others. Restarting an OSD does not help.
> 
> With Luminous and FileStore, snapshot deletion was not an issue at all.
> With Nautilus and BlueStore it is not acceptable for my use case.
> 
> I don't know yet whether this is a BlueStore-specific problem or some more
> general issue. I wonder a bit why nobody else seems to be hitting it.
> 
> 
> Regards
> Manuel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



