Hi Milind,

Here are the outputs of top and a pstack backtrace:
https://gist.github.com/olliRJL/5f483c6bc4ad50178c8c9871370b26d3
https://gist.github.com/olliRJL/b83a743eca098c05d244e5c1def9046c

I uploaded the debug log using ceph-post-file - hope someone can access that :)
ceph-post-file: 30f9b38b-a62c-44bb-9e00-53edf483a415

Tnx!
---------------------------
Olli Rajala - Lead TD
Anima Vitae Ltd.
www.anima.fi
---------------------------


On Mon, Nov 7, 2022 at 2:30 PM Milind Changire <mchangir@xxxxxxxxxx> wrote:

> Maybe:
>
> - use the top program to look at a threaded listing of the ceph-mds
>   process and see which thread(s) are consuming the most CPU
> - use gstack to attach to the ceph-mds process and dump the backtrace
>   into a file; we can then map the thread with the highest CPU consumption
>   to the gstack output
> - enable debug logs (level 20) for the ceph-mds process for a few
>   seconds and look at what's happening in there, or share the logs with
>   the team here
>
> But I wonder if you could do this on your production system.
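(For reference, a minimal sketch of what those three steps can look like on the MDS host - assuming a single, non-containerized ceph-mds whose daemon name matches the short hostname, as elsewhere in this thread, and the default log location under /var/log/ceph; adjust names and paths to your deployment.)

# 1) Per-thread CPU usage of the MDS (batch mode, one sample, threads shown)
MDS_PID=$(pidof ceph-mds)
top -b -H -n 1 -p "$MDS_PID" > mds-top-threads.txt

# 2) Backtrace of all MDS threads (gstack/pstack ship with gdb)
gstack "$MDS_PID" > mds-backtrace.txt

# 3) Debug logging at level 20 for a short window, then back to the default 1/5
ceph tell mds.$(hostname -s) config set debug_mds 20
sleep 30
ceph tell mds.$(hostname -s) config set debug_mds 1/5

# Share the resulting log with the list
ceph-post-file /var/log/ceph/ceph-mds.$(hostname -s).log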
> On Mon, Nov 7, 2022 at 4:34 PM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>
>> I might have spoken too soon :(
>>
>> Now, about 60h after dropping the caches, the write bandwidth has gone up
>> linearly from those initial hundreds of kB/s to nearly 10MB/s.
>>
>> I don't think this could be caused by the cache just filling up again
>> either. After dropping the cache I tested whether filling the cache would
>> show any bw increase by running "tree" at the root of one of the mounts,
>> and it didn't affect anything at the time. So basically the cache has been
>> fully saturated all this time now.
>>
>> Boggled,
>> ---------------------------
>> Olli Rajala - Lead TD
>> Anima Vitae Ltd.
>> www.anima.fi
>> ---------------------------
>>
>>
>> On Sat, Nov 5, 2022 at 12:47 PM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>>
>>> Oh Lordy,
>>>
>>> Seems like I finally got this resolved. And all it needed in the end was
>>> to drop the mds caches with:
>>> ceph tell mds.`hostname` cache drop
>>>
>>> The funny thing is that whatever the issue with the cache was, it had
>>> persisted through several Ceph upgrades and node reboots. It's been a
>>> live production system, so I guess there has just never been a moment
>>> where all mds would have been down and thus made it fully rebuild the
>>> cache... maybe :|
>>>
>>> Unfortunately I don't remember when this issue arose, and my metrics
>>> don't reach far enough back... but I wonder if this could have started
>>> already when I did the Octopus->Pacific upgrade...
>>>
>>> Cheers,
>>> ---------------------------
>>> Olli Rajala - Lead TD
>>> Anima Vitae Ltd.
>>> www.anima.fi
>>> ---------------------------
>>>
>>>
>>> On Mon, Oct 24, 2022 at 9:36 PM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>>>
>>>> I tried my luck and upgraded to 17.2.4, but unfortunately that didn't
>>>> make any difference here either.
>>>>
>>>> I also looked again at all kinds of client op and request stats and
>>>> whatnot, which only made me even more certain that this io is not
>>>> caused by any clients.
>>>>
>>>> What internal mds operation or mechanism could cause such high idle
>>>> write io? I've tried to fiddle a bit with some of the mds cache trim
>>>> and memory settings, but I haven't noticed any effect there. Any
>>>> pointers appreciated.
>>>>
>>>> Cheers,
>>>> ---------------------------
>>>> Olli Rajala - Lead TD
>>>> Anima Vitae Ltd.
>>>> www.anima.fi
>>>> ---------------------------
>>>>
>>>>
>>>> On Mon, Oct 17, 2022 at 10:28 AM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>>>>
>>>>> Hi Patrick,
>>>>>
>>>>> With "objecter_ops" did you mean "ceph tell mds.pve-core-1 ops" and/or
>>>>> "ceph tell mds.pve-core-1 objecter_requests"? Both of these show very
>>>>> few requests/ops - many times just returning empty lists. I'm pretty
>>>>> sure this I/O isn't generated by any clients - I've earlier tried to
>>>>> isolate this by shutting down all cephfs clients, and it didn't have
>>>>> any noticeable effect.
>>>>>
>>>>> I tried to watch what is going on with that "perf dump", but to be
>>>>> honest all I can see is some numbers going up in the different
>>>>> sections :) ...I don't have a clue what to focus on or how to
>>>>> interpret it.
>>>>>
>>>>> Here's a perf dump if you or anyone could make something out of it:
>>>>> https://gist.github.com/olliRJL/43c10173aafd82be22c080a9cd28e673
>>>>>
>>>>> Tnx!
>>>>> o.
>>>>>
>>>>> ---------------------------
>>>>> Olli Rajala - Lead TD
>>>>> Anima Vitae Ltd.
>>>>> www.anima.fi
>>>>> ---------------------------
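(A sketch of one way to see which counters are actually moving between two perf dumps - assuming jq is installed and the active MDS is pve-core-1 as above; the 60-second interval and file names are arbitrary.)

# Capture two snapshots of the MDS perf counters one minute apart
ceph tell mds.pve-core-1 perf dump > perf1.json
sleep 60
ceph tell mds.pve-core-1 perf dump > perf2.json

# Flatten each dump into "section.counter: value" lines and diff them;
# the counters that keep climbing while clients are idle are the ones to focus on
flatten() { jq -r 'paths(scalars) as $p | "\($p | map(tostring) | join(".")): \(getpath($p))"' "$1" | sort; }
diff <(flatten perf1.json) <(flatten perf2.json) | grep '^[<>]'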
>>>>> On Fri, Oct 14, 2022 at 8:32 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>>>>>
>>>>>> Hello Olli,
>>>>>>
>>>>>> On Thu, Oct 13, 2022 at 5:01 AM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm seeing constant 25-50MB/s writes to the metadata pool even when
>>>>>>> all clients and the cluster are idle and in a clean state. This
>>>>>>> surely can't be normal?
>>>>>>>
>>>>>>> There are no apparent issues with the performance of the cluster,
>>>>>>> but this write rate seems excessive and I don't know where to look
>>>>>>> for the culprit.
>>>>>>>
>>>>>>> The setup is Ceph 16.2.9 running on a hyperconverged 3-node core
>>>>>>> cluster and 6 HDD OSD nodes.
>>>>>>>
>>>>>>> Here's a typical status when pretty much all clients are idling.
>>>>>>> Most of that write bandwidth, and maybe a fifth of the write IOPS,
>>>>>>> is hitting the metadata pool.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------------------------------------
>>>>>>> root@pve-core-1:~# ceph -s
>>>>>>>   cluster:
>>>>>>>     id:     2088b4b1-8de1-44d4-956e-aa3d3afff77f
>>>>>>>     health: HEALTH_OK
>>>>>>>
>>>>>>>   services:
>>>>>>>     mon: 3 daemons, quorum pve-core-1,pve-core-2,pve-core-3 (age 2w)
>>>>>>>     mgr: pve-core-1(active, since 4w), standbys: pve-core-2, pve-core-3
>>>>>>>     mds: 1/1 daemons up, 2 standby
>>>>>>>     osd: 48 osds: 48 up (since 5h), 48 in (since 4M)
>>>>>>>
>>>>>>>   data:
>>>>>>>     volumes: 1/1 healthy
>>>>>>>     pools:   10 pools, 625 pgs
>>>>>>>     objects: 70.06M objects, 46 TiB
>>>>>>>     usage:   95 TiB used, 182 TiB / 278 TiB avail
>>>>>>>     pgs:     625 active+clean
>>>>>>>
>>>>>>>   io:
>>>>>>>     client:   45 KiB/s rd, 38 MiB/s wr, 6 op/s rd, 287 op/s wr
>>>>>>> ---------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> Here's some daemonperf output:
>>>>>>>
>>>>>>> ---------------------------------------------------------------------------------------------------
>>>>>>> root@pve-core-1:~# ceph daemonperf mds.`hostname -s`
>>>>>>> ----------------------------------------mds----------------------------------------- --mds_cache--- ------mds_log------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
>>>>>>> req  rlat fwd  inos caps exi  imi  hifc crev cgra ctru cfsa cfa  hcc  hccd hccr prcr|stry recy recd|subm evts segs repl|ino  dn  |hcr  hcs  hsr  cre  cat |sess|actv rd   wr   rdwr|purg|
>>>>>>>  40    0    0  767k  78k   0    0    0    1    6    1    0    0    5    5    3    7 |1.1k   0    0 | 17  3.7k  134   0 |767k 767k| 40    5    0    0    0 |110 |  4    2   21    0 |  2
>>>>>>>  57    2    0  767k  78k   0    0    0    3   16    3    0    0   11   11    0   17 |1.1k   0    0 | 45  3.7k  137   0 |767k 767k| 57    8    0    0    0 |110 |  0    2   28    0 |  4
>>>>>>>  57    4    0  767k  78k   0    0    0    4   34    4    0    0   34   33    2   26 |1.0k   0    0 |134  3.9k  139   0 |767k 767k| 57   13    0    0    0 |110 |  0    2  112    0 | 19
>>>>>>>  67    3    0  767k  78k   0    0    0    6   32    6    0    0   22   22    0   32 |1.1k   0    0 | 78  3.9k  141   0 |767k 768k| 67    4    0    0    0 |110 |  0    2   56    0 |  2
>>>>>>> ---------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> Any ideas where to look?
>>>>>>
>>>>>> Check the perf dump output of the mds:
>>>>>>
>>>>>>     ceph tell mds.<fs_name>:0 perf dump
>>>>>>
>>>>>> over a period of time to identify what's going on. You can also look
>>>>>> at the objecter_ops (another tell command) for the MDS.
>>>>>>
>>>>>> --
>>>>>> Patrick Donnelly, Ph.D.
>>>>>> He / Him / His
>>>>>> Principal Software Engineer
>>>>>> Red Hat, Inc.
>>>>>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

> --
> Milind

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
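(For anyone following up: a rough sketch for confirming whether the metadata-pool write rate actually settles after an MDS cache drop. The pool name cephfs_metadata is an assumption here - substitute whatever your filesystem's metadata pool is called, e.g. as listed by "ceph fs status".)

# Per-pool client io rates; the metadata pool is the one of interest
ceph osd pool stats cephfs_metadata    # pool name is an assumption - check "ceph fs status"

# Drop the MDS cache (the step that resolved things earlier in this thread)
ceph tell mds.$(hostname -s) cache drop

# Watch the metadata-pool write rate over the following minutes/hours
watch -n 10 'ceph osd pool stats cephfs_metadata'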