This was common in the NFS days, and some Linux distributions deliberately
slewed (randomized) the execution time. find over an NFS mount was a
sure-fire way to horque the server (e.g. a Convex C1). IMHO, since the tool
relies on a static index it isn't very useful, and I routinely remove any
variant from my systems. ymmv

> On Jul 2, 2024, at 10:20, Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>
> Hi - mostly as a note to future me and to anyone else looking for the same
> issue...
>
> I finally solved this a couple of months ago. No idea what is wrong with
> Ceph, but the root cause that was triggering this MDS issue was that I had
> several workstations and a couple of servers where the updatedb of "locate"
> was getting run by daily cron at exactly the same time every night, causing
> high momentary strain on the MDS, which then somehow screwed up the metadata
> caching and flushing, creating this cumulative write IO.
>
> The thing to note here is that there's a difference between the "locate"
> and "mlocate" packages. The default updatedb config (on Ubuntu at least)
> for "mlocate" does skip scanning cephfs filesystems, but not so for
> "locate", which happily ventures onto all of your cephfs mounts :|
>
> ---------------------------
> Olli Rajala - Lead TD
> Anima Vitae Ltd.
> www.anima.fi
> ---------------------------
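
FWIW, if you'd rather keep updatedb off CephFS than remove the package
outright, something along these lines should do it. This is only a sketch:
it assumes the PRUNEFS/PRUNEPATHS style of /etc/updatedb.conf used by
mlocate/plocate, and the filesystem type names and paths below are
placeholders - check /proc/mounts (or "stat -f -c %T <mountpoint>") for what
your mounts actually report, and note that the plain findutils "locate"
package may keep its exclusions in a different file.

    # /etc/updatedb.conf - illustrative only; fs type names are assumptions
    PRUNEFS="NFS nfs nfs4 ceph fuse.ceph-fuse"
    PRUNEPATHS="/tmp /var/spool /media /mnt/cephfs"

Even with that in place it's worth staggering the nightly runs so a few
dozen workstations don't all hit the MDS in the same minute, e.g. with a
hypothetical cron.d entry like:

    # /etc/cron.d/updatedb - hypothetical; $RANDOM assumes a bash SHELL
    SHELL=/bin/bash
    15 3 * * *  root  sleep $((RANDOM % 3600)); /usr/bin/updatedb
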
>
>
> On Wed, Dec 14, 2022 at 7:41 PM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>
>> Hi,
>>
>> One thing I now noticed in the mds logs is that there's a ton of entries
>> like this:
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache projecting to
>> [d345,d346] n(v1638 rc2022-12-11T18:20:49.317400+0200 b787972591
>> 694=484+210)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache result
>> [d345,d346] n(v1638 rc2022-12-11T18:20:49.321400+0200 b787972591
>> 695=484+211)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache projecting to
>> [d343,d344] n(v1638 rc2022-12-11T18:20:49.317400+0200 b787972591
>> 694=484+210)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache result
>> [d343,d344] n(v1638 rc2022-12-11T18:20:49.321400+0200 b787972591
>> 695=484+211)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache projecting to
>> [d341,d342] n(v1638 rc2022-12-11T18:20:49.317400+0200 b787972591
>> 694=484+210)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache result
>> [d341,d342] n(v1638 rc2022-12-11T18:20:49.321400+0200 b787972591
>> 695=484+211)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache projecting to
>> [d33f,d340] n(v1638 rc2022-12-11T18:20:49.317400+0200 b787972591
>> 694=484+210)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache result
>> [d33f,d340] n(v1638 rc2022-12-11T18:20:49.321400+0200 b787972591
>> 695=484+211)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache projecting to
>> [d33d,d33e] n(v1638 rc2022-12-11T18:20:49.317400+0200 b787972591
>> 694=484+210)
>> 2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache result
>> [d33d,d33e] n(v1638 rc2022-12-11T18:20:49.321400+0200 b787972591
>> 695=484+211)
>>
>> ...and after dropping the caches considerably fewer of those - normal,
>> abnormal, typical, atypical? ...or is that just something that starts
>> happening after the cache gets filled?
>>
>> Tnx,
>> ---------------------------
>> Olli Rajala - Lead TD
>> Anima Vitae Ltd.
>> www.anima.fi
>> ---------------------------
>>
>>
>> On Sun, Dec 11, 2022 at 9:07 PM Olli Rajala <olli.rajala@xxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> I'm still totally lost with this issue. And now lately I've had a couple
>>> of incidents where the write bw has suddenly jumped to even crazier levels.
>>> See the graph here:
>>> https://gist.github.com/olliRJL/3e97e15a37e8e801a785a1bd5358120d
>>>
>>> The points where it drops to something manageable again are when I have
>>> dropped the mds caches. Usually after a drop there is a steady rise, but
>>> now these sudden jumps are something new and even more scary :E
>>>
>>> Here's a fresh 2sec level 20 mds log:
>>> https://gist.github.com/olliRJL/074bec65787085e70db8af0ec35f8148
>>>
>>> Any help and ideas greatly appreciated. Is there any tool or procedure to
>>> safely check or rebuild the mds data, in case this behaviour is caused by
>>> some hidden issue with the data itself?
>>>
>>> Tnx,
>>> ---------------------------
>>> Olli Rajala - Lead TD
>>> Anima Vitae Ltd.
>>> www.anima.fi
>>> ---------------------------
>>>
>>>
>>> On Fri, Nov 11, 2022 at 9:14 AM Venky Shankar <vshankar@xxxxxxxxxx>
>>> wrote:
>>>
>>>> On Fri, Nov 11, 2022 at 3:06 AM Olli Rajala <olli.rajala@xxxxxxxx>
>>>> wrote:
>>>>>
>>>>> Hi Venky,
>>>>>
>>>>> I have indeed observed the output of the different sections of perf
>>>>> dump like so:
>>>>> watch -n 1 ceph tell mds.`hostname` perf dump objecter
>>>>> watch -n 1 ceph tell mds.`hostname` perf dump mds_cache
>>>>> ...etc...
>>>>>
>>>>> ...but without a proper understanding of what a normal rate is for
>>>>> some number to go up, it's really difficult to make anything of that.
>>>>>
>>>>> btw - is there some convenient way to capture this kind of temporal
>>>>> output for others to view? Sure, I could just dump once a second to a
>>>>> file or sequential files, but is there some tool or convention that is
>>>>> easy to look at and analyze?
>>>>
>>>> Not really - you'd have to do it yourself.
>>>>
>>>>>
>>>>> Tnx,
>>>>> ---------------------------
>>>>> Olli Rajala - Lead TD
>>>>> Anima Vitae Ltd.
>>>>> www.anima.fi
>>>>> ---------------------------
>>>>>
>>>>>
>>>>> On Thu, Nov 10, 2022 at 8:18 AM Venky Shankar <vshankar@xxxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hi Olli,
>>>>>>
>>>>>> On Mon, Oct 17, 2022 at 1:08 PM Olli Rajala <olli.rajala@xxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Patrick,
>>>>>>>
>>>>>>> With "objecter_ops" did you mean "ceph tell mds.pve-core-1 ops" and/or
>>>>>>> "ceph tell mds.pve-core-1 objecter_requests"? Both of these show very
>>>>>>> few requests/ops - many times just returning empty lists. I'm pretty
>>>>>>> sure that this I/O isn't generated by any clients - I've earlier tried
>>>>>>> to isolate this by shutting down all cephfs clients and it didn't have
>>>>>>> any noticeable effect.
>>>>>>>
>>>>>>> I tried to watch what is going on with that "perf dump" but to be
>>>>>>> honest all I can see is some numbers going up in the different
>>>>>>> sections :) ...I don't have a clue what to focus on or how to
>>>>>>> interpret that.
>>>>>>>
>>>>>>> Here's a perf dump if you or anyone could make something out of that:
>>>>>>> https://gist.github.com/olliRJL/43c10173aafd82be22c080a9cd28e673
>>>>>>
>>>>>> You'd need to capture this over a period of time to see what ops might
>>>>>> be going through and what the mds is doing.
>>>>>>
>>>>>>>
>>>>>>> Tnx!
>>>>>>> o.
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> Olli Rajala - Lead TD
>>>>>>> Anima Vitae Ltd.
>>>>>>> www.anima.fi
>>>>>>> ---------------------------
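
Since "capture this over a period of time" comes up a couple of times in
this thread, a crude but workable way to do it yourself is a timestamped
dump loop whose snapshots can be diffed afterwards. A minimal sketch - the
MDS name and the output directory are assumptions, adjust to your setup:

    #!/bin/bash
    # Snapshot the MDS perf counters and in-flight objecter requests once
    # per second so the moving counters can be diffed later (e.g. with jq).
    MDS="$(hostname -s)"        # assumes the active MDS runs on this host
    OUT=/tmp/mds-perf           # hypothetical output directory
    mkdir -p "$OUT"
    while true; do
        ts=$(date +%Y%m%dT%H%M%S)
        ceph tell "mds.$MDS" perf dump          > "$OUT/perf-$ts.json"
        ceph tell "mds.$MDS" objecter_requests  > "$OUT/objecter-$ts.json"
        sleep 1
    done

Comparing two consecutive perf-*.json files then shows which counters
(objecter writes, journal/log flushes, cache commits) are actually moving
while the clients sit idle.
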
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 14, 2022 at 8:32 PM Patrick Donnelly <pdonnell@xxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello Olli,
>>>>>>>>
>>>>>>>> On Thu, Oct 13, 2022 at 5:01 AM Olli Rajala <olli.rajala@xxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm seeing constant 25-50MB/s writes to the metadata pool even when
>>>>>>>>> all clients and the cluster are idling and in a clean state. This
>>>>>>>>> surely can't be normal?
>>>>>>>>>
>>>>>>>>> There are no apparent issues with the performance of the cluster,
>>>>>>>>> but this write rate seems excessive and I don't know where to look
>>>>>>>>> for the culprit.
>>>>>>>>>
>>>>>>>>> The setup is Ceph 16.2.9 running on a hyperconverged 3-node core
>>>>>>>>> cluster and 6 HDD OSD nodes.
>>>>>>>>>
>>>>>>>>> Here's a typical status when pretty much all clients are idling.
>>>>>>>>> Most of that write bandwidth and maybe a fifth of the write IOPS is
>>>>>>>>> hitting the metadata pool.
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------------------------------------
>>>>>>>>> root@pve-core-1:~# ceph -s
>>>>>>>>>   cluster:
>>>>>>>>>     id:     2088b4b1-8de1-44d4-956e-aa3d3afff77f
>>>>>>>>>     health: HEALTH_OK
>>>>>>>>>
>>>>>>>>>   services:
>>>>>>>>>     mon: 3 daemons, quorum pve-core-1,pve-core-2,pve-core-3 (age 2w)
>>>>>>>>>     mgr: pve-core-1(active, since 4w), standbys: pve-core-2, pve-core-3
>>>>>>>>>     mds: 1/1 daemons up, 2 standby
>>>>>>>>>     osd: 48 osds: 48 up (since 5h), 48 in (since 4M)
>>>>>>>>>
>>>>>>>>>   data:
>>>>>>>>>     volumes: 1/1 healthy
>>>>>>>>>     pools:   10 pools, 625 pgs
>>>>>>>>>     objects: 70.06M objects, 46 TiB
>>>>>>>>>     usage:   95 TiB used, 182 TiB / 278 TiB avail
>>>>>>>>>     pgs:     625 active+clean
>>>>>>>>>
>>>>>>>>>   io:
>>>>>>>>>     client:   45 KiB/s rd, 38 MiB/s wr, 6 op/s rd, 287 op/s wr
>>>>>>>>> ---------------------------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Here's some daemonperf dump:
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------------------------------------
>>>>>>>>> root@pve-core-1:~# ceph daemonperf mds.`hostname -s`
>>>>>>>>> ----------------------------------------mds----------------------------------------- --mds_cache--- ------mds_log------ -mds_mem- -------mds_server------- mds_ -----objecter------ purg
>>>>>>>>> req rlat fwd inos caps exi imi hifc crev cgra ctru cfsa cfa hcc hccd hccr prcr|stry recy recd|subm evts segs repl|ino dn |hcr hcs hsr cre cat |sess|actv rd wr rdwr|purg|
>>>>>>>>> 40 0 0 767k 78k 0 0 0 1 6 1 0 0 5 5 3 7 |1.1k 0 0 | 17 3.7k 134 0 |767k 767k| 40 5 0 0 0 |110 | 4 2 21 0 | 2
>>>>>>>>> 57 2 0 767k 78k 0 0 0 3 16 3 0 0 11 11 0 17 |1.1k 0 0 | 45 3.7k 137 0 |767k 767k| 57 8 0 0 0 |110 | 0 2 28 0 | 4
>>>>>>>>> 57 4 0 767k 78k 0 0 0 4 34 4 0 0 34 33 2 26 |1.0k 0 0 |134 3.9k 139 0 |767k 767k| 57 13 0 0 0 |110 | 0 2 112 0 | 19
>>>>>>>>> 67 3 0 767k 78k 0 0 0 6 32 6 0 0 22 22 0 32 |1.1k 0 0 | 78 3.9k 141 0 |767k 768k| 67 4 0 0 0 |110 | 0 2 56 0 | 2
>>>>>>>>> ---------------------------------------------------------------------------------------------------
>>>>>>>>> Any ideas where to look?
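
As an aside, per-pool I/O rates can also be read directly, which makes it
easy to confirm that it really is the metadata pool absorbing those writes
rather than one of the data pools. The pool name below is a placeholder -
"ceph fs ls" prints the actual metadata pool of the filesystem:

    # list the filesystem's metadata/data pools, then watch client I/O per pool
    ceph fs ls
    watch -n 1 ceph osd pool stats cephfs_metadata
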
at? >>>>>>>> >>>>>>>> Check the perf dump output of the mds: >>>>>>>> >>>>>>>> ceph tell mds.<fs_name>:0 perf dump >>>>>>>> >>>>>>>> over a period of time to identify what's going on. You can also >>>> look >>>>>>>> at the objecter_ops (another tell command) for the MDS. >>>>>>>> >>>>>>>> -- >>>>>>>> Patrick Donnelly, Ph.D. >>>>>>>> He / Him / His >>>>>>>> Principal Software Engineer >>>>>>>> Red Hat, Inc. >>>>>>>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx >>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Cheers, >>>>>> Venky >>>>>> >>>> >>>> >>>> -- >>>> Cheers, >>>> Venky >>>> >>>> > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx