On Tue, 2022-08-16 at 13:21 +1000, distroguy@xxxxxxxxx wrote:
>
> I'm not quite sure of the relationship between MDS operations and OSD
> data. The MDS metadata gets written to the NVMe pool and clients access
> data directly on the OSD nodes, but do MDS operations also need to wait
> for the OSDs to perform operations? I think it makes sense that they do
> (for example, to unlink a file the MDS needs to check whether there are
> any other hardlinks to it, and if not, the data can be deleted from the
> OSDs and the metadata updated to remove the file)?
>
> So to that end, would slow-performing OSDs also impact MDS performance?
> Maybe it's stuck waiting for the OSDs to do their thing, and they
> aren't fast enough... but then wouldn't I see much more %wa?
>

Related data points I forgot to mention:

We get lots of "MDS health slow requests are blocked" error messages every couple of minutes. Looking at the August 13th logs, we had 911 log lines about these slow requests being cleared. The highest count in a single message was 11,193 slow requests cleared; the average was 472.

I know we also have some OSD disks in the cluster with SMART errors, which I'm looking to replace. However, we do not see nearly as many slow OSD requests: "only" 13 log lines about requests blocked on OSDs. I do plan to chase those down, though, and work out whether the cause is an unhealthy disk or intermittent network/host issues.

My point is that if the MDS were bottlenecked by slow OSDs, shouldn't I see correspondingly more blocked-request messages from the OSDs?

Cheers,
-c
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx