Check the thread titled "Frequest LARGE_OMAP_OBJECTS in cephfs metadata pool"
from a few days ago.

On Fri, Feb 28, 2020 at 9:03 AM Seth Galitzer <sgsax@xxxxxxx> wrote:
>
> I do not have a large ceph cluster, only 4 nodes plus a mon/mgr with 48
> OSDs. I have one data pool and one metadata pool with a total of about
> 140TB of usable storage. I have maybe 30 or so clients. The rest of my
> systems connect via a host that is a ceph client and then reshares
> through samba and nfs-ganesha. I'm not using rgw anywhere. I'm running
> the latest stable release of nautilus (14.2.7) and have had it in
> production since August 2019. All ceph nodes and the smb/nfs host are
> running centos7 with latest patches. Other clients are a mix of debian
> and ubuntu.
>
> For the last several weeks, I have been getting the warning "Large omap
> object found" off and on. I've been resolving it by gradually increasing
> the value of osd_deep_scrub_large_omap_object_key_threshold and then
> running a deep scrub on the affected pg. I have now increased this
> threshold to 1000000 and am wondering if I should keep doing this or if
> there is another problem that needs to be addressed.
>
> The affected pg has been different most times, but they are all on the
> same osd and with the same mds object. Here's an excerpt from my current
> set of logs to show what I'm seeing:
>
> # zgrep -i "large omap object found" /var/log/ceph/ceph.log*
> /var/log/ceph/ceph.log:2020-02-27 06:02:01.761641 osd.40 (osd.40) 1578 :
> cluster [WRN] Large omap object found. Object:
> 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count:
> 1048576 Size (bytes): 46403355
> /var/log/ceph/ceph.log:2020-02-27 16:18:00.328869 osd.40 (osd.40) 1585 :
> cluster [WRN] Large omap object found. Object:
> 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count:
> 1048559 Size (bytes): 46407183
> /var/log/ceph/ceph.log-20200227.gz:2020-02-26 19:56:24.972431 osd.40
> (osd.40) 1450 : cluster [WRN] Large omap object found. Object:
> 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count:
> 939236 Size (bytes): 40179994
> /var/log/ceph/ceph.log-20200227.gz:2020-02-26 21:14:16.497161 osd.40
> (osd.40) 1460 : cluster [WRN] Large omap object found. Object:
> 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count:
> 939232 Size (bytes): 40179796
> /var/log/ceph/ceph.log-20200227.gz:2020-02-26 21:15:06.399267 osd.40
> (osd.40) 1464 : cluster [WRN] Large omap object found. Object:
> 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count:
> 939231 Size (bytes): 40179756
>
> Unfortunately, older logs have already been rotated out, but if memory
> serves correctly, they had similar messages. As you can see, the key
> count continues to increase. Last week, I bumped the threshold to 750000
> to clear the warning. Before that, I had bumped to 500000. It looks to
> me like something isn't getting cleaned up like it's supposed to. I
> haven't been using ceph long enough to figure out what that might be.
>
> Do I continue to bump the key threshold and not worry about the
> warnings, or is there something going on that needs to be corrected? At
> what point is the threshold too high? If the problem is due to a
> specific client not closing files, is it possible to identify that
> client and attempt to reset it?
>
> Any advice is welcome. I'm happy to provide additional data if needed.
>
> Thanks.
> Seth
>
> --
> Seth Galitzer
> Systems Coordinator
> Computer Science Department
> Kansas State University
> http://www.cs.ksu.edu/~sgsax
> sgsax@xxxxxxx
> 785-532-7790
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--
Cheers,
Brad
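
To track how the key counts in the quoted warnings change between deep scrubs, a small parser along these lines might help. It is only a sketch: the regular expression is inferred from the five log lines quoted above (rejoined onto single lines, as they would appear in ceph.log) and may not match other Ceph releases, and the names parse_line, parse_lines, over_threshold and OmapWarning are hypothetical helpers, not part of any Ceph tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, inferred only from the excerpt quoted in this thread.
LARGE_OMAP_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<osd>osd\.\d+) .*?Large omap object found\. "
    r"Object: (?P<obj>\S+) PG: \S+ \((?P<pg>[^)]+)\) "
    r"Key count: (?P<keys>\d+) Size \(bytes\): (?P<size>\d+)"
)


class OmapWarning(NamedTuple):
    ts: str    # timestamp as logged
    osd: str   # reporting OSD, e.g. "osd.40"
    obj: str   # object name, e.g. "2:654134d2:::mds0_openfiles.0:head"
    pg: str    # short PG id from the parentheses, e.g. "2.26"
    keys: int  # omap key count
    size: int  # omap size in bytes


def parse_line(line: str) -> Optional[OmapWarning]:
    """Return the parsed warning, or None if the line does not match."""
    m = LARGE_OMAP_RE.search(line)  # search() tolerates zgrep's filename prefix
    if m is None:
        return None
    return OmapWarning(m["ts"], m["osd"], m["obj"], m["pg"],
                       int(m["keys"]), int(m["size"]))


def parse_lines(lines: Iterable[str]) -> List[OmapWarning]:
    """Parse every matching line, keeping input order."""
    return [w for w in map(parse_line, lines) if w is not None]


def over_threshold(warnings: Iterable[OmapWarning],
                   threshold: int) -> List[OmapWarning]:
    """Warnings whose key count exceeds the given key threshold."""
    return [w for w in warnings if w.keys > threshold]


# Three of the quoted log entries, rejoined onto single lines.
SAMPLE_LINES = [
    "/var/log/ceph/ceph.log:2020-02-27 06:02:01.761641 osd.40 (osd.40) 1578 : "
    "cluster [WRN] Large omap object found. Object: "
    "2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count: "
    "1048576 Size (bytes): 46403355",
    "/var/log/ceph/ceph.log:2020-02-27 16:18:00.328869 osd.40 (osd.40) 1585 : "
    "cluster [WRN] Large omap object found. Object: "
    "2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count: "
    "1048559 Size (bytes): 46407183",
    "/var/log/ceph/ceph.log-20200227.gz:2020-02-26 19:56:24.972431 osd.40 "
    "(osd.40) 1450 : cluster [WRN] Large omap object found. Object: "
    "2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count: "
    "939236 Size (bytes): 40179994",
]

if __name__ == "__main__":
    # Print warnings in time order; the timestamp format sorts lexically.
    for w in sorted(parse_lines(SAMPLE_LINES), key=lambda w: w.ts):
        print(w.ts, w.osd, w.pg, w.obj, w.keys, w.size)
```

Feeding it the output of the zgrep command from the quoted message (one log entry per line) and calling over_threshold with the current value of osd_deep_scrub_large_omap_object_key_threshold would show which objects are still above the limit after each deep scrub.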