Re: continued warnings: Large omap object found

Brad Hubbard <bhubbard@xxxxxxxxxx> · Fri, 28 Feb 2020 15:37:09 +1000

Check the thread titled "Frequest LARGE_OMAP_OBJECTS in
cephfs metadata pool" from a few days ago.

On Fri, Feb 28, 2020 at 9:03 AM Seth Galitzer <sgsax@xxxxxxx> wrote:
>
> I do not have a large ceph cluster, only 4 nodes plus a mon/mgr with 48
> OSDs. I have one data pool and one metadata pool with a total of about
> 140TB of usable storage. I have maybe 30 or so clients. The rest of my
> systems connect via a host that is a ceph client and then reshares
> through samba and nfs-ganesha. I'm not using rgw anywhere. I'm running
> the latest stable release of nautilus (14.2.7) and have had it in
> production since August 2019. All ceph nodes and the smb/nfs host are
> running centos7 with latest patches. Other clients are a mix of debian
> and ubuntu.
>
> For the last several weeks, I have been getting the warning "Large omap
> object found" off and on. I've been resolving it by gradually increasing
> the value of osd_deep_scrub_large_omap_object_key_threshold and then
> running a deep scrub on the affected pg. I have now increased this
> threshold to 1000000 and am wondering if I should keep doing this or if
> there is another problem that needs to be addressed.
>
> The affected pg has been different most times, but they are all on the
> same osd and with the same mds object. Here's an excerpt from my current
> set of logs to show what I'm seeing:
>
> # zgrep -i "large omap object found" /var/log/ceph/ceph.log*
> /var/log/ceph/ceph.log:2020-02-27 06:02:01.761641 osd.40 (osd.40) 1578 :
> cluster [WRN] Large omap object found. Object:
> 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count:
> 1048576 Size (bytes): 46403355
> /var/log/ceph/ceph.log:2020-02-27 16:18:00.328869 osd.40 (osd.40) 1585 :
> cluster [WRN] Large omap object found. Object:
> 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count:
> 1048559 Size (bytes): 46407183
> /var/log/ceph/ceph.log-20200227.gz:2020-02-26 19:56:24.972431 osd.40
> (osd.40) 1450 : cluster [WRN] Large omap object found. Object:
> 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count:
> 939236 Size (bytes): 40179994
> /var/log/ceph/ceph.log-20200227.gz:2020-02-26 21:14:16.497161 osd.40
> (osd.40) 1460 : cluster [WRN] Large omap object found. Object:
> 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count:
> 939232 Size (bytes): 40179796
> /var/log/ceph/ceph.log-20200227.gz:2020-02-26 21:15:06.399267 osd.40
> (osd.40) 1464 : cluster [WRN] Large omap object found. Object:
> 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count:
> 939231 Size (bytes): 40179756
>
> Unfortunately, older logs have already been rotated out, but if memory
> serves correctly, they had similar messages. As you can see, the key
> count continues to increase. Last week, I bumped the threshold to 750000
> to clear the warning. Before that, I had bumped to 500000. It looks to
> me like something isn't getting cleaned up like it's supposed to. I
> haven't been using ceph long enough to figure out what that might be.
>
> Do I continue to bump the key threshold and not worry about the
> warnings, or is there something going on that needs to be corrected? At
> what point is the threshold too high? If the problem is due to a
> specific client not closing files, is it possible to identify that
> client and attempt to reset it?
>
> Any advice is welcome. I'm happy to provide additional data if needed.
>
> Thanks.
> Seth
>
> --
> Seth Galitzer
> Systems Coordinator
> Computer Science Department
> Kansas State University
> http://www.cs.ksu.edu/~sgsax
> sgsax@xxxxxxx
> 785-532-7790
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
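
To see how the key count moves per object, the warnings in the quoted excerpt can be summarized with a small script. This is only a rough sketch: the regular expression is inferred from the log lines quoted above, not from any documented log format, and the function names (parse_large_omap, max_keys_per_object) are made up for illustration. It assumes one unwrapped log entry per line, as in ceph.log itself rather than in this wrapped email.

```python
import re

# Pattern inferred from the warnings quoted above (an assumption, not a
# documented format). Expects one complete log entry per line.
LARGE_OMAP_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:.]+) (?P<osd>osd\.\d+) .*?"
    r"Large omap object found\. Object: (?P<object>\S+) "
    r"PG: \S+ \((?P<pg>[^)]+)\) "
    r"Key count: (?P<keys>\d+) Size \(bytes\): (?P<size>\d+)"
)


def parse_large_omap(lines):
    """Yield one dict per 'Large omap object found' warning in lines."""
    for line in lines:
        m = LARGE_OMAP_RE.search(line)
        if m:
            yield {
                "when": m["date"] + " " + m["time"],
                "osd": m["osd"],
                "object": m["object"],
                "pg": m["pg"],
                "keys": int(m["keys"]),
                "bytes": int(m["size"]),
            }


def max_keys_per_object(lines):
    """Return {object name: highest key count seen for that object}."""
    peak = {}
    for w in parse_large_omap(lines):
        peak[w["object"]] = max(peak.get(w["object"], 0), w["keys"])
    return peak


# Three entries taken from the excerpt above, unwrapped to one line each.
SAMPLE = [
    "/var/log/ceph/ceph.log:2020-02-27 06:02:01.761641 osd.40 (osd.40) 1578 : "
    "cluster [WRN] Large omap object found. Object: "
    "2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count: "
    "1048576 Size (bytes): 46403355",
    "/var/log/ceph/ceph.log-20200227.gz:2020-02-26 19:56:24.972431 osd.40 "
    "(osd.40) 1450 : cluster [WRN] Large omap object found. Object: "
    "2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count: "
    "939236 Size (bytes): 40179994",
    "/var/log/ceph/ceph.log-20200227.gz:2020-02-26 21:14:16.497161 osd.40 "
    "(osd.40) 1460 : cluster [WRN] Large omap object found. Object: "
    "2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count: "
    "939232 Size (bytes): 40179796",
]

if __name__ == "__main__":
    for obj, keys in sorted(max_keys_per_object(SAMPLE).items()):
        print(obj, keys)
```

Feeding it the unwrapped output of the zgrep command above instead of SAMPLE should give the peak key count for each mds*_openfiles object across all retained logs.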

-- 
Cheers,
Brad