continued warnings: Large omap object found

I do not have a large Ceph cluster: only 4 nodes plus a mon/mgr host, with 48 OSDs. I have one data pool and one metadata pool, with a total of about 140TB of usable storage, and maybe 30 or so clients. The rest of my systems connect through a host that is a Ceph client and reshares the filesystem via Samba and NFS-Ganesha. I'm not using RGW anywhere. I'm running the latest stable release of Nautilus (14.2.7) and have had it in production since August 2019. All Ceph nodes and the SMB/NFS host are running CentOS 7 with the latest patches; other clients are a mix of Debian and Ubuntu.

For the last several weeks, I have been getting the warning "Large omap object found" off and on. I've been resolving it by gradually increasing the value of osd_deep_scrub_large_omap_object_key_threshold and then running a deep scrub on the affected pg. I have now increased this threshold to 1000000 and am wondering if I should keep doing this or if there is another problem that needs to be addressed.
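For reference, here is roughly what I have been running each time the warning appears (the pg id comes from whichever pg is named in the warning; 2.26 is just the most recent one):

# ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 1000000
# ceph pg deep-scrub 2.26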

The affected pg has been different most times, but the warnings are all on the same OSD and involve the same MDS object. Here's an excerpt from my current logs to show what I'm seeing:

# zgrep -i "large omap object found" /var/log/ceph/ceph.log*
/var/log/ceph/ceph.log:2020-02-27 06:02:01.761641 osd.40 (osd.40) 1578 : cluster [WRN] Large omap object found. Object: 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count: 1048576 Size (bytes): 46403355
/var/log/ceph/ceph.log:2020-02-27 16:18:00.328869 osd.40 (osd.40) 1585 : cluster [WRN] Large omap object found. Object: 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count: 1048559 Size (bytes): 46407183
/var/log/ceph/ceph.log-20200227.gz:2020-02-26 19:56:24.972431 osd.40 (osd.40) 1450 : cluster [WRN] Large omap object found. Object: 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count: 939236 Size (bytes): 40179994
/var/log/ceph/ceph.log-20200227.gz:2020-02-26 21:14:16.497161 osd.40 (osd.40) 1460 : cluster [WRN] Large omap object found. Object: 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count: 939232 Size (bytes): 40179796
/var/log/ceph/ceph.log-20200227.gz:2020-02-26 21:15:06.399267 osd.40 (osd.40) 1464 : cluster [WRN] Large omap object found. Object: 2:c9647462:::mds0_openfiles.1:head PG: 2.462e2693 (2.13) Key count: 939231 Size (bytes): 40179756

Unfortunately, older logs have already rotated out, but if memory serves, they contained similar messages. As you can see, the key count continues to increase. Last week I bumped the threshold to 750000 to clear the warning; before that, I had bumped it to 500000. It looks to me like something isn't getting cleaned up the way it's supposed to, but I haven't been using Ceph long enough to figure out what that might be.
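If it helps, I assume I could also watch the raw key count on that object between scrubs with something like the following (the pool name is a placeholder for my actual metadata pool):

# rados -p <metadata-pool> listomapkeys mds0_openfiles.0 | wc -l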

Do I continue to bump the key threshold and not worry about the warnings, or is there something going on that needs to be corrected? At what point is the threshold too high? If the problem is due to a specific client not closing files, is it possible to identify that client and attempt to reset it?
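My guess is that something like the following would list the client sessions on the MDS, including how many caps each one holds (the mds name is a placeholder for my active MDS):

# ceph daemon mds.<mds-name> session ls

and that, if one client turned out to be holding files open, it could be kicked with something like:

# ceph tell mds.<mds-name> client evict id=<client-id>

But I don't know if that's the right approach here, or what I should be looking for in the session output.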

Any advice is welcome. I'm happy to provide additional data if needed.

Thanks.
Seth

--
Seth Galitzer
Systems Coordinator
Computer Science Department
Kansas State University
http://www.cs.ksu.edu/~sgsax
sgsax@xxxxxxx
785-532-7790


