Hi,

About two weeks ago something strange started happening with one of the Ceph clusters I'm managing. It's running Ceph Jewel 10.2.10 with a cache tier. Some OSDs started crashing with a "too many open files" error. While looking into it I found that the OSD keeps a huge number of links in /proc/self/fd, and once the 1 million open-file limit is reached it crashes. I tried increasing the limit to 2 million, but the same thing happened. The problem is that /proc/self/fd is never cleared, even though only about 900k inodes are in use on the OSD drive.

Once the OSD is restarted and a scrub starts, I get missing shard errors:

2018-07-15 18:32:26.554348 7f604ebd1700 -1 log_channel(cluster) log [ERR] : 6.58 shard 51 missing 6:1a3a2565:::rbd_data.314da9e52da0f2.000000000000d570:head

OSD crash log:

    -4> 2018-07-15 17:40:25.566804 7f97143fe700  0 filestore(/var/lib/ceph/osd/ceph-44) error (24) Too many open files not handled on operation 0x7f970e0274c0 (5142329351.0.0, or op 0, counting from 0)
    -3> 2018-07-15 17:40:25.566825 7f97143fe700  0 filestore(/var/lib/ceph/osd/ceph-44) unexpected error code
    -2> 2018-07-15 17:40:25.566829 7f97143fe700  0 filestore(/var/lib/ceph/osd/ceph-44) transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "touch",
            "collection": "6.f0_head",
            "oid": "#-8:0f000000:::temp_6.f0_0_55255967_2688:head#"
        },
        {
            "op_num": 1,
            "op_name": "write",
            "collection": "6.f0_head",
            "oid": "#-8:0f000000:::temp_6.f0_0_55255967_2688:head#",
            "length": 65536,
            "offset": 0,
            "bufferlist length": 65536
        },
        {
            "op_num": 2,
            "op_name": "omap_setkeys",
            "collection": "6.f0_head",
            "oid": "#6:0f000000::::head#",
            "attr_lens": {
                "_info": 925
            }
        }
    ]
}
    -1> 2018-07-15 17:40:25.566886 7f97143fe700 -1 dump_open_fds unable to open /proc/self/fd
     0> 2018-07-15 17:40:25.569564 7f97143fe700 -1 os/filestore/FileStore.cc: In function 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f97143fe700 time 2018-07-15 17:40:25.566888
os/filestore/FileStore.cc: 2930: FAILED assert(0 == "unexpected error")

Any insight on how to fix this issue is appreciated.

Regards,
Darius
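
P.S. In case it helps anyone look into this, here is a minimal sketch of how the fd growth can be watched from outside the daemon. The pgrep match on the process name, the assumption that the OSDs were started with "--id <n>" on the command line, and the 60-second interval are just examples for illustration, not anything specific to this cluster:

    #!/usr/bin/env python3
    # Minimal sketch: periodically print how many file descriptors each
    # ceph-osd process on this host has open, by counting /proc/<pid>/fd.
    # Run as root (or the same user as the OSDs) so the fd directories
    # are readable. The 60-second interval is arbitrary.
    import os
    import subprocess
    import time

    def osd_pids():
        # All ceph-osd pids on this host; pgrep -x matches the process name exactly.
        out = subprocess.check_output(["pgrep", "-x", "ceph-osd"])
        return [int(p) for p in out.decode().split()]

    def osd_id(pid):
        # Pull the "--id <n>" argument out of the daemon's command line, if present.
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            args = f.read().split(b"\0")
        for i, arg in enumerate(args):
            if arg == b"--id" and i + 1 < len(args):
                return args[i + 1].decode()
        return "?"

    def fd_count(pid):
        return len(os.listdir("/proc/%d/fd" % pid))

    while True:
        for pid in osd_pids():
            print("%s osd.%s (pid %d): %d open fds"
                  % (time.strftime("%Y-%m-%d %H:%M:%S"), osd_id(pid), pid, fd_count(pid)))
        time.sleep(60)

The same number can of course be read by hand with "ls /proc/<pid>/fd | wc -l"; the script just makes it easier to see whether the count keeps climbing until the limit is hit.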