Quoting Stefan Kooman (stefan@xxxxxx):
> .... and it crashed again (and again) ... until we stopped the mds and
> deleted the mds0_openfiles.0 from the metadata pool.
>
> Here is the (debug) output:
>
> A specific workload that *might* have triggered this: recursively deleting
> a long list of files and directories (~7 million in total) with 5 "rm"
> processes in parallel ...

It crashed twice ... here is the other info from the crash:

-10001> 2019-12-04 20:28:34.929 7fd43ce9b700  5 -- [2001:7b8:80:3:0:2c:3:2]:6800/3833566625 >> [2001:7b8:80:1:0:1:2:10]:6803/727090 conn(0x55e93ca96300 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=141866652 cs=1 l=1). rx osd.90 seq 32255 0x55e9416cb0c0 osd_op_reply(4104640 10001afe266.00000000 [stat,omap-set-header,omap-set-vals] v63840'10049940 uv10049940 ondisk = 0) v8
-10001> 2019-12-04 20:28:34.929 7fd43ce9b700  1 -- [2001:7b8:80:3:0:2c:3:2]:6800/3833566625 <== osd.90 [2001:7b8:80:1:0:1:2:10]:6803/727090 32255 ==== osd_op_reply(4104640 10001afe266.00000000 [stat,omap-set-header,omap-set-vals] v63840'10049940 uv10049940 ondisk = 0) v8 ==== 248+0+0 (969216453 0 0) 0x55e9416cb0c0 con 0x55e93ca96300
-10001> 2019-12-04 20:28:34.937 7fd436ca7700  0 mds.0.openfiles omap_num_objs 1025
-10001> 2019-12-04 20:28:34.937 7fd436ca7700  0 mds.0.openfiles anchor_map size 19678
-10001> 2019-12-04 20:28:34.937 7fd436ca7700 -1 /build/ceph-13.2.6/src/mds/OpenFileTable.cc: In function 'void OpenFileTable::commit(MDSInternalContextBase*, uint64_t, int)' thread 7fd436ca7700 time 2019-12-04 20:28:34.939048
/build/ceph-13.2.6/src/mds/OpenFileTable.cc: 476: FAILED assert(omap_num_objs <= MAX_OBJECTS)

mds.0.openfiles omap_num_objs 1025 <- ... just 1 higher than 1024? Coincidence?

Gr. Stefan

--
| BIT BV  https://www.bit.nl/            Kamer van Koophandel 09090351
| GPG: 0xD14839C6                        +31 318 648 688 / info@xxxxxx
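For reference, a minimal standalone sketch of the invariant that trips here, assuming MAX_OBJECTS is 1024 (inferred from the 1025-vs-1024 observation above, not verified against the 13.2.6 source); the real check sits in OpenFileTable::commit() at OpenFileTable.cc:476 per the assert message:

    // Not the actual Ceph code: a self-contained illustration of the limit
    // the MDS asserts on when committing the open file table.
    #include <cassert>

    int main() {
        const unsigned MAX_OBJECTS = 1024;     // assumed cap on omap objects backing the table
        unsigned omap_num_objs = 1025;         // value reported by mds.0.openfiles in the log
        assert(omap_num_objs <= MAX_OBJECTS);  // aborts here, mirroring the FAILED assert
        return 0;
    }

If that assumption holds, the open file table grew past its maximum number of backing omap objects by exactly one, so the "coincidence" above would just be the table hitting its hard limit.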