Hi all,

Recently we tested the deletion performance of an erasure-coded pool, and we found the delete op is pretty slow. We traced the code and found that the delete op is converted to a rename in FileStore. It is implemented in FileStore::_collection_move_rename(), which calls FileStore::_set_replay_guard() and FileStore::_close_replay_guard(); between them, these two functions do 3 fsyncs and 1 object map sync, which we think is where most of the time goes.

===================================
Environment:
ceph version: 0.94.5
linux kernel: 3.18
Test cluster: 1 MON, 1 MDS, 4 OSDs
===================================

We ran some comparison tests (average time per rename op):
1. sync omap + fd (default): 0.883818s
2. only sync omap: 0.428431s
3. only sync fd: 0.400266s
4. no sync at all: 0.00319648s
5. posix_fadvise(POSIX_FADV_DONTNEED) after write, then sync omap + fd: 0.855178s
6. fdatasync instead of fsync, sync omap + fd: 0.432659s

As we can see, syncing the fd and syncing the object map each take about 50% of the total time. Comparing 1 with 6, fdatasync takes about 50% less time than fsync, which suggests that syncing the metadata costs more than syncing the data.

FileStore::_set_replay_guard() and FileStore::_close_replay_guard() only set the object's user.cephos.seq xattr and then sync to make it durable.

I have some questions:
1. Do we record the xattr (user.cephos.seq) to avoid replaying an older transaction?
2. If we don't sync at all we get the best performance; are there any side effects, e.g. data corruption?

Any comments are welcome. (A small standalone timing sketch of the fsync vs. fdatasync comparison is pasted below the signature.)

--
thanks
huangjun
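
P.S. For anyone who wants to reproduce the fsync vs. fdatasync difference in isolation, here is a minimal standalone sketch (not Ceph code; the file path, iteration count, and seq values are made up for illustration) of the pattern the replay guard follows: set the user.cephos.seq xattr on an fd, then sync it.

// Minimal standalone sketch (not Ceph code). It mimics the replay-guard
// pattern per object -- write a small sequence-number xattr, then make it
// durable -- and times fsync against fdatasync. The path, xattr value,
// and iteration count are arbitrary.
#include <fcntl.h>
#include <sys/xattr.h>   // fsetxattr (Linux)
#include <unistd.h>      // fsync, fdatasync, close
#include <chrono>
#include <cstdint>
#include <cstdio>

// Time one "set xattr + sync" cycle using the given sync function.
static double guard_cycle(int fd, int (*sync_fn)(int), uint64_t seq) {
  auto t0 = std::chrono::steady_clock::now();
  // Same idea as _set_replay_guard: a small sequence-number xattr.
  fsetxattr(fd, "user.cephos.seq", &seq, sizeof(seq), 0);
  sync_fn(fd);  // fsync or fdatasync -- this is where the time goes
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
  int fd = open("/tmp/guard_test", O_CREAT | O_RDWR, 0644);
  if (fd < 0) { perror("open"); return 1; }
  const int iters = 100;
  double t_fsync = 0, t_fdatasync = 0;
  for (int i = 0; i < iters; ++i) {
    t_fsync     += guard_cycle(fd, fsync, i);
    t_fdatasync += guard_cycle(fd, fdatasync, i);
  }
  printf("avg fsync:     %f s\n", t_fsync / iters);
  printf("avg fdatasync: %f s\n", t_fdatasync / iters);
  close(fd);
  return 0;
}

One caveat: POSIX only guarantees that fdatasync flushes the file data plus the metadata needed to retrieve it, so whether an xattr update is actually durable after fdatasync may depend on the filesystem. That seems relevant to question 2 above.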