On Sun, 3 Apr 2016, huang jun wrote:
> Hi, all
> Recently we tested the deletion performance of an erasure pool and
> found that the delete op is pretty slow.
> We traced the code and found that the Delete op is converted to a
> Rename in FileStore, implemented in
> FileStore::_collection_move_rename(), which calls
> FileStore::_set_replay_guard() and FileStore::_close_replay_guard().
> Between them, these two functions do 3 fsyncs and 1 object map sync,
> which we think is where most of the time goes.
> ===================================
> Environment:
> ceph version: 0.94.5
> linux kernel: 3.18
> Test cluster: 1 MON, 1 MDS, 4 OSDs
> ===================================
> We ran some comparison tests (average time per rename op):
> 1. sync omap + fd (default): 0.883818s
> 2. only sync omap: 0.428431s
> 3. only sync fd: 0.400266s
> 4. no sync at all: 0.00319648s
> 5. posix_fadvise(POSIX_FADV_DONTNEED) after write, plus sync omap + fd:
>    0.855178s
> 6. fdatasync instead of fsync, plus sync omap + fd: 0.432659s
>
> As the numbers show, syncing the fd and syncing the object map each
> account for about half of the total time.
> Comparing 1 with 6, fdatasync takes about 50% less time than fsync,
> which means syncing the metadata costs more than syncing the data.
> FileStore::_set_replay_guard() and FileStore::_close_replay_guard()
> only set the object's user.cephos.seq xattr and then sync to make it
> durable.
>
> I have some questions:
> 1. Do we record the xattr (user.cephos.seq) to avoid replaying an
> older transaction?

Yes, exactly.

> 2. If we don't sync, we get the best performance; are there any side
> effects, like data corruption?

Yes--we may replay incorrectly after a failure.  Unfortunately, not an
option.

This problem goes away with BlueStore, so we only have to live with it
for a bit longer.  I don't think it's worth investing any effort into
addressing this with FileStore--we'll be unlikely to want to merge any
non-trivial change anyway.

sage
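
To make the replay-guard pattern discussed above concrete, here is a
minimal sketch of what "set an xattr, then sync" looks like at the
syscall level.  The helper name set_replay_guard and the error handling
are illustrative only, not Ceph's actual
FileStore::_set_replay_guard():

    #include <sys/xattr.h>
    #include <unistd.h>
    #include <cstdint>

    // Hypothetical helper, not Ceph's implementation.
    // Returns 0 on success, -1 on error (errno set by the failing call).
    int set_replay_guard(int fd, uint64_t seq) {
      // Record the transaction sequence in an xattr on the open file.
      if (fsetxattr(fd, "user.cephos.seq", &seq, sizeof(seq), 0) < 0)
        return -1;
      // Make it durable.  fsync also flushes the inode metadata dirtied
      // by the xattr update, which is exactly the cost measured above.
      if (fsync(fd) < 0)
        return -1;
      return 0;
    }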
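
The answer to question 1 ("Yes, exactly") implies a replay-side check
along these lines; again a hypothetical helper, assuming the guard
stores a raw 64-bit sequence number:

    #include <sys/xattr.h>
    #include <cstdint>

    // Hypothetical replay-side check: if the guard seq on disk is
    // already >= the journaled transaction's seq, the op was applied
    // and must be skipped rather than replayed.
    bool already_applied(int fd, uint64_t txn_seq) {
      uint64_t guard_seq = 0;
      if (fgetxattr(fd, "user.cephos.seq", &guard_seq,
                    sizeof(guard_seq)) < 0)
        return false;  // no guard recorded: safe to apply
      return guard_seq >= txn_seq;
    }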
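
A rough standalone micro-benchmark in the spirit of tests 1 and 6 can
reproduce the fsync-vs-fdatasync gap on any local filesystem.  The file
name, write size, and iteration count here are arbitrary, and the
absolute numbers will differ from the cluster measurements quoted
above:

    #include <fcntl.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>

    // Time N write+sync cycles; return the average seconds per op.
    static double time_sync(int fd, bool data_only) {
      const int N = 100;
      char buf[4096] = {0};
      auto t0 = std::chrono::steady_clock::now();
      for (int i = 0; i < N; i++) {
        if (pwrite(fd, buf, sizeof(buf), 0) < 0) return -1;
        if ((data_only ? fdatasync(fd) : fsync(fd)) < 0) return -1;
      }
      auto t1 = std::chrono::steady_clock::now();
      return std::chrono::duration<double>(t1 - t0).count() / N;
    }

    int main() {
      int fd = open("syncbench.tmp", O_CREAT | O_RDWR | O_TRUNC, 0644);
      if (fd < 0) return 1;
      printf("fsync:     %.6f s/op\n", time_sync(fd, false));
      printf("fdatasync: %.6f s/op\n", time_sync(fd, true));
      close(fd);
      unlink("syncbench.tmp");
      return 0;
    }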
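
For completeness, test 5's pattern of dropping the page cache after a
durable write looks roughly like this; per the numbers above it barely
helps, because the cost is in the sync itself, not in cached data:

    #include <fcntl.h>
    #include <unistd.h>

    // Write, sync, then hint the kernel to evict the written pages.
    // posix_fadvise returns 0 on success or an error number directly.
    int write_sync_drop(int fd, const char *buf, size_t len, off_t off) {
      if (pwrite(fd, buf, len, off) < 0) return -1;
      if (fdatasync(fd) < 0) return -1;  // pages must be clean first
      return posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
    }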