Hi all,

Recently we tested the deletion performance of an erasure-coded pool, and we found the delete op is pretty slow. We traced the code and found that the delete op is converted to a rename in FileStore. It is implemented in FileStore::_collection_move_rename(), which calls FileStore::_set_replay_guard() and FileStore::_close_replay_guard(); between them, these two functions do 3 fsyncs and 1 object map sync, which we think is where most of the time goes.

===================================
Environment:
ceph version: 0.94.5
linux kernel: 3.18
Test cluster: 1 MON, 1 MDS, 4 OSDs
===================================

We ran some comparison tests (average time per rename op):
1. sync omap + fd (default): 0.883818s
2. only sync omap: 0.428431s
3. only sync fd: 0.400266s
4. no sync at all: 0.00319648s
5. posix_fadvise(POSIX_FADV_DONTNEED) after write, then sync omap + fd: 0.855178s
6. fdatasync instead of fsync, sync omap + fd: 0.432659s

As we can see, syncing the fd and syncing the object map each take about 50% of the total time. Comparing 1 with 6, fdatasync takes about 50% less time than fsync, which suggests that syncing the metadata costs more than syncing the data.

FileStore::_set_replay_guard() and FileStore::_close_replay_guard() only set the object's user.cephos.seq xattr and then sync to make it durable.

I have some questions:
1. Do we record the xattr (user.cephos.seq) to avoid replaying an older transaction?
2. If we don't sync at all we get the best performance; are there any side effects, e.g. data corruption?

Any comments are welcome. (A small standalone timing sketch of the fsync vs. fdatasync comparison is pasted below the signature.)

--
thanks
huangjun
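
P.S. For anyone who wants to reproduce the fsync vs. fdatasync difference in isolation, here is a minimal standalone sketch (not Ceph code; the file path, iteration count, and seq values are made up for illustration) of the pattern the replay guard follows: set the user.cephos.seq xattr on an fd, then sync it.

// Minimal standalone sketch (not Ceph code). It mimics the replay-guard
// pattern per object -- write a small sequence-number xattr, then make it
// durable -- and times fsync against fdatasync. The path, xattr value,
// and iteration count are arbitrary.
#include <fcntl.h>
#include <sys/xattr.h>   // fsetxattr (Linux)
#include <unistd.h>      // fsync, fdatasync, close
#include <chrono>
#include <cstdint>
#include <cstdio>

// Time one "set xattr + sync" cycle using the given sync function.
static double guard_cycle(int fd, int (*sync_fn)(int), uint64_t seq) {
  auto t0 = std::chrono::steady_clock::now();
  // Same idea as _set_replay_guard: a small sequence-number xattr.
  fsetxattr(fd, "user.cephos.seq", &seq, sizeof(seq), 0);
  sync_fn(fd);  // fsync or fdatasync -- this is where the time goes
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
  int fd = open("/tmp/guard_test", O_CREAT | O_RDWR, 0644);
  if (fd < 0) { perror("open"); return 1; }
  const int iters = 100;
  double t_fsync = 0, t_fdatasync = 0;
  for (int i = 0; i < iters; ++i) {
    t_fsync     += guard_cycle(fd, fsync, i);
    t_fdatasync += guard_cycle(fd, fdatasync, i);
  }
  printf("avg fsync:     %f s\n", t_fsync / iters);
  printf("avg fdatasync: %f s\n", t_fdatasync / iters);
  close(fd);
  return 0;
}

One caveat: POSIX only guarantees that fdatasync flushes the file data plus the metadata needed to retrieve it, so whether an xattr update is actually durable after fdatasync may depend on the filesystem. That seems relevant to question 2 above.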