Hi,

during the weekly performance meeting, I mentioned my experiences
concerning the transaction structure for write requests at the level
of the FileStore. Such a transaction contains not only the OP_WRITE
operation to the object in the file system, but also a series of
OP_OMAP_SETKEYS and OP_SETATTR operations.

Find attached a README and a source code patch, which describe a
prototype for coalescing the OP_OMAP_SETKEYS operations and the
performance impact of this change.

Regards

Andreas Bluemle

--
Andreas Bluemle                     mailto:Andreas.Bluemle@xxxxxxxxxxx
ITXperts GmbH                       http://www.itxperts.de
Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm
diff --git a/src/os/FileStore.cc b/src/os/FileStore.cc
index f6c3bb8..29382b2 100644
--- a/src/os/FileStore.cc
+++ b/src/os/FileStore.cc
@@ -2260,10 +2260,24 @@ int FileStore::_check_replay_guard(int fd, const SequencerPosition& spos)
   }
 }
 
+void FileStore::_coalesce(map<string, bufferlist> &target, map<string, bufferlist> &source)
+{
+  for (map<string, bufferlist>::iterator p = source.begin();
+       p != source.end();
+       p++) {
+    target[p->first] = p->second;
+  }
+  return;
+}
+
 unsigned FileStore::_do_transaction(
   Transaction& t, uint64_t op_seq, int trans_num,
   ThreadPool::TPHandle *handle)
 {
+  map<string, bufferlist> collected_aset;
+  coll_t collected_cid;
+  ghobject_t collected_oid;
+
   dout(10) << "_do_transaction on " << &t << dendl;
 
 #ifdef WITH_LTTNG
@@ -2282,6 +2296,22 @@ unsigned FileStore::_do_transaction(
 
     _inject_failure();
 
+    if (op->op == Transaction::OP_OMAP_SETKEYS) {
+      collected_cid = i.get_cid(op->cid);
+      collected_oid = i.get_oid(op->oid);
+      map<string, bufferlist> aset;
+      i.decode_attrset(aset);
+      _coalesce(collected_aset, aset);
+      continue;
+    } else {
+      if (collected_aset.empty() == false) {
+        tracepoint(objectstore, omap_setkeys_enter, osr_name);
+        r = _omap_setkeys(collected_cid, collected_oid, collected_aset, spos);
+        tracepoint(objectstore, omap_setkeys_exit, r);
+        collected_aset.clear();
+      }
+    }
+
     switch (op->op) {
     case Transaction::OP_NOP:
       break;
diff --git a/src/os/FileStore.h b/src/os/FileStore.h
index af1fb8d..a039731 100644
--- a/src/os/FileStore.h
+++ b/src/os/FileStore.h
@@ -449,6 +449,8 @@ public:
 
   int statfs(struct statfs *buf);
 
+  void _coalesce( map<string, bufferlist> &target, map<string, bufferlist> &source);
+
   int _do_transactions(
     list<Transaction*> &tls, uint64_t op_seq,
     ThreadPool::TPHandle *handle);
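The merge-then-flush logic of the patch can be tried out in isolation. The following is a minimal sketch, not the FileStore code: `Value` (a std::string standing in for bufferlist), `Op`, `Result` and `apply()` are hypothetical names introduced only for illustration, and the key names in the usage example are made up. It merges consecutive set-keys operations with last-value-wins semantics, as _coalesce() does, and counts how many key-value updates would be issued.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumption: std::string stands in for Ceph's bufferlist.
using Value = std::string;
using KeyValues = std::map<std::string, Value>;

// Same semantics as _coalesce() in the patch: a key that is already
// present in target is overwritten by the value from source.
void coalesce(KeyValues &target, const KeyValues &source) {
  for (const auto &kv : source) {
    target[kv.first] = kv.second;
  }
}

// Hypothetical, simplified operation record (not Transaction::Op).
struct Op {
  enum Kind { SetKeys, Write, SetAttr } kind;
  KeyValues keys;  // only meaningful for SetKeys
};

struct Result {
  int setkeys_calls;         // number of coalesced set-keys calls issued
  std::size_t keys_written;  // total number of key-value pairs written
};

// Walks the operations, collecting consecutive SetKeys operations and
// flushing them as one call when a different operation follows.
Result apply(const std::vector<Op> &ops) {
  Result res{0, 0};
  KeyValues pending;
  auto flush = [&]() {
    if (!pending.empty()) {
      ++res.setkeys_calls;
      res.keys_written += pending.size();
      pending.clear();
    }
  };
  for (const Op &op : ops) {
    if (op.kind == Op::SetKeys) {
      coalesce(pending, op.keys);
      continue;
    }
    flush();  // a non-SetKeys operation ends the collected run
    // ... the operation itself would be executed here ...
  }
  flush();  // addition of this sketch: flush a run that ends the transaction
  return res;
}
```

For a transaction of three SetKeys operations carrying the keys {a, b}, {c, d} and {a, e}, followed by one Write and two SetAttr operations, apply() reports one set-keys call with 5 keys. The final flush() after the loop is an addition of this sketch; the hunks shown above do not include one, so it cannot be told from the diff alone how the prototype treats a transaction that ends with OP_OMAP_SETKEYS. The sketch also ignores that the collected operations could, in principle, refer to different objects.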
Coalescing OMAP_SETKEYS operations in a write transaction
---------------------------------------------------------

Description
-----------

At the level of FileStore, every write request is embedded in a
transaction which consists of

  - 6 key-value pair settings in 3 OMAP_SETKEYS operations
  - the actual OP_WRITE
  - 2 settings in the extended file system attributes.

The modification of FileStore::_do_transaction() coalesces the
6 key-value pairs into a single operation, with the side effect of
reducing the number of key-value pairs to 5: one key appears twice
and only its last value is going to be set.

Performance improvement
-----------------------

Test setup: cluster with 3 storage nodes, 4 osd (SAS disk, SSD journal)
per node, separate client node with rbd using the kernel client;
test load generated by fio, random write, 4K block size, iodepth 16.

  - client improvement: approx. 4 % (12890 iops vs. 13369 iops)
  - storage node improvement: reduction in CPU consumption of the
    ceph-osd daemon by approx. 9 % (43.17 vs. 39.33 CPU-seconds);
    see the following table (derived from /proc/<pid>/schedstat):

  ceph-osd process and              | CPU usage          | CPU usage
  thread classes                    | v0.91 unmodified   | v0.91 with coalescing
  ----------------------------------+--------------------+----------------------
  total cpu usage:                  | 43.17 CPU-seconds  | 39.33 CPU-seconds
                                    |                    |
  ThreadPool::WorkThread::entry():  | 15.56    36.04%    | 12.45    31.66%
  ShardedThreadPool::workers:       |  8.07    18.70%    |  7.94    20.18%
  Pipe::Reader::                    |  5.81    13.45%    |  5.92    15.04%
  Pipe::Writer::entry():            |  4.59    10.63%    |  4.73    12.02%
  FileJournal::Writer::             |  2.41     5.57%    |  2.45     6.22%
  Finisher::finisher_thread:        |  2.86     6.63%    |  1.03     2.61%
                                    |                    |
  WBThrottle::entry:                |  n/a     n/a       |  0.81     2.06%

Interesting: with coalescing active, the WBThrottle shows up in the
CPU usage. In the default case, this was almost invisible.
Source/Patch
------------

https://www.github.com/andreas-bluemle/ceph
commit f33c48358f762cbeb5d30724efacf78ff5438e9e

patches:

  - relative to pull request at https://www.github.com/andreas-bluemle/ceph:
    ceph-andreas-bluemle.file-store-omap_setkeys-colaescing.patch
  - relative to ceph master at https://www.github.com
    (commit a7a70cabe25fdfe3322c784f6797231d14e112c2):
    ceph-master.file-store-omap_setkeys-colaescing.patch