FileStore performance: coalescing operations

Hi,

During the performance weekly meeting, I mentioned
my experiences concerning the transaction structure
for write requests at the level of the FileStore.
Such a transaction not only contains the OP_WRITE
operation to the object in the file system, but also
a series of OP_OMAP_SETKEYS and OP_SETATTR operations.

Find attached a README and a source code patch, which
describe a prototype for coalescing the OP_OMAP_SETKEYS
operations and the performance impact of this change.

Regards

Andreas Bluemle

-- 
Andreas Bluemle                     mailto:Andreas.Bluemle@xxxxxxxxxxx
ITXperts GmbH                       http://www.itxperts.de
Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm
diff --git a/src/os/FileStore.cc b/src/os/FileStore.cc
index f6c3bb8..29382b2 100644
--- a/src/os/FileStore.cc
+++ b/src/os/FileStore.cc
@@ -2260,10 +2260,24 @@ int FileStore::_check_replay_guard(int fd, const SequencerPosition& spos)
   }
 }
 
+void FileStore::_coalesce(map<string, bufferlist> &target, map<string, bufferlist> &source)
+{
+  for (map<string, bufferlist>::iterator p = source.begin();
+       p != source.end();
+       p++) {
+    target[p->first] = p->second;
+  }
+  return;
+}
+
 unsigned FileStore::_do_transaction(
   Transaction& t, uint64_t op_seq, int trans_num,
   ThreadPool::TPHandle *handle)
 {
+  map<string, bufferlist> collected_aset;
+  coll_t collected_cid;
+  ghobject_t collected_oid;
+
   dout(10) << "_do_transaction on " << &t << dendl;
 
 #ifdef WITH_LTTNG
@@ -2282,6 +2296,22 @@ unsigned FileStore::_do_transaction(
 
     _inject_failure();
 
+    if (op->op == Transaction::OP_OMAP_SETKEYS) {
+	collected_cid = i.get_cid(op->cid);
+	collected_oid = i.get_oid(op->oid);
+	map<string, bufferlist> aset;
+	i.decode_attrset(aset);
+	_coalesce(collected_aset, aset);
+	continue;
+    } else {
+	if (collected_aset.empty() == false) {
+	  tracepoint(objectstore, omap_setkeys_enter, osr_name);
+	  r = _omap_setkeys(collected_cid, collected_oid, collected_aset, spos);
+	  tracepoint(objectstore, omap_setkeys_exit, r);
+	  collected_aset.clear();
+	}
+    }
+
     switch (op->op) {
     case Transaction::OP_NOP:
       break;
diff --git a/src/os/FileStore.h b/src/os/FileStore.h
index af1fb8d..a039731 100644
--- a/src/os/FileStore.h
+++ b/src/os/FileStore.h
@@ -449,6 +449,8 @@ public:
 
   int statfs(struct statfs *buf);
 
+  void _coalesce( map<string, bufferlist> &target, map<string, bufferlist> &source);
+
   int _do_transactions(
     list<Transaction*> &tls, uint64_t op_seq,
     ThreadPool::TPHandle *handle);
Coalescing OMAP_SETKEYS operations in a write transaction
---------------------------------------------------------

Description
-----------

At the level of the FileStore, every write request is embedded in a
transaction which consists of:
  6 key-value pair settings in 3 OMAP_SETKEYS operations
  the actual OP_WRITE
  2 settings in the extended file system attributes

The modification of FileStore::_do_transaction() coalesces the
6 key-value pairs into a single operation. As a side effect, the
number of key-value pairs drops to 5: one key appears twice,
and only its last value is set.

Performance improvement
-----------------------

Test setup: cluster with 3 storage nodes, 4 OSDs (SAS disk, SSD journal)
per node; separate client node with rbd using the kernel client;
test load generated by fio: random write, 4K block size, iodepth 16.

client improvement: approx. 5 % (12890 iops vs. 13369 iops)
storage node improvement: reduction in CPU consumption of the ceph-osd
daemon by 10%; see the following table (derived from /proc/<pid>/schedstat):


ceph-osd process and             CPU usage         | CPU usage
thread classes                   v0.91 unmodified  | v0.91 with coalescing
---------------------------------------------------+----------------------
total cpu usage:                 43.17 CPU-seconds | 39.33 CPU-seconds
                                                   |
ThreadPool::WorkThread::entry(): 15.56   36.04%    | 12.45   31.66%
ShardedThreadPool::workers:       8.07   18.70%    |  7.94   20.18%
Pipe::Reader::                    5.81   13.45%    |  5.92   15.04%
Pipe::Writer::entry():            4.59   10.63%    |  4.73   12.02%
FileJournal::Writer::             2.41    5.57%    |  2.45    6.22%
Finisher::finisher_thread:        2.86    6.63%    |  1.03    2.61%
                                                   |
WBThrottle::entry:                n/a     n/a      |  0.81    2.06%

Interesting: with coalescing active, the WBThrottle shows up in CPU usage.
In the unmodified case, it was almost invisible.


Source/Patch
------------
https://www.github.com/andreas-bluemle/ceph
   commit f33c48358f762cbeb5d30724efacf78ff5438e9e

patches:
   relative to pull request at https://www.github.com/andreas-bluemle/ceph
     ceph-andreas-bluemle.file-store-omap_setkeys-colaescing.patch

   relative to ceph master at https://www.github.com
     (commit a7a70cabe25fdfe3322c784f6797231d14e112c2)
     ceph-master.file-store-omap_setkeys-colaescing.patch

