On Fri, Sep 20, 2013 at 5:34 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Fri, 20 Sep 2013, Andreas Joachim Peters wrote:
>> Hi,
>>
>> we ran some benchmarks of object read/write latencies on the CERN Ceph installation.
>>
>> The cluster has 44 nodes and ~1k disks, all on 10GE, and the pool configuration has 3 copies.
>> Client & server are 0.67.
>>
>> The latencies we observe (using tiny objects ... 5 bytes) on the idle pool:
>>
>> write full object (sync)   ~65-80ms
>> append to object           ~60-75ms
>> set xattr object           ~65-80ms
>> lock object                ~65-80ms
>> stat object                ~1ms
>
> How are the individual OSDs configured? Are they purely HDDs with a
> journal partition, or is there an SSD journal? If it's pure HDD, this
> will be a significant source of latency.

No SSD journal, same HDD for the journal and the data partition. It still
seems large to have >45ms for 1 copy though, no?

>
> That said, Dieter just recently pointed out to me that he's observing
> significant time in a request turnaround (single request, 1 request in
> flight) that is not related to the storage backend. I've been
> traveling this week and haven't had time to look into it yet. Tracking
> this down should just be a matter of turning up the logs and looking
> carefully at the timestamps to see where things are being delayed.

Which logs? debug_ms, debug_osd, both, or something else?

>
>> We seem to saturate the pools writing ~20k objects/s (= internally 60k/s).
>>
>> Is there an easy explanation for 80 ms (with practically no payload) and a possible tuning to reduce that?
>> I measured (append a few bytes + fsync) on such a disk around 33ms, which probably explains part of the latency.
>>
>> Then I tried with the async API to see if there is a difference in the
>> measurement between wait_for_complete and wait_for_safe ... shouldn't
>> wait_for_complete be much shorter? But I always get comparable results
>> ...
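The append-plus-fsync figure mentioned above (~33ms per operation on one of these disks) can be reproduced with a small microbenchmark. This is a minimal sketch, assuming a POSIX system and Python 3; the function name `append_fsync_latencies` and the probe file location are hypothetical. It times writes to a local file, not RADOS objects, so at best it estimates the single-disk share of the end-to-end latency.

```python
import os
import statistics
import tempfile
import time


def append_fsync_latencies(path, n=20, payload=b"hello"):
    """Append `payload` to `path` n times, calling fsync after each append.

    Returns the per-iteration latency (write + fsync) in milliseconds.
    The default payload is 5 bytes, mirroring the tiny objects in the thread.
    """
    latencies_ms = []
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for _ in range(n):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # block until the kernel reports the data as durable
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies_ms


if __name__ == "__main__":
    # Hypothetical location: to compare with the OSD numbers, point this at a
    # file on the HDD that holds the OSD journal, not at tmpfs or an SSD.
    with tempfile.TemporaryDirectory() as d:
        lat = append_fsync_latencies(os.path.join(d, "probe"), n=20)
    print(f"median {statistics.median(lat):.2f} ms, max {max(lat):.2f} ms")
```

Comparing the median per-fsync latency from such a run with the ~65-80ms seen through RADOS gives a rough idea of how much of each request is spent waiting on the disk versus elsewhere in the request path.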
>
> If you are using XFS or ext4 on the backend, the OSD is doing write-ahead
> journaling, which means that in reality the commit happens before the op
> is applied to the fs and is readable. (The 'commit/ondisk' reply implies
> an ack so the OSD has some internal locking to maintain this illusion from
> the client's perspective.)

It's XFS.

Cheers,
Dan

>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
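The figures in this thread allow a back-of-envelope estimate of the write load each disk sees at the observed saturation point (~20k objects/s, 3 copies, ~1k disks, journal and data on the same HDD). This is a rough model under stated assumptions, not a description of OSD internals: it assumes an even spread over exactly 1,000 disks, ignores filesystem metadata, xattr writes and write coalescing in XFS, and simply counts each replica write twice when the write-ahead journal shares the disk. The function name is hypothetical.

```python
def per_disk_writes_per_sec(objects_per_sec, replicas, disks, journal_on_same_disk=True):
    """Rough average number of device writes per second each disk absorbs.

    Assumes writes are spread evenly across all disks (CRUSH placement is
    only approximately uniform) and, with a co-located journal, counts every
    replica write twice: once in the journal, once when applied to the fs.
    """
    replica_writes = objects_per_sec * replicas  # cluster-wide replica writes/s
    per_disk = replica_writes / disks            # assume an even spread
    return per_disk * (2 if journal_on_same_disk else 1)


if __name__ == "__main__":
    # ~20k objects/s, 3 copies, ~1k disks, journal on the data HDD.
    print(per_disk_writes_per_sec(20_000, 3, 1_000))  # 120.0
```

Roughly 120 small writes per second per spindle is a number to compare against what a single HDD can sustain for synchronous writes; if the two are close, the disks rather than the network may well be the limiting factor.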