On Wed, 19 Jun 2013, John Nielsen wrote:
> Thanks for the update (and, as always, the great work)!
>
> Could you also indicate for each change which development and stable release(s) they (will) appear in?

Sure!

> On Jun 19, 2013, at 2:04 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> > The OSDs used to heartbeat only on the cluster interface. Since they
> > actually talk to clients on the public interface, this didn't detect
> > certain asymmetric network failures; this was fixed in the last release.
> >
> > We also fixed a long-standing issue where the failure of the backend
> > network would cause an OSD to be marked down, but it would mark itself
> > back up by talking to the mon on the public interface. It now verifies
> > that the backend network is working (by pinging random other OSDs) before
> > marking itself back up to avoid the problem.

This will land in v0.65 (out next week).

> > The last change was a more robust writeback model. We have a general
> > problem where we are journaling writes and then applying them to the file
> > system, but the two devices/targets may go at different speeds. We don't
> > want the journal to get too far ahead of the fs, or else request latency
> > becomes erratic/bursty and the eventual fs commit can take a very long
> > time. Sam put together a simple model of how much work the fs has pending
> > for the next commit (based on dirty bytes, dirty files, and dirty inodes)
> > and based throttling decisions on that instead of the very rough limits
> > that used to be in place. This resolves many 'slow request' warnings and
> > even OSD failures (due to op_tp timeouts from very deep queues) on certain
> > torturous workloads (ffsb).

Also v0.65.
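For illustration, the mark-up check from the second item (verify the backend network by pinging random other OSDs before marking itself back up) might look roughly like the sketch below. It is not Ceph's code: `ping_cluster_addr` and `send_boot_to_mon` are hypothetical callbacks standing in for a heartbeat over the cluster interface and the request to the mon, and the sample size and "every sampled peer must answer" rule are assumed policies.

```python
import random
from typing import Callable, Iterable, Optional


def backend_network_ok(peers: Iterable[int],
                       ping_cluster_addr: Callable[[int], bool],
                       sample_size: int = 3,
                       rng: Optional[random.Random] = None) -> bool:
    """Return True if randomly chosen peer OSDs answer on the cluster network.

    ping_cluster_addr is a HYPOTHETICAL callback (not a real Ceph API) that
    returns True when a heartbeat reply arrives over the backend interface.
    """
    if rng is None:
        rng = random.Random()
    candidates = list(peers)
    if not candidates:
        # No peers to test against, so nothing shows the backend is broken.
        return True
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    # Assumed, conservative policy: every sampled peer must answer.
    return all(ping_cluster_addr(osd) for osd in sample)


def maybe_request_mark_up(my_id: int,
                          peers: Iterable[int],
                          ping_cluster_addr: Callable[[int], bool],
                          send_boot_to_mon: Callable[[int], None]) -> bool:
    """Ask the mon to mark this OSD up only once the backend looks healthy.

    send_boot_to_mon is a HYPOTHETICAL callback for the public-network
    request to the monitor.
    """
    if backend_network_ok(peers, ping_cluster_addr):
        send_boot_to_mon(my_id)
        return True
    return False  # stay down and retry on a later tick
```

The point of the design is that reachability of the mon over the public network is no longer treated as proof that the OSD is healthy; the backend path has to be exercised first.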
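As a rough sketch of the writeback throttle in the last item: estimate the cost of the next fs commit from dirty bytes, dirty files, and dirty inodes, and stop admitting journaled writes once that estimate exceeds a budget. This is a minimal illustration under assumptions, not Sam's implementation: the class name `FsWorkThrottle`, the per-file and per-inode weights, and the single `max_cost` budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingWork:
    """Dirty state accumulated since the last fs commit."""
    dirty_bytes: int = 0
    dirty_files: int = 0
    dirty_inodes: int = 0


class FsWorkThrottle:
    """HYPOTHETICAL throttle: keep the journal from running too far ahead
    of the fs by bounding the estimated cost of the next commit."""

    def __init__(self, byte_cost=1.0, file_cost=4096.0, inode_cost=2048.0,
                 max_cost=64 * 1024 * 1024):
        # Assumed weights that convert files/inodes into byte-equivalent work.
        self.byte_cost = byte_cost
        self.file_cost = file_cost
        self.inode_cost = inode_cost
        self.max_cost = max_cost
        self.pending = PendingWork()

    def estimated_cost(self, work=None):
        w = self.pending if work is None else work
        return (w.dirty_bytes * self.byte_cost
                + w.dirty_files * self.file_cost
                + w.dirty_inodes * self.inode_cost)

    def try_admit(self, nbytes, nfiles=0, ninodes=0):
        """Admit a journaled write only if the fs backlog stays under budget."""
        candidate = PendingWork(self.pending.dirty_bytes + nbytes,
                                self.pending.dirty_files + nfiles,
                                self.pending.dirty_inodes + ninodes)
        # Always admit into an empty backlog so an oversized op cannot starve.
        if self.estimated_cost() > 0 and self.estimated_cost(candidate) > self.max_cost:
            return False  # caller waits for the next fs commit
        self.pending = candidate
        return True

    def on_fs_commit(self):
        """A commit flushed everything; the backlog is empty again."""
        self.pending = PendingWork()
```

Counting files and inodes, not just bytes, matters because many small writes to many files can make a commit slow even when the dirty byte count looks modest.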