The OSDs used to heartbeat only on the cluster interface. Since they actually talk to clients on the public interface, this didn't detect certain asymmetric network failures; this was fixed in the last release.

We also fixed a long-standing issue where a failure of the backend network would cause an OSD to be marked down, but the OSD would then mark itself back up by talking to the mon on the public interface. It now verifies that the backend network is working (by pinging random other osds) before marking itself back up, which avoids the problem.

The last change was a more robust writeback model. We have a general problem where we journal writes and then apply them to the file system, but the two devices/targets may go at different speeds. We don't want the journal to get too far ahead of the fs, or else request latency becomes erratic/bursty and the eventual fs commit can take a very long time. Sam put together a simple model of how much work the fs has pending for the next commit (based on dirty bytes, dirty files, and dirty inodes), and throttling decisions are now based on that instead of the very rough limits that used to be in place. This resolves many 'slow request' warnings and even OSD failures (due to op_tp timeouts from very deep queues) on certain torturous workloads (ffsb).

sage
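[Editor's note: a minimal sketch of the kind of throttling model described above, not the actual FileStore code. It estimates pending commit work from dirty bytes, dirty ops/files, and dirty inodes, and gates new journal admissions once that estimate passes a cap. All names, cost weights, and the cap value below are illustrative assumptions.]

// Sketch: estimate outstanding fs work and throttle journal admission on it.
// Flow: should_throttle() is checked before journaling a new op,
// note_queued() records the op once it is queued for the fs, and
// note_committed() is called after the fs sync completes.
#include <cstdint>

struct WritebackModel {
  // Work queued against the fs but not yet committed.
  uint64_t dirty_bytes  = 0;
  uint64_t dirty_ops    = 0;   // e.g. files/objects touched
  uint64_t dirty_inodes = 0;   // inodes with pending metadata

  // Assumed cost weights converting each component into one
  // "expected commit work" figure, in byte-equivalents.
  static constexpr uint64_t BYTE_COST  = 1;
  static constexpr uint64_t OP_COST    = 4096;
  static constexpr uint64_t INODE_COST = 16384;

  // Assumed cap on how far the journal may run ahead of the fs.
  uint64_t max_pending_work = 64ull << 20;

  uint64_t pending_work() const {
    return dirty_bytes  * BYTE_COST +
           dirty_ops    * OP_COST +
           dirty_inodes * INODE_COST;
  }

  // Would admitting this op push the estimated commit work past the cap?
  bool should_throttle(uint64_t op_bytes, uint64_t op_inodes) const {
    return pending_work() +
           op_bytes  * BYTE_COST + OP_COST +
           op_inodes * INODE_COST > max_pending_work;
  }

  void note_queued(uint64_t op_bytes, uint64_t op_inodes) {
    dirty_bytes  += op_bytes;
    dirty_ops    += 1;
    dirty_inodes += op_inodes;  // approximation; real code would dedup inodes
  }

  void note_committed() {
    // After a successful fs commit, the queued work so far is durable.
    dirty_bytes = dirty_ops = dirty_inodes = 0;
  }
};

The real OSD is more involved (commits overlap with new work, inode counting needs dedup), but the point is that admission is gated on an estimate of outstanding fs work rather than fixed byte/op caps.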