On Mon, Apr 18, 2011 at 4:02 PM, Gregory Farnum <gfarnum@xxxxxxxxx> wrote:
> On Monday, April 18, 2011 at 3:38 PM, Zenon Panoussis wrote:
>
>> On 04/18/2011 11:21 PM, Gregory Farnum wrote:
>>
>> > I looked through your logs a bit and noticed that the OSD on node01 is
>> > crashing due to high latencies on disk access (I think the defaults for
>> > this case are it asserts out if there's no progress after 10 minutes or
>> > something).
>>
>> First of all, thank you for plowing through those huge logs. It's a feat
>> all and by itself.
>>
>> Could you please post an example where you found the OSD crashing, so that
>> I and others know what log entries to look for?
>
> In this case what I was actually searching for was client requests coming
> in and then getting lost on some wait list somewhere. I didn't see any of
> those, but I did see an OSD backtrace:
>
> 2011-04-16 20:11:30.304029 7fce00c24700 FileStore: sync_entry timed out after 600 seconds.
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 2011-04-16 20:11:30.304483 1: (SafeTimer::timer_thread()+0x36b) [0x5f365b]
> 2011-04-16 20:11:30.304517 2: (SafeTimerThread::entry()+0xd) [0x5f5dfd]
> 2011-04-16 20:11:30.304545 3: /lib64/libpthread.so.0() [0x3ea6e077e1]
> 2011-04-16 20:11:30.304573 4: (clone()+0x6d) [0x3ea6ae18ed]
> 2011-04-16 20:11:30.304599 os/FileStore.cc: In function 'virtual void SyncEntryTimeout::finish(int)', in thread '0x7fce00c24700'
> os/FileStore.cc: 2573: FAILED assert(0)
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 1: (SyncEntryTimeout::finish(int)+0xf4) [0x5953d4]
> 2: (SafeTimer::timer_thread()+0x36b) [0x5f365b]
> 3: (SafeTimerThread::entry()+0xd) [0x5f5dfd]
> 4: /lib64/libpthread.so.0() [0x3ea6e077e1]
> 5: (clone()+0x6d) [0x3ea6ae18ed]
> ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
> 1: (SyncEntryTimeout::finish(int)+0xf4) [0x5953d4]
> 2: (SafeTimer::timer_thread()+0x36b) [0x5f365b]
> 3: (SafeTimerThread::entry()+0xd) [0x5f5dfd]
> 4: /lib64/libpthread.so.0() [0x3ea6e077e1]
> 5: (clone()+0x6d) [0x3ea6ae18ed]

This particular timeout was added in response to a specific bug where
someone did a drive pull and the failure was never reported to userspace;
instead, the btrfs filesystem just blocked forever:

http://tracker.newdream.net/issues/735

We decided just to add a timeout that would trigger if a commit took too
long. This is basically a dirty hack; it would be better to have the drive
pull event properly communicated to userspace. However, we thought it
wouldn't really cause problems in practice since the timeout was so long.

There are other timeouts in the code, like the OSD heartbeat timeout. In
general, as Greg pointed out, Ceph is supposed to assure data security
through redundant copies on multiple OSDs. If we let unresponsive OSDs
linger forever, we could get into a situation where the upper layer (the
Ceph filesystem) waits forever on an unresponsive OSD, even though there
are plenty of other OSDs that could easily take up the slack. This is like
starving to death at the all-you-can-eat buffet, just because they're out
of jell-o.

There is some logic in FileStore::queue_transactions that is supposed to
throttle the requests we're making to the FileStore. I haven't ever really
examined that code, so it's possible it isn't doing a very good job of
preventing work from piling up faster than we can take care of it. In
general, there is a lot of performance tuning and optimization work left
to do.
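To make the throttling idea concrete, here is a rough sketch of what I
have in mind. This is not the actual queue_transactions code; the class
name, the max_ops/max_bytes limits, and the start_op/finish_op hooks are
all made up for illustration. The idea is just to block a submitter once
too many ops or bytes are already outstanding in the FileStore, and wake
it up again as completions come back:

// Illustrative sketch only, not the real Ceph code: throttle submissions
// by outstanding op count and bytes so work can't pile up without bound.
#include <condition_variable>
#include <cstdint>
#include <mutex>

class SimpleThrottle {
public:
  SimpleThrottle(uint64_t max_ops, uint64_t max_bytes)
    : max_ops_(max_ops), max_bytes_(max_bytes) {}

  // Called by the submitter (e.g. from something like queue_transactions)
  // before queueing an op; blocks until there is room under both limits.
  void start_op(uint64_t bytes) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] {
      return cur_ops_ < max_ops_ && cur_bytes_ + bytes <= max_bytes_;
    });
    ++cur_ops_;
    cur_bytes_ += bytes;
  }

  // Called from the op's completion path once the backing filesystem has
  // finished with it; wakes up any blocked submitters.
  void finish_op(uint64_t bytes) {
    {
      std::lock_guard<std::mutex> l(lock_);
      --cur_ops_;
      cur_bytes_ -= bytes;
    }
    cond_.notify_all();
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  const uint64_t max_ops_;
  const uint64_t max_bytes_;
  uint64_t cur_ops_ = 0;
  uint64_t cur_bytes_ = 0;
};

The interesting tuning question with any scheme like this is where to set
the limits: too low and you starve the journal and filesystem of
parallelism, too high and latency blows up whenever the disk falls behind.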
We're working towards having better metrics so that it will be easier to
measure our progress in this area. Hopefully we can get something like a
nightly autotest that will email out performance numbers for the master
branch.

cheers,
Colin