On 04/18/2011 11:21 PM, Gregory Farnum wrote:
> I looked through your logs a bit and noticed that the OSD on node01 is
> crashing due to high latencies on disk access (I think the defaults for
> this case are it asserts out if there's no progress after 10 minutes or
> something).

First of all, thank you for plowing through those huge logs. It's a feat
all by itself. Could you please post an example of where you found the OSD
crashing, so that I and others know what log entries to look for?

> Based on that, I pretty much have to guess that there's just too much
> stress on your disk and it's going to cause problems. You can try loosening
> the various configurable timeouts to let it run longer but it seems like
> really you just need beefier disks for the amount of stuff you're doing to
> them.

My hardware is indeed very primitive, but in order to prevent this from
happening I would have to make sure that the disks always have more capacity
than the network. In a real-world setup, with gigabit or multi-gigabit
networking and multiple applications doing disk I/O simultaneously, that is
unfeasible.

It would also go against the hierarchy of OS subsystem layering. What I mean
is this: if an application tries to write data to the file system and fails,
the application should either hang or time out and bail out; the file system
itself should still not crash. The application is always agnostic about the
file system, so the file system should never acknowledge more data than it
can promise to actually process.

In the case of ceph things get complicated by the fact that ceph appears as
a file system to the applications using it, but itself depends on an
underlying file system for its disk access. As a result, ceph is responsible
for the data it accepts from applications, but has no way to meet that
responsibility if the underlying file system lets it down.

I don't know how this problem can be truly solved, but some trickery with
I/O buffers might go a long way towards mitigating it. Or perhaps some
available-capacity calls between the monitor and the client. Every other
networked file system has a similar problem, so looking at how NFS or Samba
deal with it could provide ideas or even ready code.

> IIRC you're running a monitor and an OSD on the same 2.5" physical disk,
> which means they're colliding on stuff like sync() calls.

Indeed, I'm running the entire system on a dirt-cheap 2.5" disk. Still, good
software on bad hardware should run slowly or not at all, not try to run
fast and then crash and corrupt its data.

> This general slowness doesn't explain the mds log corruption, although it
> might be one of the trigger conditions. I added another assert in the
> Journaler code which might have caused the problem (though I don't think
> it could have) but don't have any other new ideas.

I'll test again as soon as 0.27 is out (BTW, is 0.27 blocked by 0.26.1 or do
they run independently of each other?).

Z
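
P.S. For anyone else trying to follow along: the "asserts out if there's no
progress" behaviour, as I understand Greg's description, amounts to a
watchdog that records when the last I/O operation completed and gives up
once a grace period expires. The sketch below is only my own illustration of
that idea; the names and the check interval are invented, the 10-minute
figure is just Greg's guess above, and none of this is the actual Ceph code.

// Illustration only -- not Ceph code. A worker records when it last made
// progress; a checker thread asserts out if the stall exceeds a grace
// period, which is roughly how I read the "configurable timeouts" above.
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Timestamp (in Clock ticks) of the last completed I/O operation.
std::atomic<Clock::rep> last_progress{Clock::now().time_since_epoch().count()};

// Hypothetical grace period, matching the "10 minutes or something" above.
constexpr std::chrono::minutes grace(10);

void mark_progress() {
  last_progress = Clock::now().time_since_epoch().count();
}

// Checker thread: if no operation has completed within the grace period,
// assume the disk is wedged and assert out instead of limping along.
void watchdog() {
  for (;;) {
    std::this_thread::sleep_for(std::chrono::seconds(30));
    Clock::time_point last{Clock::duration(last_progress.load())};
    assert(Clock::now() - last < grace);
  }
}

int main() {
  std::thread wd(watchdog);
  wd.detach();
  for (int i = 0; i < 5; ++i) {   // stand-in for real disk work
    std::this_thread::sleep_for(std::chrono::seconds(1));
    mark_progress();              // every completed op resets the clock
  }
  return 0;
}

Loosening the timeouts would then just mean enlarging that grace period, at
the cost of waiting longer before a genuinely stuck OSD is taken down.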
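
P.P.S. To make the "trickery with I/O buffers" a bit more concrete, here is
a rough sketch of what I mean by never acknowledging more data than can be
flushed: a bounded write buffer that blocks the submitter while it is full.
Again, the names and sizes are made up for illustration; this is not a
proposal for actual Ceph code.

// Illustration only: a bounded buffer that refuses to acknowledge new
// writes while full, so the caller blocks (or could time out) instead of
// the file system accepting data it cannot promise to flush.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

class BoundedWriteBuffer {
public:
  explicit BoundedWriteBuffer(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  // Block until there is room, then accept the write. Writes larger than
  // max_bytes_ would need chunking; omitted here.
  void submit(std::string data) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [&] { return bytes_ + data.size() <= max_bytes_; });
    bytes_ += data.size();
    pending_.push_back(std::move(data));
    not_empty_.notify_one();
  }

  // Called by the flusher once the underlying disk has really taken a
  // chunk; only then does room open up for new writes.
  std::string take_for_flush() {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [&] { return !pending_.empty(); });
    std::string chunk = std::move(pending_.front());
    pending_.pop_front();
    bytes_ -= chunk.size();
    not_full_.notify_all();
    return chunk;
  }

private:
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
  std::deque<std::string> pending_;
  std::size_t bytes_ = 0;
  const std::size_t max_bytes_;
};

int main() {
  BoundedWriteBuffer buf(4096);            // pretend the disk queue holds 4 KB
  std::thread flusher([&] {
    for (int i = 0; i < 8; ++i) {
      std::string chunk = buf.take_for_flush();
      std::this_thread::sleep_for(std::chrono::milliseconds(100));  // "slow disk"
      std::cout << "flushed " << chunk.size() << " bytes\n";
    }
  });
  for (int i = 0; i < 8; ++i)
    buf.submit(std::string(1024, 'x'));    // blocks once 4 KB is outstanding
  flusher.join();
  return 0;
}

Whether the right policy is to block, to return an error like EAGAIN, or to
throttle per client is exactly the kind of question NFS and Samba must have
faced already, which is why I suggested looking there.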