On Thu, Dec 8, 2011 at 5:55 AM, Amon Ott <a.ott@xxxxxxxxxxxx> wrote:
> Hi folks,
>
> if file access through Ceph kernel client cannot continue, e.g. because there
> is no mds available, it hangs forever.
>
> I would prefer if after a timeout the application would get an error code,
> e.g. the -ESTALE that NFS and Gluster return if something goes wrong. This
> would allow for the application to handle the error instead of blocking
> forever without a chance to recover.

This is interesting to me. Ceph works very hard to provide POSIX semantics, so philosophically the introduction of ESTALE returns is not a natural thing for us. That doesn't necessarily make it the wrong choice, but since Ceph's systems are designed to be self-repairing, the expectation is that any outage is a temporary situation that will resolve itself pretty quickly. And unlike NFS, which often returns ESTALE when other file accesses might succeed, if Ceph fails on an MDS request that's pretty much the ballgame. So returning ESTALE seems like a cop-out: it loses data and behaves unexpectedly without actually doing anything to resolve the issues or giving other data a chance to get saved. In other words, it's not something we want to do automatically.

I believe we already honor interrupts, so that you can do things like Ctrl-C an application waiting for IO and cancel the operations. Can you describe why this behavior interests you (and why manual interruption is insufficient)?

I discussed with a few people the possibility of making it an off-by-default mount option (though I'm unclear on the technical difficulty involved; I'm not big on our kernel stuff). Presumably that would be enough for your purposes?

-Greg
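
If interrupts are indeed honored as described above, an application could impose its own deadline instead of waiting for a manual Ctrl-C. The following is only a rough sketch of that idea, not anything Ceph provides: the names `call_with_timeout` and `IoTimeout` and the path in the usage comment are made up for illustration, and it assumes a Unix system, the main thread, and a kernel wait that a signal can actually interrupt.

```python
import signal


class IoTimeout(Exception):
    """Raised when a blocking call exceeds its deadline (hypothetical name)."""


def call_with_timeout(func, seconds):
    """Run func(); raise IoTimeout if it blocks longer than `seconds`.

    Relies on SIGALRM interrupting the blocking system call, so it only
    helps if the underlying wait is interruptible (an assumption here).
    Must be called from the main thread, since signal handlers run there.
    """
    def on_alarm(signum, frame):
        # Raising from the handler stops Python from retrying the
        # interrupted system call and unwinds to the caller.
        raise IoTimeout(f"no response after {seconds} s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)   # restore the old handler


# Usage (hypothetical mount point):
#   import os
#   call_with_timeout(lambda: os.stat("/mnt/ceph/some/file"), 30)
```

This keeps the decision about when to give up in the application, which is where the knowledge of what to do with unsaved data lives. It does nothing for a process stuck in an uninterruptible wait.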