On Thu, Jul 23, 2009 at 09:44:57PM -0700, Sage Weil wrote:
> On Thu, 23 Jul 2009, Trond Myklebust wrote:
> > On Thu, 2009-07-23 at 11:26 -0700, Sage Weil wrote:
> > > A related question I had on writepages failures: what is the 'right' thing
> > > to do if we get a server error on writeback? If we believe it may be
> > > transient (say, ENOSPC), should we redirty pages and hope for better luck
> > > next time?
> >
> > How would ENOSPC be transient? On most systems, ENOSPC requires some
> > kind of user action in order to allow recovery, so will they pass the
> > error back to the application.
>
> In a distributed environment, other users may be deleting data, or the
> cluster might be expanding/rebalancing as new storage is added to the
> system.

The client doesn't have much ability to distinguish between these cases,
so if you wanted to handle them I'd think the way to do it would be by
adding errors in the protocol. (E.g. your MDS could use something like
"EJUKEBOX" to mean "I'm bringing new storage online" or "a user just
asked me to truncate a 5TB file", and reserve "ENOSPC" for the case
where the next call isn't going to succeed without somebody's help.)

> Of course, any retry after ENOSPC should be limited to a small
> number of additional attempts.

There may be cases when the delay returning ENOSPC becomes annoying.

--b.
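
To make the protocol-level distinction discussed above concrete, here is a rough sketch of the client-side decision, written in Python purely for illustration. It is not code from any real client. ERR_RETRY_LATER stands in for an "EJUKEBOX"-style "try again later" reply; its numeric value, the MAX_RETRIES bound, and the names Action and writeback_action are all assumptions made up for this example.

```python
import errno
from enum import Enum


class Action(Enum):
    REDIRTY = "redirty"  # keep the pages dirty and let writeback try again later
    FAIL = "fail"        # give up and report the error to the application


# Hypothetical protocol-level code meaning "server is busy, retry later".
# The value is illustrative only and is not taken from any real protocol.
ERR_RETRY_LATER = 10_000

# Assumed small bound on additional attempts, so the application is not
# kept waiting indefinitely for an error it will eventually receive.
MAX_RETRIES = 3


def writeback_action(server_error: int, attempts: int) -> Action:
    """Decide what to do with pages whose writeback got a server error.

    `attempts` is the number of retries already made for these pages.
    """
    # Only the explicit "retry later" reply is treated as transient,
    # and only while the retry budget lasts.
    if server_error == ERR_RETRY_LATER and attempts < MAX_RETRIES:
        return Action.REDIRTY
    # Everything else, including ENOSPC, is treated as needing outside
    # help, so it goes straight back to the application.
    return Action.FAIL
```

With this split, writeback_action(errno.ENOSPC, 0) fails immediately, while the "retry later" reply is redirtied until the retry budget is used up. The server, which knows whether space is about to appear, makes the call; the client only enforces the bound.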