On Tue, 2010-05-25 at 09:45 -0400, William A. (Andy) Adamson wrote:
> 2010/5/7 Lukas Hejtmanek <xhejtman@xxxxxxxxxxx>:
> > Hi,
> >
> > I encountered the following problem. We use a short expiration time
> > for Kerberos contexts created by rpc.gssd (some patches were included
> > in mainline nfs-utils). In particular, we use a 120-second expiration
> > time.
> >
> > Now, I run an application that eats 80% of available RAM. Then I run
> > 10 parallel dd processes that write data into an NFS4 volume with
> > sec=krb5.
> >
> > As soon as the Kerberos context expires (i.e., within up to 120
> > seconds), the whole system gets stuck in do_page_fault and successive
> > functions. This is because there is no free memory in the kernel: all
> > free memory is used as cache for NFS4 (due to the dd traffic), the
> > kernel asks NFS to write back its pages, but NFS cannot do anything
> > because it is missing a valid context. NFS contacts rpc.gssd for a
> > renewed context, but rpc.gssd cannot provide one because it needs
> > some memory to scan /tmp for a ticket. I.e., it deadlocks.
> >
> > A longer context expiration time is no real solution, as it only
> > makes the deadlock less frequent.
> >
> > Any ideas what can be done here?
>
> Not get into the problem in the first place: this means
>
> 1) determine a 'lead time' where the NFS client declares a context
> expired even though it really has 'lead time' until it actually
> expires.
>
> 2) flush all writes on any context that will expire within the lead
> time, which needs to be long enough for the flushes to take place.

That too is only a partial solution. The GSS context can expire early
due to totally unforeseeable circumstances, such as a server reboot,
for instance.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
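
[For illustration only: the lead-time idea proposed above could be sketched roughly as below. The function name, the `expiry`/`now` parameters, and the 30-second lead-time value are all hypothetical, not the actual kernel API.]

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical lead time, in seconds, before the real context expiry.
 * It must be long enough for a full writeback flush to complete. */
#define GSS_CTX_LEAD_TIME 30

/*
 * Sketch: treat a GSS context as already expired once we are within
 * the lead window, so the client can flush dirty pages while the
 * context is still actually valid on the wire.
 */
static bool gss_ctx_effectively_expired(time_t expiry, time_t now)
{
    return now >= expiry - GSS_CTX_LEAD_TIME;
}
```

A caller would poll this and, on the first `true`, start flushing all writes made under that context; by the time the real expiry arrives, nothing dirty should depend on it. As noted below, this only covers predictable expiry, not early invalidation by the server.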