Re: nfs client performance while server is down

Any idea how I could do that?

On Mon, Jan 25, 2010 at 10:01 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> On Jan 25, 2010, at 2:38 PM, Whoop Whouzer wrote:
>>
>> Running "strace nautilus" gives me a lot of output. When I run it
>> while the server is down, it completes the trace without a hiccup; it
>> returns, and then nautilus is launched and hangs.
>> There are differences between the traces (with server up and server
>> down), but I can't really see where the problem lies in them.
>
> I would expect that the command-line nautilus forks when it starts up.  If
> it has some option you can specify to prevent that, it might allow a deeper
> look.  You would need to tell strace to look at the children, too.
>
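A minimal sketch of the "look at the children" suggestion above, assuming strace's standard -f option (follow processes created by fork/vfork/clone) and -o option (write the trace to a file). The helper name strace_argv and the file name nautilus.trace are made up for illustration and do not come from this thread:

```python
import shlex


def strace_argv(program, output_path="nautilus.trace"):
    """Build an strace command line that also traces child processes."""
    # -f follows children created by fork/vfork/clone, so the trace keeps
    # going after the launcher process returns.
    # -o writes the trace to a file instead of stderr.
    return ["strace", "-f", "-o", output_path, program]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(strace_argv("nautilus")))
```

The printed command can be pasted into a terminal, or the list can be handed to subprocess.run() once it looks right.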
>> On Mon, Jan 25, 2010 at 8:08 PM, Chuck Lever <chuck.lever@xxxxxxxxxx>
>> wrote:
>>>
>>> On Jan 25, 2010, at 2:02 PM, Whoop Whouzer wrote:
>>>>
>>>> Ok, I did that. After shutting down the server and enabling debug
>>>> tracing, I tried to open the home folder of the current account (totally
>>>> unrelated to the nfs share); it wouldn't open at all, and no nautilus
>>>> window appeared. While my cursor was in busy mode I got the following
>>>> messages in kern.log (on the ubuntu 10.04 client):
>>>> Jan 25 19:30:13 whoop-desktop kernel: [  160.719262] NFS call  fsstat
>>>> Jan 25 19:30:37 whoop-desktop kernel: [  184.458611] NFS: permission(0:16/74386), mask=0x10, res=0
>>>> Jan 25 19:30:37 whoop-desktop kernel: [  184.458647] NFS call  access
>>>> Jan 25 19:30:43 whoop-desktop kernel: [  190.721086] nfs: server 192.168.1.130 not responding, timed out
>>>> Jan 25 19:30:43 whoop-desktop kernel: [  190.721113] NFS reply statfs: -5
>>>> Jan 25 19:30:43 whoop-desktop kernel: [  190.721116] nfs_statfs: statfs error = 5
>>>> This series of traces repeats over and over again at a set
>>>> interval (there is no flooding of the logs), even if I do nothing.
>>>> It's even worse than I thought, because when I tried to shut down, the
>>>> machine wouldn't shut down because it claimed
>>>> the "File manager" was still running (although it was not visible on
>>>> screen); so I had to kill that before I could shut down (properly).
>>>>
>>>> In Fedora 12 I had a similar user experience (nautilus did show up
>>>> without showing any contents and it was hanging). I had enabled
>>>> tracing and it seems to be logged to /var/log/messages. I got this
>>>> output in fedora:
>>>> Jan 25 20:48:38 localhost kernel: NFS reply statfs: -5
>>>> Jan 25 20:48:38 localhost kernel: nfs_statfs: statfs error = 5
>>>> Jan 25 20:48:38 localhost kernel: NFS call  fsstat
>>>> Jan 25 20:49:14 localhost kernel: nfs: server 192.168.1.130 not responding, timed out
>>>> Jan 25 20:49:14 localhost kernel: NFS reply getattr: -5
>>>> Jan 25 20:49:14 localhost kernel: nfs_revalidate_inode: (0:14/74386) getattr failed, error=-5
>>>> Jan 25 20:49:25 localhost kernel: NFS: revalidating (0:14/74386)
>>>> Jan 25 20:49:25 localhost kernel: NFS call  getattr
>>>> Jan 25 20:50:14 localhost kernel: nfs: server 192.168.1.130 not responding, timed out
>>>> Jan 25 20:50:14 localhost kernel: NFS reply access: -5
>>>> Jan 25 20:50:14 localhost kernel: NFS: permission(0:14/74386), mask=0x1, res=-5
>>>> Jan 25 20:50:14 localhost kernel: NFS call  access
>>>> Jan 25 20:51:14 localhost kernel: nfs: server 192.168.1.130 not responding, timed out
>>>> Jan 25 20:51:14 localhost kernel: NFS reply statfs: -5
>>>> Jan 25 20:51:14 localhost kernel: nfs_statfs: statfs error = 5
>>>> Jan 25 20:51:14 localhost kernel: NFS call  fsstat
>>>> Most of the trace repeats at set intervals as well; there is no
>>>> flooding of the logs.
>>>> Fedora would not shut down normally either.
>>>
>>> This verifies that your client is attempting to access the NFS server, but
>>> doesn't tell us which file it's attempting to access.  Essentially the EIO
>>> means "failed to connect".
>>>
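For reference, the -5 in the "NFS reply" lines is errno 5 (EIO) negated, which is how the kernel reports it. A rough sketch that tallies failed replies per operation from log lines of the kind quoted above, assuming each log entry sits on a single unwrapped line as it does in kern.log or /var/log/messages. The function name count_failed_replies is made up for illustration:

```python
import errno
import re
from collections import Counter

# Matches entries such as "... kernel: NFS reply statfs: -5".
REPLY_RE = re.compile(r"NFS reply (\w+): (-?\d+)")


def count_failed_replies(lines):
    """Count failed NFS replies per (operation, errno name)."""
    counts = Counter()
    for line in lines:
        match = REPLY_RE.search(line)
        if not match:
            continue
        op, status = match.group(1), int(match.group(2))
        if status < 0:
            # Kernel status codes are negated errno values, so -5 is EIO.
            name = errno.errorcode.get(-status, str(-status))
            counts[(op, name)] += 1
    return counts


# Sample entries copied from the Fedora log quoted in this thread.
sample = [
    "Jan 25 20:48:38 localhost kernel: NFS reply statfs: -5",
    "Jan 25 20:49:14 localhost kernel: NFS reply getattr: -5",
    "Jan 25 20:50:14 localhost kernel: NFS reply access: -5",
    "Jan 25 20:51:14 localhost kernel: NFS reply statfs: -5",
    "Jan 25 20:51:14 localhost kernel: NFS call  fsstat",
]

for (op, name), n in sorted(count_failed_replies(sample).items()):
    print(op, name, n)
```

This only summarizes which operations are failing; as noted above, the log still does not say which file is being accessed.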
>>> Maybe try an strace of the nautilus process next?
>>>
>>>> On Mon, Jan 25, 2010 at 5:48 PM, Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> On Jan 24, 2010, at 7:09 PM, Whoop Whouzer wrote:
>>>>>>
>>>>>> I did some network traces and there is nothing strange happening as
>>>>>> far as I can tell. I shut down the server (some network traffic
>>>>>> occurred as is to be expected). It got quiet again, I launched
>>>>>> nautilus, it got stuck without displaying anything and there was no
>>>>>> real network activity except 3 broadcasts using the ARP protocol
>>>>>> asking where the server was (could be just coincidence).
>>>>>
>>>>> That sounds like the client does want to reconnect with the server.
>>>>>
>>>>> You could try enabling debug tracing on your client
>>>>> (sudo rpcdebug -m nfs -s all) after shutting down your server, then try
>>>>> to start nautilus.  The kernel log would then contain NFS-related
>>>>> messages that might indicate where to look next.
>>>>>
>>>>>> Closing
>>>>>> nautilus and launching it again makes it hang again, but I see no
>>>>>> additional network traffic. After a while nautilus will display the
>>>>>> contents of the folder without any network traffic.
>>>>>>
>>>>>> On Sun, Jan 24, 2010 at 10:34 PM, Muntz, Daniel <Dan.Muntz@xxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Perhaps something in your $PATH is in the NFS mount?  Do a network trace
>>>>>>> and maybe you can see if, in fact, there are actually NFS operations being
>>>>>>> attempted that you weren't expecting.  Then try to figure out why.
>>>>>>>
>>>>>>>  -Dan
>>>>>>>
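One way to check the $PATH idea above without touching the (possibly hung) mount: compare the $PATH entries against the NFS mount points listed in /proc/mounts using plain string comparison. This is only a sketch. The function names are made up, the sample mount line (export path and mount point) is hypothetical, and symlinks are deliberately not resolved because resolving them could block on a dead server:

```python
import os


def nfs_mount_points(mounts_text):
    """Return mount points of NFS filesystems from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def path_entries_on_nfs(path_value, mount_points):
    """Return $PATH entries located at or below an NFS mount point."""
    hits = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue
        norm = os.path.normpath(entry)
        for mount_point in mount_points:
            root = os.path.normpath(mount_point)
            # Pure string comparison: no stat() calls that could hang.
            if norm == root or norm.startswith(root.rstrip("/") + "/"):
                hits.append(entry)
                break
    return hits


# Hypothetical sample; on a real client read the text from /proc/mounts
# and take the path value from os.environ["PATH"].
sample_mounts = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "192.168.1.130:/export /mnt/share nfs rw,soft 0 0\n"
)
print(path_entries_on_nfs("/usr/bin:/mnt/share/bin:/bin",
                          nfs_mount_points(sample_mounts)))
```

An empty result does not rule out NFS access through a symlink or through other search paths, so the network trace is still worth doing.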
>>>>>>>> -----Original Message-----
>>>>>>>> From: Whoop Whouzer [mailto:tiredandnumb@xxxxxxxxx]
>>>>>>>> Sent: Saturday, January 23, 2010 8:28 AM
>>>>>>>> To: Peter Chacko
>>>>>>>> Cc: linux-nfs@xxxxxxxxxxxxxxx
>>>>>>>> Subject: Re: nfs client performance while server is down
>>>>>>>>
>>>>>>>> I don't remember all the different set-ups I tried it on, but I just
>>>>>>>> confirmed this with the following combinations:
>>>>>>>>
>>>>>>>> ubuntu server 10.04 (alpha 2) --> ubuntu desktop 9.10, ubuntu desktop
>>>>>>>> 10.04 (alpha 2), fedora 12
>>>>>>>> ubuntu server 9.10 --> ubuntu desktop 9.10, ubuntu desktop 10.04
>>>>>>>> (alpha 2), fedora 12
>>>>>>>>
>>>>>>>> I'll be happy to test it on another client machine (distro) or even
>>>>>>>> another server (although that would require a little more time).
>>>>>>>>
>>>>>>>> Here are some examples of the bug reports I noticed and how they do
>>>>>>>> not seem to get solved:
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=175283
>>>>>>>> https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/164120
>>>>>>>>
>>>>>>>> regards,
>>>>>>>> Whoop
>>>>>>>>
>>>>>>>> On Sat, Jan 23, 2010 at 4:57 PM, Peter Chacko
>>>>>>>> <peterchacko35@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> On which client OS did you observe this behavior?  This has nothing
>>>>>>>>> to do with NFS design, which is purely stateless.  It's up to the
>>>>>>>>> client OS implementation how to deal with local IO when an NFS
>>>>>>>>> share gets disconnected.
>>>>>>>>>
>>>>>>>>> It may be a VFS bug in the local OS where you found this problem.
>>>>>>>>>
>>>>>>>>> thanks
>>>>>>>>>
>>>>>>>>> On Sat, Jan 23, 2010 at 9:15 PM, Whoop Whouzer
>>>>>>>>> <tiredandnumb@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Howdy,
>>>>>>>>>>
>>>>>>>>>> I was wondering why nfs is designed in such a way that the
>>>>>>>>>> performance of an nfs client machine gets very bad when the nfs
>>>>>>>>>> server is offline? This is even the case with a soft mount (either
>>>>>>>>>> via mount or fstab).
>>>>>>>>>> Just about every application that requires disk access (not
>>>>>>>>>> talking about nfs share access) gets really slow to unresponsive.
>>>>>>>>>> For instance, nautilus becomes unresponsive when displaying the
>>>>>>>>>> contents of any folder on the local disk, playing movie files
>>>>>>>>>> (stored on local disk) makes totem or vlc get stuck at set
>>>>>>>>>> intervals, and even the terminal becomes unresponsive at times.
>>>>>>>>>>
>>>>>>>>>> I could understand that these problems would occur while accessing
>>>>>>>>>> the nfs share directory while the server is offline, but why for
>>>>>>>>>> totally unrelated directories?
>>>>>>>>>>
>>>>>>>>>> I have experienced this behaviour on various distros, and also
>>>>>>>>>> found various bug reports on this issue; they don't seem to get
>>>>>>>>>> solved, as this is viewed as nfs design.
>>>>>>>>>> I see this as a flaw because clients are totally dependent on the
>>>>>>>>>> server. This would be less of a deal if the entire home directory
>>>>>>>>>> were stored on nfs (although I even think some sort of
>>>>>>>>>> synchronisation technology could and should be implemented in this
>>>>>>>>>> case). It is a bit odd that (technically) one machine serving some
>>>>>>>>>> "useless" files to a non-trivial directory on client machines can
>>>>>>>>>> take down these client machines.
>>>>>>>>>>
>>>>>>>>>> For me the preferred functionality would be:
>>>>>>>>>> *If an nfs server goes offline, the client's nfs share becomes
>>>>>>>>>> inaccessible, but local directories and applications (that only
>>>>>>>>>> require local disk access) stay responsive.
>>>>>>>>>> *If an nfs server comes back online (after being offline while the
>>>>>>>>>> client has not been restarted), the nfs share is reconnected.
>>>>>>>>>>
>>>>>>>>>> regards,
>>>>>>>>>> Whoop
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>>> linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Chuck Lever
>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>> <stracesdiff.log>
>