Re: reg. fatal: The remote end hung up unexpectedly on NFS

On Fri, Apr 06, 2018 at 11:55:51PM +0530, Satya Prakash GS wrote:

> We have a distributed filesystem with NFS access. On the NFS mount, I
> was doing a git-clone and if NFS server crashed and came back up while
> the clone is going on, clone fails with the below message:
> 
> git clone https://satgs@xxxxxxxxxx/fs/private-qa.git
> 
> remote: Counting objects: 139419, done.
> remote: Compressing objects: 100% (504/504), done.
> Receiving objects:   7% (9760/139419), 5.32 MiB | 5.27 MiB/s
> error: RPC failed; result=18, HTTP code = 200
> fatal: The remote end hung up unexpectedly
> fatal: early EOF
> fatal: index-pack failed

Curl's result=18 is CURLE_PARTIAL_FILE. Usually that means the other
side hung up partway through. But given the NFS symptoms you describe, I
wonder if fwrite() to the file simply returned an error, and curl gave
up.

> On NFS server crash, it usually takes a minute or two for our
> filesystem to failover to new NFS server. Initially I suspected it had
> something to do with the filesystem, like attributes of the file
> written by git weren't matching what it was expecting but the same
> test fails on open source NFS server. While clone is going on, if I
> stopped the open source NFS server for 2 minutes and restarted it, git
> clone fails.
> 
> Another interesting thing is, if the restart happens within a few
> seconds, git clone succeeds.
> 
> Sideband_demux fails while trying to read from the pipe. Read size
> doesn't match what is expected. If there are 2 parts to git clone,
> fetching data and writing to the local filesystem, is this error
> happening while trying to fetch? Since it succeeds if the restart is
> done immediately, has this got something to do with the protocol
> timeouts?
> 
> Please advise on how to debug this further.

If you're on Linux, strace could show you the write error. Unfortunately
it's a little tricky because the http bits happen in a sub-process. But
try:

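  cat >/in/your/$PATH/git-remote-strace <<\EOF
  #!/bin/sh
  # Git runs this as "git-remote-strace <remote> <url>" for strace:: URLs.
  # Pick the real helper from the URL scheme (e.g. https -> git-remote-https).
  protocol=$(echo "$2" | cut -d: -f1)
  # Trace the helper and its children; the log goes to /tmp/foo.out.
  exec strace -f -o /tmp/foo.out git-remote-$protocol "$@"
  EOF
  chmod +x /in/your/$PATH/git-remote-strace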

  git clone strace::https://github.com/whatever

My guess is you may find a failed `write()` in there.
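
Once you have the log, something like the following could narrow it
down. It is only a sketch: failed_writes is a name I made up for
illustration, and the pattern assumes strace's default output format,
where a failed call ends in "= -1 ERRNO".

```shell
# Hypothetical helper: list write-family syscalls that failed in an
# strace log. Assumes strace's default line format; also matches the
# "<... write resumed>" lines that strace -f emits for interrupted calls.
failed_writes() {
    grep -E '(write|writev|pwrite64)(\(| resumed>).*= -1 E[A-Z]+' "$1"
}
```

Run it as "failed_writes /tmp/foo.out". If it prints something like EIO
or ESTALE for a file under the clone directory, that would point at the
NFS side rather than the network.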

-Peff


