Re: GIT get corrupted on lustre

[I forgot to subscribe to the git mailing list, sorry for that]

On 01/22/2013 05:14 PM, Thomas Rast wrote:
> Eric Chamberland <Eric.Chamberland@xxxxxxxxxxxxxxx> writes:
>
>> So, hum, do we have some sort of conclusion?
>>
>> Shall it be a fix for git to get around that lustre "behavior"?
>>
>> If something can be done in git it would be great: it is a *lot*
>> easier to change git than the lustre filesystem software for a cluster
>> running in production mode... (words from cluster team) :-/
>
> I thought you already established that simply disabling the progress
> display is a sufficient workaround?  If that doesn't help, you can try
> patching out all use of SIGALRM within git.


In git (9591fcc6d66), I have found these uses of SIGALRM:

builtin/log.c:268:    sigaction(SIGALRM, &sa, NULL);
builtin/log.c:285:    signal(SIGALRM, SIG_IGN);
compat/mingw.c:1590:        mingw_raise(SIGALRM);
compat/mingw.c:1666:    if (sig != SIGALRM)
compat/mingw.c:1668:            error("sigaction only implemented for SIGALRM");
compat/mingw.c:1683:    case SIGALRM:
compat/mingw.c:1702:    case SIGALRM:
compat/mingw.c:1706:            exit(128 + SIGALRM);
compat/mingw.c:1708:            timer_fn(SIGALRM);
compat/mingw.h:42:#define SIGALRM 14
perl/Git/SVN.pm:2121:            SIGALRM, SIGUSR1, SIGUSR2);
progress.c:56:    sigaction(SIGALRM, &sa, NULL);
progress.c:68:    signal(SIGALRM, SIG_IGN);


I suppose that compat/mingw.{h,c} and SVN.pm can be ignored, since our patch to
work around this problem won't be pushed upstream: the real problem is not in
git, right?

If I understand correctly, some VFS system calls get interrupted by SIGALRM and,
instead of being restarted transparently (via SA_RESTART), they return EINTR.
Thomas said that these failed calls may need to be retried, but that
open(O_CREAT|O_EXCL) remains tricky in this case.
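
For illustration, the "retry on EINTR" idea could be sketched roughly as below.
This is only a sketch under assumptions: write_retry() and open_retry() are
hypothetical helper names, not functions taken from git, and nothing here has
been tried on Lustre. The comment on open_retry() spells out why
O_CREAT|O_EXCL is awkward, assuming the interrupted call may already have
created the file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from git): call write() again for as long as
 * it fails with EINTR. Short writes are returned to the caller as-is.
 */
static ssize_t write_retry(int fd, const void *buf, size_t len)
{
	ssize_t n;

	do {
		n = write(fd, buf, len);
	} while (n < 0 && errno == EINTR);
	return n;
}

/*
 * Hypothetical helper (not from git): call open() again for as long as
 * it fails with EINTR.
 *
 * Caveat for O_CREAT|O_EXCL: if the filesystem created the file before
 * reporting EINTR (an assumption, not something verified here), the
 * retry fails with EEXIST and the caller cannot tell whether the file
 * is its own or someone else's.
 */
static int open_retry(const char *path, int flags, mode_t mode)
{
	int fd;

	do {
		fd = open(path, flags, mode);
	} while (fd < 0 && errno == EINTR);
	return fd;
}
```

Any other error (ENOENT, EEXIST, ...) is passed straight back to the caller
with errno untouched, so existing error handling would keep working.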


The SIGALRM code paths in progress.c drive the progress display and are
therefore required, right?

The SIGALRM code paths in builtin/log.c are for early output, and the comments
in the code say:

   "If we can get the whole output in less than a tenth of a second, don't even bother doing the
    early-output thing."


So where do I start with the patch?

> Other than that I agree with Junio, from what we've seen so far, Lustre
> returns EINTR on all sorts of calls that simply aren't allowed to do so.



--
---
Granularity specialist (1 day / week)
Calcul Québec / Calcul Canada
Pavillon Adrien-Pouliot, Université Laval, Québec (Québec), Canada