Re: CIFS lockup regression on SMB1 in 6.10

What is the simplest repro you have seen?  For example, is there a git
tree with a very small source that fails at the configure step that you
could share?

On Thu, Aug 15, 2024 at 4:22 PM matoro
<matoro_mailinglist_kernel@xxxxxxxxx> wrote:
>
> On 2024-08-15 15:37, Steve French wrote:
> > Do you have any data on whether this still fails with current Linux
> > kernel (6.11-rc3 e.g.)?
> >
> >
> > On Thu, Aug 15, 2024 at 1:08 PM matoro
> > <matoro_mailinglist_kernel@xxxxxxxxx> wrote:
> >>
> >> Hi all, I run a service where user home directories are mounted over SMB1
> >> with unix extensions.  After upgrading to kernel 6.10 it was reported to me
> >> that users were observing lockups when performing compilations in their
> >> home directories.  I investigated and confirmed this to be the case.  It
> >> would cause the build processes to get stuck in I/O.  After the lockup
> >> triggered then all further reads/writes to the CIFS-mounted directory would
> >> get stuck.  Even the df(1) command would block indefinitely.  Shutdown was
> >> also prevented as the directory could no longer be unmounted.
> >>
> >> Triggering the issue is a little bit tricky.  I used compiling cpython as a
> >> test case.  Parallel compilation does not seem to be required to trigger
> >> it, because in some tests the hang would occur during the ./configure
> >> phase, but it does seem to provoke it more easily, as the most common point
> >> where the lockup was observed was immediately after "make -j4".  However,
> >> sometimes it would take 10+ minutes of ongoing compilation before the
> >> lockup would trigger.  I never observed a complete successful compilation
> >> on kernel 6.10.
> >>
> >> The furthest back I was able to confirm that the lockup is observed was
> >> v6.10-rc3.  The furthest forward I was able to confirm is good was v6.9.9
> >> in the stable tree.  Unfortunately, between those two tags there seems to
> >> be a wide range of commits where the CIFS functionality is completely
> >> broken, and reads/writes return total nonsense results.  For example, any
> >> git commands return "git error: bad signature 0x00000000".  So I cannot
> >> execute a compilation on commits in this range in order to test whether
> >> they observe the lockup issue.  Therefore I wasn't able to test most of the
> >> range, and wasn't able to complete a traditional bisect.  I tried adjusting
> >> the read/write buffers down to 8192 from the defaults, but this did not
> >> help.  I also tried toggling several options that might be related, namely
> >> CONFIG_FSCACHE, to no effect.  There are no logs emitted to dmesg when the
> >> lockup occurs.
> >>
> >> Thanks - please let me know if there is any further information I can
> >> provide.  For now I am rolling all hosts back to kernel 6.9.
> >>
> >
> >
> > --
> > Thanks,
> >
> > Steve
>
> Hi Steve, just tested.  Not only is it still there in 6.11-rc3, but it's much
> worse - I got an immediate lockup just from ./configure.
>
> Thank you for looking at this.



-- 
Thanks,

Steve




