Re: CIFS lockup regression on SMB1 in 6.10

On 2024-08-15 23:31, Steve French wrote:
What is the simplest repro you have seen - e.g. is there a git tree
with very small source that fails with configure that you could share?

On Thu, Aug 15, 2024 at 4:22 PM matoro
<matoro_mailinglist_kernel@xxxxxxxxx> wrote:

On 2024-08-15 15:37, Steve French wrote:
> Do you have any data on whether this still fails with current Linux
> kernel (6.11-rc3 e.g.)?
>
>
> On Thu, Aug 15, 2024 at 1:08 PM matoro
> <matoro_mailinglist_kernel@xxxxxxxxx> wrote:
>>
>> Hi all, I run a service where user home directories are mounted over SMB1
>> with unix extensions.  After upgrading to kernel 6.10 it was reported to me
>> that users were observing lockups when performing compilations in their
>> home directories.  I investigated and confirmed this to be the case.  It
>> would cause the build processes to get stuck in I/O.  After the lockup
>> triggered then all further reads/writes to the CIFS-mounted directory would
>> get stuck.  Even the df(1) command would block indefinitely.  Shutdown was
>> also prevented as the directory could no longer be unmounted.
>>
>> Triggering the issue is a little bit tricky.  I used compiling cpython as a
>> test case.  Parallel compilation does not seem to be required to trigger
>> it, because in some tests the hang would occur during ./configure phase,
>> but it does seem to provoke it more easily, as the most common point where
>> the lockup was observed was immediately after "make -j4".  However,
>> sometimes it would take 10+ minutes of ongoing compilation before the
>> lockup would trigger.  I never observed a complete successful compilation
>> on kernel 6.10.
>>
>> The furthest back I was able to confirm that the lockup is observed was
>> v6.10-rc3.  The furthest forward I was able to confirm is good was v6.9.9
>> in the stable tree.  Unfortunately, between those two tags there seems to
>> be a wide range of commits where the CIFS functionality is completely
>> broken, and reads/writes return total nonsense results.  For example, any
>> git commands return "git error: bad signature 0x00000000".  So I cannot
>> execute a compilation on commits in this range in order to test whether
>> they observe the lockup issue.  Therefore I wasn't able to test most of the
>> range, and wasn't able to complete a traditional bisect.  I tried adjusting
>> the read/write buffers down to 8192 from the defaults, but this did not
>> help.  I also tried toggling several options that might be related, namely
>> CONFIG_FSCACHE, to no effect.  There are no logs emitted to dmesg when the
>> lockup occurs.
>>
>> Thanks - please let me know if there is any further information I can
>> provide.  For now I am rolling all hosts back to kernel 6.9.
>>
>
>
> --
> Thanks,
>
> Steve

Hi Steve, I just tested.  Not only is it still there in 6.11-rc3, but it's
much worse: I got an immediate lockup just from ./configure.

Thank you for looking at this.

I've been using the cpython source to test (https://github.com/python/cpython),
with just a plain ./configure and make -j4.  But it seems to affect any
substantial build process: I was also able to trigger it with a coreutils
build, and really with anything that generates I/O load.
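
For anyone trying to reproduce this unattended, a small watchdog can flag the
hang without wedging the test harness itself.  What follows is only a minimal
sketch and was not part of the testing described here: the function name
run_with_deadline, the 60-second default, and the polling interval are all
assumptions chosen for illustration.  It polls rather than waiting on the
child, on the assumption that a task blocked in I/O on the stuck mount may not
respond to SIGKILL.

```python
import subprocess
import time


def run_with_deadline(cmd, cwd=None, deadline_s=60.0, poll_s=0.5):
    """Run cmd and return its exit code, or None if it is still running
    when the deadline expires (treated here as a suspected hang).

    All names and default values are illustrative assumptions.
    """
    proc = subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        rc = proc.poll()
        if rc is not None:
            return rc
        time.sleep(poll_s)

    # One last check in case the process exited during the final sleep.
    rc = proc.poll()
    if rc is not None:
        return rc

    # Ask the kernel to kill the child, but do NOT wait() on it: a task in
    # uninterruptible sleep may never exit, and waiting would hang us too.
    proc.kill()
    return None
```

For example, run_with_deadline(["df", "/path/to/cifs/mount"], deadline_s=30)
returning None would suggest the mount has locked up; the path is a
placeholder.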

Here's what my effective mount options look like:
type cifs (rw,nosuid,relatime,vers=1.0,cache=strict,username=nobody,uid=30000,forceuid,gid=30000,forcegid,addr=fd05:0000:0000:0000:0000:0000:0000:0001,soft,unix,posixpaths,serverino,mapposix,acl,reparse=nfs,rsize=1048576,wsize=65536,bsize=1048576,retrans=1,echo_interval=60,actimeo=1,closetimeo=1)
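
When comparing a working host against a failing one, it can help to diff the
effective options rather than eyeball them.  This is a rough sketch under the
assumption that the option string has the same comma-separated key=value shape
as the line above; parse_mount_options and diff_options are hypothetical
helper names, not part of any existing tool.

```python
def parse_mount_options(opts):
    """Parse a mount option string such as "(rw,vers=1.0,soft)" into a dict.

    Flags without a value (e.g. "soft") map to True; key=value options keep
    the value as a string.  Assumes no option value contains a comma.
    """
    opts = opts.strip()
    if opts.startswith("(") and opts.endswith(")"):
        opts = opts[1:-1]
    parsed = {}
    for item in opts.split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def diff_options(left, right):
    """Return {option: (left_value, right_value)} for options that differ.

    An option missing on one side shows up as None on that side.
    """
    keys = sorted(set(left) | set(right))
    return {
        k: (left.get(k), right.get(k))
        for k in keys
        if left.get(k) != right.get(k)
    }
```

Feeding both hosts' option strings through parse_mount_options and then
diff_options would list only the settings that differ between the two mounts.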