On 2024-08-15 23:31, Steve French wrote:
What is the simplest repro you have seen - e.g. is there a git tree with a
very small source that fails during configure that you could share?
On Thu, Aug 15, 2024 at 4:22 PM matoro
<matoro_mailinglist_kernel@xxxxxxxxx> wrote:
On 2024-08-15 15:37, Steve French wrote:
> Do you have any data on whether this still fails with current Linux
> kernel (6.11-rc3 e.g.)?
>
>
> On Thu, Aug 15, 2024 at 1:08 PM matoro
> <matoro_mailinglist_kernel@xxxxxxxxx> wrote:
>>
>> Hi all, I run a service where user home directories are mounted over SMB1
>> with unix extensions. After upgrading to kernel 6.10 it was reported to me
>> that users were observing lockups when compiling in their home directories.
>> I investigated and confirmed this to be the case. The build processes would
>> get stuck in I/O, and once the lockup triggered, all further reads and
>> writes to the CIFS-mounted directory would also get stuck. Even the df(1)
>> command would block indefinitely. Shutdown was prevented as well, since the
>> directory could no longer be unmounted.
>>
>> Triggering the issue is a little tricky. I used compiling cpython as a test
>> case. Parallel compilation does not seem to be required to trigger it - in
>> some tests the hang occurred during the ./configure phase - but it does
>> seem to provoke it more easily: the most common point where the lockup was
>> observed was immediately after "make -j4". However, it would sometimes take
>> 10+ minutes of ongoing compilation before the lockup triggered. I never
>> observed a complete, successful compilation on kernel 6.10.
>>
>> The furthest back I was able to confirm the lockup is v6.10-rc3. The
>> furthest forward I was able to confirm as good is v6.9.9 in the stable
>> tree. Unfortunately, between those two tags there is a wide range of
>> commits where the CIFS functionality is completely broken and reads/writes
>> return total nonsense results - for example, any git command fails with
>> "git error: bad signature 0x00000000". So I cannot run a compilation on
>> commits in that range to test whether they exhibit the lockup, which means
>> I was unable to test most of the range or complete a traditional bisect.
>> I tried adjusting the read/write buffers down to 8192 from the defaults,
>> but this did not help. I also tried toggling a few config options that
>> might be related, notably CONFIG_FSCACHE, to no effect. There are no logs
>> emitted to dmesg when the lockup occurs.
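>>
>> (For reference, that buffer adjustment was expressed roughly like the
>> following - the server path and mountpoint here are just placeholders:
>>
>>   mount -t cifs //server/home /home -o vers=1.0,unix,rsize=8192,wsize=8192
>>
>> with the rest of the usual credential/uid options left unchanged.)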
>>
>> Thanks - please let me know if there is any further information I can
>> provide. For now I am rolling all hosts back to kernel 6.9.
>>
>
>
> --
> Thanks,
>
> Steve
Hi Steve, just tested. Not only is it still there in 6.11-rc3, but it's much
worse - I got an immediate lockup just from ./configure. Thank you for
looking at this.
I've been using the cpython source to test,
https://github.com/python/cpython - just a plain ./configure and make -j4.
But it seems to affect any substantial build process; I was also able to
trigger it with a coreutils build, really anything that generates I/O load.
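
Roughly, the repro is just the following, run from any directory on the CIFS
mount (the path below is only a placeholder):

  cd /home/someuser/src                        # any directory on the mount
  git clone https://github.com/python/cpython
  cd cpython
  ./configure                                  # on 6.11-rc3 this alone hung
  make -j4                                     # most common point of lockup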
Here's what my effective mount options look like:
type cifs
(rw,nosuid,relatime,vers=1.0,cache=strict,username=nobody,uid=30000,forceuid,gid=30000,forcegid,addr=fd05:0000:0000:0000:0000:0000:0000:0001,soft,unix,posixpaths,serverino,mapposix,acl,reparse=nfs,rsize=1048576,wsize=65536,bsize=1048576,retrans=1,echo_interval=60,actimeo=1,closetimeo=1)
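
For completeness, the underlying mount is roughly equivalent to something
like the following (server, share and mountpoint are placeholders, and a few
of the options above, e.g. addr= and reparse=, are derived by the kernel
rather than passed explicitly):

  mount -t cifs //fileserver/homes /home \
      -o vers=1.0,unix,soft,username=nobody,uid=30000,forceuid,gid=30000,forcegid,rsize=1048576,wsize=65536,actimeo=1,closetimeo=1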