CIFS lockup regression on SMB1 in 6.10


Hi all, I run a service where user home directories are mounted over SMB1 with unix extensions. After upgrading to kernel 6.10, users reported lockups when compiling in their home directories. I investigated and confirmed this. The build processes get stuck in I/O, and once the lockup has triggered, all further reads and writes to the CIFS-mounted directory get stuck as well. Even the df(1) command blocks indefinitely. Shutdown is also prevented, because the directory can no longer be unmounted.
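For anyone trying to reproduce this, a minimal sketch of one way to see which processes are stuck: it scans /proc and lists processes in state D (uninterruptible sleep), which is where tasks blocked on I/O normally show up. The function names are hypothetical, it assumes a Linux /proc, and it looks at processes only, not at individual threads.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling blocked_tasks() while the mount is hung should list the stuck compiler and df processes, if they are indeed blocked in uninterruptible sleep.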

Triggering the issue is a little tricky. I used compiling cpython as a test case. Parallel compilation does not seem to be required, because in some tests the hang occurred during the ./configure phase. It does seem to provoke the hang more easily, though: the most common point at which I observed the lockup was immediately after "make -j4". In other runs it took 10+ minutes of ongoing compilation before the lockup triggered. I never observed a complete, successful compilation on kernel 6.10.

The earliest version on which I was able to confirm the lockup is v6.10-rc3. The latest version I was able to confirm as good is v6.9.9 in the stable tree. Unfortunately, between those two tags there seems to be a wide range of commits where CIFS is completely broken and reads/writes return nonsense results. For example, any git command returns "git error: bad signature 0x00000000". I therefore cannot run a compilation on commits in this range to test whether they show the lockup, so I wasn't able to test most of the range or complete a traditional bisect. I tried adjusting the read/write buffers down to 8192 from the defaults, but this did not help. I also tried toggling kernel config options that might be related, namely CONFIG_FSCACHE, to no effect. Nothing is logged to dmesg when the lockup occurs.
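For anyone who wants to attempt the bisect anyway, git can be told to step over untestable commits with "git bisect skip". Below is a rough sketch of a helper that could be run on the test machine after booting each candidate kernel. It is not the procedure used above. The function names, the timeouts, the use of "git status" as the sanity check and the mapping of outcomes to verdicts are all assumptions for illustration.

```python
import subprocess

def run_with_deadline(cmd, cwd, seconds):
    """Return the exit status, or None if cmd is still running at the deadline."""
    proc = subprocess.Popen(cmd, cwd=cwd,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Deliberately not killed or reaped: a task stuck in uninterruptible
        # I/O would not die, and waiting for it would hang this script too.
        return None

def verdict(sanity_status, build_status):
    """Map the two observations to a word for `git bisect <word>`."""
    if sanity_status is None:
        return "bad"    # assumption: a hung sanity check counts as the lockup
    if sanity_status != 0:
        return "skip"   # CIFS returns garbage, commit cannot be tested
    if build_status is None:
        return "bad"    # build never finished, treated as the lockup
    if build_status != 0:
        return "skip"   # build failed for some other reason
    return "good"

def main(src):
    """src: hypothetical path to a cpython checkout on the CIFS mount."""
    sanity = run_with_deadline(["git", "status"], src, 120)
    build = None
    if sanity == 0:
        build = run_with_deadline(["make", "-j4"], src, 3600)
    print("git bisect", verdict(sanity, build))
```

Since the hang sometimes takes 10+ minutes to appear and a build may also get through by chance, a "good" verdict from a single run is weak evidence and would need several repeats before being trusted.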

Thanks - please let me know if there is any further information I can provide. For now I am rolling all hosts back to kernel 6.9.



