What is the simplest repro you have seen - e.g. is there a git tree
with very small source that fails with configure that you could share?

On Thu, Aug 15, 2024 at 4:22 PM matoro
<matoro_mailinglist_kernel@xxxxxxxxx> wrote:
>
> On 2024-08-15 15:37, Steve French wrote:
> > Do you have any data on whether this still fails with current Linux
> > kernel (6.11-rc3 e.g.)?
> >
> >
> > On Thu, Aug 15, 2024 at 1:08 PM matoro
> > <matoro_mailinglist_kernel@xxxxxxxxx> wrote:
> >>
> >> Hi all, I run a service where user home directories are mounted over
> >> SMB1 with unix extensions.  After upgrading to kernel 6.10 it was
> >> reported to me that users were observing lockups when performing
> >> compilations in their home directories.  I investigated and confirmed
> >> this to be the case.  It would cause the build processes to get stuck
> >> in I/O.  After the lockup triggered then all further reads/writes to
> >> the CIFS-mounted directory would get stuck.  Even the df(1) command
> >> would block indefinitely.  Shutdown was also prevented as the
> >> directory could no longer be unmounted.
> >>
> >> Triggering the issue is a little bit tricky.  I used compiling cpython
> >> as a test case.  Parallel compilation does not seem to be required to
> >> trigger it, because in some tests the hang would occur during
> >> ./configure phase, but it does seem to provoke it more easily, as the
> >> most common point where the lockup was observed was immediately after
> >> "make -j4".  However, sometimes it would take 10+ minutes of ongoing
> >> compilation before the lockup would trigger.  I never observed a
> >> complete successful compilation on kernel 6.10.
> >>
> >> The furthest back I was able to confirm that the lockup is observed
> >> was v6.10-rc3.  The furthest forward I was able to confirm is good was
> >> v6.9.9 in the stable tree.  Unfortunately, between those two tags
> >> there seems to be a wide range of commits where the CIFS functionality
> >> is completely broken, and reads/writes return total nonsense results.
> >> For example, any git commands return "git error: bad signature
> >> 0x00000000".  So I cannot execute a compilation on commits in this
> >> range in order to test whether they observe the lockup issue.
> >> Therefore I wasn't able to test most of the range, and wasn't able to
> >> complete a traditional bisect.  I tried adjusting the read/write
> >> buffers down to 8192 from the defaults, but this did not help.  I also
> >> tried toggling several options that might be related, namely
> >> CONFIG_FSCACHE, to no effect.  There are no logs emitted to dmesg when
> >> the lockup occurs.
> >>
> >> Thanks - please let me know if there is any further information I can
> >> provide.  For now I am rolling all hosts back to kernel 6.9.
> >>
> >
> >
> > --
> > Thanks,
> >
> > Steve
>
> Hi Steve, just tested.  Not only is it still there in 6.11-rc3, but it's
> much worse - I got an immediate lockup just from ./configure
>
> Thank you for looking at this.

--
Thanks,

Steve