Re: Interrupt on readdirplus?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 9/10/24 19:38, Han-Wen Nienhuys wrote:
> On Tue, Sep 10, 2024 at 5:58 PM Bernd Schubert
> <bernd.schubert@xxxxxxxxxxx> wrote:
>>> I have noticed Go-FUSE test failures of late, that seem to originate
>>> in (changed?) kernel behavior. The problems looks like this:
>>>
>>> 12:54:13.385435 rx 20: OPENDIR n1  p330882
>>> 12:54:13.385514 tx 20:     OK, {Fh 1 }
>>> 12:54:13.385838 rx 22: READDIRPLUS n1 {Fh 1 [0 +4096)  L 0 LARGEFILE}  p330882
>>> 12:54:13.385844 rx 23: INTERRUPT n0 {ix 22}  p0
>>> 12:54:13.386114 tx 22:     OK,  4000b data "\x02\x00\x00\x00\x00\x00\x00\x00"...
>>> 12:54:13.386642 rx 24: READDIRPLUS n1 {Fh 1 [1 +4096)  L 0 LARGEFILE}  p330882
>>> 12:54:13.386849 tx 24:     95=operation not supported
>>>
>>> As you can see, the kernel attempts to interrupt the READDIRPLUS
>>
>> do you where the interrupt comes from? Is your test interrupting
>> interrupting readdir?
> 
> I did not write code to issue interrupts, but it is possible that the
> Go runtime does something behind my back. The debug output lists "p0";
> is there a reason that the INTERRUPT opcode does not provide an
> originating PID ? How would I discover who or what is generating the
> interrupts? Will they show up if I run strace on the test binary?
> 
> I straced the test binary, below is a section where an interrupt
> happens just before a directory seek. Could tgkill(SIGURG) cause an
> interrupt? It happens just before the READDIRPLUS op (',') and the
> INTERRUPT operation ('$') are read.
> 
> [pid 371933] writev(10,
> [{iov_base="\260\17\0\0\0\0\0\0\20\0\0\0\0\0\0\0", iov_len=16},
> {iov_base="\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\
> 0\0\0"..., iov_len=4000}], 2 <unfinished ...>
> [pid 371931] <... nanosleep resumed>NULL) = 0
> [pid 371933] <... writev resumed>)      = 4016
> [pid 371931] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
> [pid 371933] read(10,  <unfinished ...>
> [pid 371934] <... getdents64 resumed>0xc0002ee000 /* 25 entries */, 8192) = 800
> [pid 371931] <... nanosleep resumed>NULL) = 0
> [pid 371934] futex(0xc000059148, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 371931] getpid( <unfinished ...>
> [pid 371934] <... futex resumed>)       = 1
> [pid 371932] <... futex resumed>)       = 0
> [pid 371931] <... getpid resumed>)      = 371930
> [pid 371934] getdents64(7,  <unfinished ...>
> [pid 371932] futex(0xc000059148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
> [pid 371931] tgkill(371930, 371934, SIGURG <unfinished ...>
> [pid 371935] <... read
> resumed>"P\0\0\0,\0\0\0\22\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 131200) = 80
>     # , = readdirplus
> [pid 371933] <... read
> resumed>"0\0\0\0$\0\0\0\23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 131200) = 48
>     # $ = interrupt.
> [pid 371931] <... tgkill resumed>)      = 0
> [pid 371933] futex(0xc000059148, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 371931] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
> [pid 371933] <... futex resumed>)       = 1
> [pid 371932] <... futex resumed>)       = 0
> [pid 371933] write(2, "19:24:40.813294 doInterrupt\n", 28 <unfinished ...>
> 19:24:40.813294 doInterrupt
> 
> A bit of browsing through the Go source code suggests that SIGURG is
> used to preempt long-running goroutines, so it could be issued more or
> less at random.
> 
> Nevertheless, FUSE should also not be reissuing the reads, even if
> there were interrupts, right?
> 
>>> operation, but go-fuse ignores the interrupt and returns 25 entries.
>>> The kernel somehow thinks that only 1 entry was consumed, and issues
>>> the next READDIRPLUS at offset 1. If go-fuse ignores the faulty offset
>>> and continues the listing (ie. continuing with entry 25), the test
>>> passes.
>>>
>>> Is this behavior of the kernel expected or a bug?
>>>
>>> I am redoing the API for directory listing to support cacheable and
>>> seekable directories, and in the new version, this looks like a
>>> directory seek. If the file system does not support seekable
>>> directories, I must return some kind of error (which is the ENOTSUP
>>> you can see in the log above).
>>
>> Is this with or without FOPEN_CACHE_DIR? Would be helpful to know
>> if FOPEN_CACHE_DIR - fuse kernel code is quite different when this
>> is set.
> 
> without FOPEN_CACHE_DIR.

So we should down to this function

https://elixir.bootlin.com/linux/v6.10.9/source/fs/fuse/readdir.c#L286

I still don't understand what is actually the failure in the test, does 
it show entries multiple times (I wouldn't understand how that could 
happen)? From the code, I would expect that you get missing entries if
you ignore the offset.

I'm not sure I ever looked into filldir64 before, but I find it
interesting that a pending signal is only checked when prev_reclen is
set.

https://elixir.bootlin.com/linux/v6.10.9/source/fs/readdir.c#L399

I.e. it accepts the first entry and then fails? parse_dirplusfile()
has to go over the entire list in the buffer, as you have a lookup
count on it. But it shouldn't list these entries - which is why
I'm confused what is the actual issue you are seeing

> 
>>> I started seeing this after upgrading to Fedora 40. My kernel is
>>> 6.10.7-200.fc40.x86_64
>>>
>>
>> Would be interesting to know your kernel version before? There is
>> commit cdf6ac2a03d2, which removes a readdir lock. Although the
>> commit message explains why it is not needed anymore.
> 
> The failure happens in single threaded loads, so a race condition
> seems unlikely.
> 
> Let me try with other kernel versions.
> 
> Also, I realize that Fedora 40 also upgraded the Go compiler, and may
> have made SIGURG be triggered more frequently?
> 


Thanks,
Bernd




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux