On Jan 11 2019, Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
> On Jan 09 2019, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>> On Tue, Jan 8, 2019 at 11:35 AM Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
>>>
>>> On Jan 08 2019, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>> > On Mon, Jan 7, 2019 at 10:05 PM Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
>>> >>
>>> >> On Jan 07 2019, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>> >> > On Wed, Dec 26, 2018 at 10:44 PM Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> I am seeing relatively regular occurrences of
>>> >> >>
>>> >> >> $ sudo dmesg | tail
>>> >> >> [21929.138815] fuse: trying to steal weird page
>>> >> >> [21929.138821] page=00000000a7dd2617 index=64 flags=17fffc0000000ad,
>>> >> >> count=1, mapcount=0, mapping= (null)
>>> >> >> [21930.647338] fuse: trying to steal weird page
>>> >> >> [21930.647345] page=00000000a07f32af index=2848
>>> >> >> flags=17fffc0000000ad, count=1, mapcount=0, mapping= (null)
>>> >> >> [21932.338873] fuse: trying to steal weird page
>>> >> >> [21932.338879] page=0000000067e3a012 index=64 flags=17fffc0000000ad,
>>> >> >> count=1, mapcount=0, mapping= (null)
>>> >> >> [21933.930703] fuse: trying to steal weird page
>>> >> >> [21933.930710] page=00000000046feb25 index=845
>>> >> >> flags=17fffc0000000ad, count=1, mapcount=0, mapping= (null)
>>> >> >> [21936.163174] fuse: trying to steal weird page
>>> >> >> [21936.163180] page=00000000fb80fe27 index=0 flags=17fffc0000000ad,
>>> >> >> count=1, mapcount=0, mapping= (null)
>>> >> >
>>> >> > The page has the PG_dirty and PG_waiters flags set which are
>>> >> > incompatible with stealing. page_cache_pipe_buf_steal() does
>>> >> > apparently filter out dirty ones, so it's not a regular file that we
>>> >> > are trying to steal the page from. So the question is: what is the
>>> >> > source of the splice()?
>>> >>
>>> >> Hmm. I think it has to be a regular file. But as I mentioned in my other
>>> >> email, I did have a race condition where fd's were closed
>>> >> incorrectly. Is it possible that this also triggered the above,
>>> >> i.e. that the fd was closed sometime during splice?
>>> >
>>> > Close during a syscall that uses the fd is not an issue, because a ref
>>> > to the file is acquired. So the race is between the close() and the
>>> > internal fget(); if the close() wins then fget() will fail and the
>>> > syscall will return EBADF. If the fget() wins, then the syscall can
>>> > run normally despite the fact that the fd was closed.
>>> >
>>> > Can you tell me what filesystem the regular file (the one being
>>> > spliced into fuse) is on?
>>>
>>> It's ext4.
>>
>> Next question: is file opened with O_DIRECT or is filesystem mounted
>> with DAX, or anything fancy?
>
> Neither. But thinking about this, I guess that (because of the race) the
> fd could have been closed and re-opened before the ref was acquired. So
> it may have turned into a directory fd.
>
> To be honest, I don't think it's worth investigating this unless I see
> it happen again now that the race in my code is fixed.

Bad news. I can now reliably reproduce the issue again. I have no clue
why it disappeared for a while. This is exactly the same filesystem
code, but I can't rule out that there has been some routine upgrade in
the base system (compiler or kernel).

Any suggestions for debugging this? As I said before, the splice()
source is a regular file on ext4, opened without O_DIRECT, and the
filesystem is not mounted with DAX. If I disable FUSE_CAP_SPLICE_WRITE,
the error message no longer occurs.

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
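
For reference, the fd-reuse scenario from the quoted discussion (an fd closed and re-opened before the kernel takes its reference, so the same number ends up naming a directory) can be illustrated in user space. This is only a sketch in Python, not the filesystem's actual code: the helper name `reuse_fd_as_directory` and the temporary paths are made up for illustration. It relies on POSIX open() handing out the lowest free descriptor number, and it assumes no other thread opens a descriptor in between.

```python
import os
import stat
import tempfile


def reuse_fd_as_directory():
    """Close a regular-file fd, then open a directory and compare the numbers.

    Returns (old_fd, new_fd, new_fd_is_directory).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "data")  # hypothetical file name
        old_fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        os.close(old_fd)  # stands in for the premature close in the race

        # POSIX open() returns the lowest unused descriptor, so in a
        # single-threaded process this normally reuses old_fd's number.
        new_fd = os.open(tmpdir, os.O_RDONLY)
        try:
            is_dir = stat.S_ISDIR(os.fstat(new_fd).st_mode)
        finally:
            os.close(new_fd)
    return old_fd, new_fd, is_dir


if __name__ == "__main__":
    old_fd, new_fd, is_dir = reuse_fd_as_directory()
    print(f"old fd={old_fd}, new fd={new_fd}, directory={is_dir}")
```

Code that still holds the stale number after such a close would then be operating on the directory rather than the original file. As noted in the quoted reply, a close() that happens after the syscall's internal fget() has succeeded does not have this effect.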