On Fri, Aug 12, 2022 at 6:58 PM Frank Dinoff <fdinoff@xxxxxxxxxx> wrote:
>
> On Fri, Aug 12, 2022 at 5:33 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > On Thu, 11 Aug 2022 at 23:05, Frank Dinoff <fdinoff@xxxxxxxxxx> wrote:
> > >
> > > I have a binary running on a fuse filesystem which is generating a zip file. I
> > > don't know what syscalls are involved since the binary segfaults when run with
> > > strace.
> >
> > You could strace the fuse filesystem.
>
> I'll try doing this later, I was unsuccessful in finding anything
> useful printing large amounts of debug logs.

I got strace working on the program. It looks like it is doing something
like this:

open(O_RDWR) = 9
multiple write(...) calls such that the lseek below is before end of file.
lseek(9, 2514944, SEEK_SET) = 2514944
read(9, "", 8192) = 0 // Should have read 5770 bytes
lseek(9, 5770, SEEK_CUR) = 2520714 // should be end.
write(...)
close(9)
open(O_RDWR) = 9
lseek(9, 2514944, SEEK_SET) = 2514944
read(9, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6042) = 6042
...

The first read doesn't return any data and I'm not sure why. It is as if
the kernel page cache has gotten out of sync and thinks the whole file
should be zeros.

> > >
> > > After doing a binary search,
> > > https://github.com/torvalds/linux/commit/fa5eee57e33e79b71b40e6950c29cc46f5cc5cb7
> > > is the commit that seems to have introduced the error. It still seems to
> > > failing with a much newer kernel.
> >
> > How is it failing?
>
> Oops sorry I thought I included that. You can't unzip the file.
> unzip -t has "error: invalid compressed data to inflate"
>
> > > Reverting the fuse_invalidate_attr_mask in fuse_perform_write to
> > > fuse_invalidate_attr makes every other run of the binary produce the correct
> > > output.
> >
> > What do you mean? Is it succeeding half the time?
>
> Running the binary multiple times in a row about 50% produce the
> correct file and 50% produce a corrupt file.
>
> Running the test multiple times before fa5eee57 I'm seeing about 10%
> of runs producing a corrupt file. (I did not realize this had a chance
> of failure on the old kernel.)
> After fa5eee57 I have 100% of runs producing the corrupt file.
>
> >
> > >
> > > I found that enabling the writeback cache makes the binary always produce the
> > > right output. Running the fuse daemon in single threaded mode also works.
> > >
> > > Is there anything that sticks out to you that is wrong with the above commit?
> >
> > Could you try adding STATX_MODE to the invalidated mask? Can't
> > imagine any other attribute being relevant.
>
> Adding STATX_MODE to FUSE_STATX_MODIFY does make the binary produce the
> correct file about 75% of the time. The last bit of flakiness may be
> some concurrency issue in the binary?
>
> >
> > Thanks,
> > Miklos
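
A minimal sketch of a standalone check for the write/lseek/read pattern
seen in the strace output, in case it helps narrow things down. The
offset (2514944) and tail length (5770) are taken from the trace. The
function name, the fill byte and the 8192-byte write size are
assumptions made for illustration; the real binary's write pattern is
not known. Since the failure is intermittent, this simplified sequence
may well not trigger it.

```python
import os

OFFSET = 2514944  # lseek target taken from the strace output
TAIL = 5770       # bytes the first read() was expected to return


def check_read_after_write(path):
    """Write OFFSET + TAIL non-zero bytes, seek back to OFFSET, read once.

    Returns (bytes_read, zero_bytes_in_read). On a coherent filesystem
    this should be (TAIL, 0). The name and the write pattern are
    hypothetical; they only mimic the shape of the traced syscalls.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = OFFSET + TAIL
        chunk = b"\xab" * 8192  # non-zero fill so stale zero pages show up
        while remaining > 0:
            # os.write may write fewer bytes than requested, so track it
            remaining -= os.write(fd, chunk[:remaining])
        os.lseek(fd, OFFSET, os.SEEK_SET)
        data = os.read(fd, 8192)
        return len(data), data.count(0)
    finally:
        os.close(fd)
```

Calling check_read_after_write() in a loop on a file inside the fuse
mount and counting results other than (5770, 0) would give a failure
rate to compare across kernels. A short read would correspond to the
first read() in the trace, and a non-zero second value to the zero-filled
read after reopening.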