On Aug 04 2017, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> On Fri, Aug 4, 2017 at 9:10 PM, Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
>> Hello,
>>
>> I am confused about how O_APPEND is supposed to interact with the
>> writeback cache.
>>
>> As far as I can tell, the O_APPEND flag is currently passed to the
>> filesystem process, so my expectation is that the filesystem process is
>> responsible for ignoring any offset in write requests and instead writing
>> at the current end of the file[1].
>>
>> However, with writeback cache enabled the filesystem process cannot tell
>> which data is "new" (came from userspace and should be appended) and
>> which data is old and just made a round-trip to the kernel. So it seems
>> to me that the filesystem process should probably leave the handling of
>> O_APPEND to the kernel. But then, shouldn't the kernel filter out this
>> flag when sending the open request?
>
> Indeed, when writing back the cache the kernel should definitely not
> set O_APPEND.

Well, 4.9 certainly does it though. Should I try to make a patch, or are
you or Maxim going to do that shortly anyway?

Do you think it makes sense to filter out O_APPEND in libfuse as well (to
work around the issue for present-day kernels)?

>> On the other hand, when the kernel handles O_APPEND, then it is no
>> longer atomic (think of a network fuse filesystem).
>
> Yes, network filesystem generally needs to handle consistency of
> caches across nodes and O_APPEND is no exception (i.e. you cannot have
> two nodes writing O_APPEND to cache at the same time, because that
> will not work).

This poses a bit of a problem, though: a network filesystem either cannot
use writeback caching, or O_APPEND will (silently) not work.

With the current behavior (O_APPEND being passed to open() when writeback
is enabled) the filesystem at least has a chance to return an error, so
instead of a silent failure there would be a noisy one.
With that in mind, maybe the current behavior isn't so bad? We'd just
have to document that if writeback cache is enabled and O_APPEND is
received, the filesystem has to decide whether it is fine with the kernel
handling O_APPEND (and in that case ignore the flag for subsequent
writes) or return an error.

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38  8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«