On Sun, Dec 13, 2020 at 5:52 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Sun, Dec 13, 2020 at 04:45:39PM -0800, Linus Torvalds wrote: > > On Sun, Dec 13, 2020 at 2:50 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > > > > > > > Only O_CREAT | O_TRUNC should matter, since those are the ones that > > > > > cause writes as part of the *open*. > > > > > > And __O_TMPFILE, which is the same as O_CREAT. > > > > This made me go look at the code, but we seem to be ok here - > > __O_TMPFILE should never get to the do_open() logic at all, because it > > gets caught before that and does off to do_tmpfile() and then > > vfs_tmpfile() instead. > > > > And then it's up to the filesystem to do the inode locking if it needs > > to - it has a separate i_io->tempfile function for that. > > Sure, and then it blocks. Yes. I was more just double-checking that currently really odd if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) { condition that didn't make sense to me (it basically does two different kinds of writablity checks). So it was more that you pointed out that __O_TMPFILE was also missing from that odd condition, and that turns out to be because it was handled separately. So no disagreement about __O_TMPFILE being a "not a cached operation" - purely a "that condition is odd". It was just that O_WRONLY | O_RDWR didn't make tons of sense to me, since we then get the write count only to then drop it immediately immediately without having actually done any writes. But I guess they are there only as a "even if we don't write to the filesystem right now, we do want to get the EROFS error return from open(), rather than at write() time". I think technically it shouldn't need to do the pointless "synchronize and increment writers only to decrement them again" dance, and could just do a "mnt_is_readonly()" test for the plain "open writably, but without O_CREAT/O_TRUNC" case. But I guess there's no real downside to doing it the way we're doing it - it just looked odd to me when I was looking at it in the context of just pathname lookup. In do_faccessat() we have that bare __mnt_is_readonly() for the "I want EROFS, but I'm not actually writing now" case. Linus