On Wed, Jan 26, 2022 at 2:02 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Wed, Jan 26, 2022 at 09:05:48AM +1100, Daniel Black wrote:
>
> O_RDONLY is defined to be 0, so don't worry about it.

Thanks.

> > The kernel code in setfl seems to want to return EINVAL for
> > filesystems without a direct_IO structure member assigned,
> >
> > A noop_direct_IO seems to be used frequently to just return EINVAL
> > (like cifs_direct_io).
>
> Sorry for the confusion. You've caught us mid-transition. Eventually,
> ->direct_IO will be deleted, but for now it signifies whether or not the
> filesystem supports O_DIRECT, even though it's not used (except in some
> scenarios you don't care about).

Is it going to be reasonable to expect fcntl(fd, F_SETFL, O_DIRECT) to
return EINVAL if O_DIRECT isn't supported?

> > Lastly on the list of peculiar behaviors here, is tmpfs will return
> > EINVAL from the fcntl call however it works fine with O_DIRECT
> > (https://bugs.mysql.com/bug.php?id=26662). MySQL (and MariaDB still
> > has the same code) that currently ignores EINVAL, but I'm willing to
> > make that code better.
>
> Out of interest, what behaviour do you _want_ from doing O_DIRECT
> to tmpfs? O_DIRECT is defined to bypass the page cache, but tmpfs
> only stores data in the page cache. So what do you intend to happen?

It occurs to me that, because EINVAL is returned, it's just operating in
non-O_DIRECT mode.

Someone probably added this because (too much) MySQL/MariaDB testing is
done on tmpfs, and they didn't want to adjust the test suite to handle
O_DIRECT failures everywhere. I don't think there was any kernel
expectation there.

That's my problem, it seems. I'll see what I can do to get back to using
real filesystems more.

> > Does a userspace have to fully try to write to an O_DIRECT file, note
> > the failure, reopen or clear O_DIRECT, and resubmit to use O_DIRECT?
> >
> > While I see that the success/failure of a O_DIRECT read/write can be
> > related to the capabilities of the underlying block device depending
> > on offset/length of the read/write, are there other traps?
>
> It also must be aligned in memory,

yep, knew this one.

> but I'm not quite sure what
> limitations cifs imposes.
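
For reference, a minimal sketch of the try/clear/resubmit sequence asked about above, assuming Linux with glibc. The helper names set_direct() and write_prefer_direct() are made up for illustration, and the alignment is a caller-supplied guess rather than anything the kernel promises is sufficient. It treats EINVAL from either fcntl() or pwrite() as "no O_DIRECT here", which is an assumption about how filesystems report this, not a documented guarantee:

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: set or clear O_DIRECT on an already open fd.
 * Returns 0 on success, -1 with errno set (e.g. EINVAL) on failure. */
static int set_direct(int fd, bool on)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    flags = on ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
    return fcntl(fd, F_SETFL, flags);
}

/* Hypothetical helper: write len bytes at offset off, preferring O_DIRECT.
 * align is an assumed block alignment (a power of two, e.g. 4096); len and
 * off must be multiples of it. If the flag is accepted but the write itself
 * fails with EINVAL, O_DIRECT is cleared and the write is resubmitted
 * through the page cache. *used_direct reports which path was taken.
 * Short writes are returned to the caller as-is. */
static ssize_t write_prefer_direct(int fd, const void *data, size_t len,
                                   off_t off, size_t align, bool *used_direct)
{
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    assert(len > 0 && len % align == 0 && (size_t)off % align == 0);

    /* O_DIRECT also needs the user buffer aligned in memory. */
    void *buf = NULL;
    int rc = posix_memalign(&buf, align, len);
    if (rc != 0) {
        errno = rc;            /* posix_memalign returns the error number */
        return -1;
    }
    memcpy(buf, data, len);

    /* A failure here (typically EINVAL) means we simply stay buffered. */
    bool direct = (set_direct(fd, true) == 0);

    ssize_t n = pwrite(fd, buf, len, off);
    if (n == -1 && errno == EINVAL && direct) {
        /* Flag was accepted but the I/O was rejected: clear and resubmit. */
        if (set_direct(fd, false) == 0) {
            direct = false;
            n = pwrite(fd, buf, len, off);
        }
    }

    int saved = errno;         /* keep pwrite's errno across free() */
    free(buf);
    errno = saved;

    if (used_direct)
        *used_direct = direct;
    return n;
}
```

A caller would pass something like write_prefer_direct(fd, page, 4096, 0, 4096, &direct) and could log the value of direct, so that a silent fall-back to buffered I/O (as on tmpfs) shows up instead of being ignored.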