Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Thomas Munro <thomas.munro@xxxxxxxxxxxxxxxx> · Mon, 18 Feb 2019 15:34:48 +1300

On Mon, Feb 18, 2019 at 2:19 PM Michael Paquier <michael@xxxxxxxxxxx> wrote:
> On Sun, Feb 17, 2019 at 10:54:54AM -0800, Andres Freund wrote:
> > On 2019-02-17 23:29:09 +1300, Thomas Munro wrote:
> >> Hmm.  Well, at least ENOSPC should be treated the same way as EIO.
> >> Here's an experiment that seems to confirm some speculations about NFS
> >> on Linux from the earlier threads:
> >
> > I wish we'd' a good way to have test scripts in the tree for something
> > like that, but using postgres. Unfortunately it's not easy to write
> > portable setup scripts for it.
>
> Yes, it seems to me as well that ENOSPC should be treated as much as
> EIO.  Just looking at the code for data_sync_retry we should really
> have some errno filtering.

I agree with you up to a point:  It would make some amount of sense
for data_sync_elevel() not to promote to PANIC if errno == ENOSYS;
then for sync_file_range() you'd get WARNING and for fsync() you'd get
ERROR (since that's what those call sites pass in) on hypothetical
kernels that lack those syscalls.  As I argued earlier, ENOSYS seems
to be the only errno that we know for sure to be non-destructive to
the page cache since it promises it didn't run any kernel code at all
(or rather didn't make it past the front door), so it's the *only*
errno that belongs on such a whitelist IMHO.  That would get us back
to where we were for WSL users in 11.1.

The question is whether we want to go further than that and provide a
better experience for WSL users, now that we know that it was already
spewing warnings.  One way to do that might be not to bother with
errno filtering at all, but instead (as Andres suggested) do a test of
whether sync_file_range() is implemented on this kernel/emulator at
startup and if not, just disable it somehow.  Then we don't need any
filtering.

Here is a restatement of my rationale for not including errno
filtering in the first place:  Take a look at the documented errnos
for fsync() on Linux.  Which of those tell us that it's sane to retry?
 AFAICS they all either mean that it ran filesystem code that is known
to toast your data on error (data loss has already occurred), or your
file descriptor is bogus (code somewhere is seriously busted and all
bets are off).

-- 
Thomas Munro
http://www.enterprisedb.com