Re: generic/347 data=journal regression in 4.17

"Theodore Y. Ts'o" <tytso@xxxxxxx> · Tue, 3 Jul 2018 20:20:19 -0400

(Added Matthew Wilcox to the CC list)

On Fri, Jun 15, 2018 at 11:45:46AM -0400, Eric Whitney wrote:
> As discussed in this week's ext4 concall, I'm seeing a regression for
> generic/347 in the data=journal test case when running xfstests-bld on
> 4.17.  The test fails with the following output:
> 
>     QA output created by 347
>     +/sbin/e2fsck: Unknown code ____ 251 while recovering journal of /dev/mapper/thin-vol
> 
> The failure bisects to:
> 
> errseq: Always report a writeback error once (b4678df184b3)
> 
> This patch landed in v4.17-rc4.  When it's reverted in v4.17, it's no longer
> possible to reproduce the error.

The bogus error was casued by a bug in e2fsck/journal.c.  Once I fixed
sync_blockdev() to return the error using the kernel error return
conventions (we convert it back to an errcode_t higher up in the call
stack), we get the expected error: 

/sbin/e2fsck: Input/output error while recovering journal of /dev/mapper/thin-vol

The problem is I'm not sure how to fix this in the kernel.  In order
to keep the postgress folks, we can't just clear wb_err when the
dm_thin volume is grown, since there were previous errors.  Unlike the
loop device case, the dm_thin device isn't getting reset in the test
generic/347.  More space is added to the dm_thin device so that future
writes won't fail, but the previous writes definitely will fail, so
there's no obvious fix to dm-thin that won't break Postgres's
"interesting" error collection strategy.

So I think all we can do is have e2fsck deliberately throw away the
error by calling fsync on the block device.  The unfortunate thing
about this is that this forces a SYNCHRONIZE CACHE command to be sent
to the storage device, which is otherwise unnecessary.  I'll do this
for now (see below), since there appears to be no other alternative,
but I wonder if the right answer is we should add a new fnctl() which
allows the caller to collect the error and clear wb_err without
needing to send CACHE FLUSH to the device.

Willy, what do you think?

					- Ted

--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -569,6 +569,14 @@ static errcode_t unix_open_channel(const char *name, int fd,
 	if (safe_getenv("UNIX_IO_FORCE_BOUNCE"))
 		flags |= IO_FLAG_FORCE_BOUNCE;
 
+#ifdef __linux__
+	/*
+	 * We need to make sure any previous errors in the block
+	 * device are thrown away, sigh.  Scat, bogus error!  Scat!!
+	 */
+	(void) fsync(fd);
+#endif
+
 	retval = ext2fs_get_mem(sizeof(struct struct_io_channel), &io);
 	if (retval)
 		goto cleanup;