Re: [RFC] Tux3 for review

Daniel Phillips <daniel@xxxxxxxxx> · Thu, 19 Jun 2014 14:58:20 -0700

On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
On Thu, 19 Jun 2014, Pavel Machek wrote:

Date: Thu, 19 Jun 2014 10:21:29 +0200
From: Pavel Machek <pavel@xxxxxx>
To: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
Cc: Daniel Phillips <daniel@xxxxxxxxx>, Dave Chinner 
<david@xxxxxxxxxxxxx>,
linux-kernel@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx,
 ...

Flamed ? really ?

Yes, really. There were valid points and there were also unabashed flames. 
The latter are not helpful to anybody, even the flamer. But note that there 
were no counter flames. The boy scout rule applies: always leave your 
campsite cleaner than you found it.

Dave pointed out some serious coding style problems.
Those should be very easy to fix.

One needs to be careful about the definition of "fix" so that it does not 
turn into "throw the baby out with the bath water". Our kernel code 
necessarily has a few __KERNEL__ #ifdefs because the majority of it also 
runs in user space. This not a feature to disparage, far from it.

Among other benefits, running in user space supports automated unit testing 
at fine granularity. We run make tests as a habit to catch a wide spectrum 
of correctness regressions. A successful make tests usually indicates that 
the heavyweight kernel stress tests are going to pass. Obviously, there are 
occasional exceptions to this. For example user space does not catch SMP 
races. In practice, only a handful of those have slipped through and 
required kernel level bug chasing.

That said, we will will happily merge any concrete suggestion that reduces 
the frequency of __KERNEL__. But please be realistic. There are 32 
__KERNEL__ ifdefs in our 18K line code base. That hardly amounts to a 
"dog's breakfast".

Let me remind you some more important problems Dave brought up,
including page forking:

"
 The hacks around VFS and MM functionality need to have demonstrated
 methods for being removed.

We already removed 450 lines of core kernel workarounds from Tux3 with an 
approach that was literally cut and pasted from one of Dave's emails. Then 
Dave changed his mind. Now the Tux3 team has been assigned a research 
project to improve core kernel writeback instead of simply adapting the 
approach that is already proven to work well enough. That is a rather 
blatant example of "perfect is the enemy of good enough". Please read the 
thread.

We're not going to merge that page
 forking stuff (like you were told at LSF 2013 more than a year ago:
 http://lwn.net/Articles/548091/) without rigorous design review and
 a demonstration of the solutions to all the hard corner cases it
 has. The current code doesn't solve them (e.g. direct IO doesn't
 work in tux3), and there's no clear patch set we can review that
 demonstrates how it is all supposed to work. i.e. you need to
 separate out all the page forking code into a separate patchset for
 review, independent of the tux3 code and applies to the core mm/
 code.
"

Direct IO is a spurious issue. To recap: direct IO does not introduce any 
new page forking issues. All of the page forking issues already exist with 
normal buffered IO and mmap. We have little interest and scant available 
time for heading off on a tangent to implement direct IO at this point just 
as a precondition for merging.

On the other hand, page forking itself has a number of interesting issues. 
Hirofumi is currently preparing a set of core kernel patches for review. 
These patches explicitly do not attempt to package page forking up into a 
nice and easy API that other filesystems could patch in tomorrow. That 
would be an unreasonable research burden on our small development team. 
Instead, we show how it works in Tux3, and if other filesystems want to get 
those benefits, they can make similar changes. If we (the kernel community) 
are lucky enough to find a pattern in it such that substantial parts of the 
code can be abstracted into a library, then good. But requiring such a 
library to be developed as a precondition to merging Tux3 is unreasonable.

"
 Then there's all the writeback hacks. You've simply copy-n-pasted
 most of fs-writeback.c, including duplicating structures like struct
 wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
 able to access core structures from kernel module context
 (tux3_setup_writeback(), I'm looking at you). This is completely
 unacceptible for a merge. Again, you need to separate out all the
 writeback changes you need into an independent patchset so that they
 can be reviewed independently of the tux3 code that uses it.
"

That was already fixed as noted above, and all the relevant changes were 
already posted as an independent patch set. After that, some developers 
weighed in with half formed ideas about how the same thing could be done 
better, but without concrete suggestions. There is nothing wrong with half 
formed ideas, except when they turn into a way of blocking forward 
progress. See "perfect is the enemy of good enough" above.

It is worth noting that we (the kernel community) have been thrashing away 
at the writeback problem for more than twenty years, and the current 
solution still leaves much to be desired. It is unfair to expect us, the 
Tux3 team, to fix that mess in a week or two, just to merge our filesystem. 
We prefer to adapt the existing infrastructure for now, as expressed in the 
currently proposed patch set. With that, we allow core to mark our inodes 
dirty just as it has always done, and we continue to use the usual inode 
writeback lists for writeback sheduling, which work just fine.

Regards,

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html