On Tue 19-05-15 13:33:31, David Lang wrote:
> On Tue, 19 May 2015, Daniel Phillips wrote:
>
> >>I understand that Tux3 may avoid these issues due to some other mechanisms
> >>it internally has but if page forking should get into mm subsystem, the
> >>above must work.
> >
> >It does work, and by example, it does not need a lot of code to make
> >it work, but the changes are not trivial. Tux3's delta writeback model
> >will not suit everyone, so you can't just lift our code and add it to
> >Ext4. Using it in Ext4 would require a per-inode writeback model, which
> >looks practical to me but far from a weekend project. Maybe something
> >to consider for Ext5.
> >
> >It is the job of new designs like Tux3 to chase after that final drop
> >of performance, not our trusty Ext4 workhorse. Though stranger things
> >have happened - as I recall, Ext4 had O(n) directory operations at one
> >time. Fixing that was not easy, but we did it because we had to. Fixing
> >Ext4's write performance is not urgent by comparison, and the barrier
> >is high, you would want jbd3 for one thing.
> >
> >I think the meta-question you are asking is, where is the second user
> >for this new CoW functionality? With a possible implication that if
> >there is no second user then Tux3 cannot be merged. Is that the
> >question?
>
> I don't think they are asking for a second user. What they are
> saying is that for this functionality to be accepted in the mm
> subsystem, these problem cases need to work reliably, not just work
> for Tux3 because of your implementation.
>
> So for things that you don't use, you need to make it an error if
> they get used on a page that's been forked (or not be an error and
> 'do the right thing')
>
> For cases where it doesn't matter because Tux3 controls the
> writeback, and it's undefined in general what happens if writeback
> is triggered twice on the same page, you will need to figure out how
> to either prevent the second writeback from triggering if there's
> one in process, or define how the two writebacks are going to happen
> so that you can't end up with them re-ordered by some other
> filesystem.
>
> I think that that's what's meant by the top statement that I left in
> the quote. Even if your implementation details make it safe, these
> need to be safe even without your implementation details to be
> acceptable in the core kernel.

Yeah, that's what I meant. If you create a function which manipulates the
page cache, you had better make it work with other functions manipulating
the page cache. Otherwise it's a landmine waiting to be tripped by some
unsuspecting developer. Sure, you can document all the conditions under
which the function is safe to use, but a function that has several
paragraphs in front of it explaining when it is safe to use isn't a very
good API...

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR