On Thu, Feb 25, 2016 at 11:24:57AM -0500, Jeff Moyer wrote:
> Hi, Dave,
>
> Dave Chinner <david@xxxxxxxxxxxxx> writes:
>
> > Well, let me clarify what I said a bit here, because I feel like I'm
> > being unfairly blamed for putting data integrity as the highest
> > priority for DAX+pmem instead of falling in line and chanting
> > "Performance! Performance! Performance!" with everyone else.
>
> It's totally fair. ;-)
>
> > Let me state this clearly: I'm not opposed to making optimisations
> > that change the way applications and the kernel interact. I like the
> > idea of MAP_SYNC, but I see this sort of API/behaviour change as a
> > last resort when all else fails, not a "first and only" optimisation
> > option.
>
> So, calling it "first and only" seems a bit unfair on your part.

Maybe so, but it's a valid observation - it's being pushed as a way
of avoiding the need to make the kernel code work correctly and fast.
i.e. the argument is "new, unoptimised code is too slow, so we want
a knob to avoid it completely".

Boaz keeps saying that we can make the kernel code faster, but he's
still pushing to enable bypassing that code rather than sending
patches to make the kernel pmem infrastructure faster. Such bypasses
lead to the situation where the kernel code isn't used by the
applications that could benefit from optimisation and improvement of
that code, because they don't use it anymore. This is what I meant
by "first and only" kernel optimisation.

> I
> don't think anyone asking for a MAP_SYNC option doesn't also want other
> applications to work well. That aside, this is where your opinion
> differs from mine: I don't see MAP_SYNC as a last resort option. And
> let me be clear, this /is/ an opinion. I have no hard facts to back it
> up, precisely because we don't have any application we can use for a
> comparison.

Right, we have no numbers, and we don't yet have an optimised kernel
side implementation to compare against.
Until we have the ability to compare apples with apples, we should
be pushing back against API changes that are based on oranges being
tastier than apples.

> But, it seems plausible to me that no matter how well you
> optimize your msync implementation, it will still be more expensive than
> an application that doesn't call msync at all. This obviously depends
> on how the application is using the programming model, among other
> things. I agree that we would need real data to back this up. However,
> I don't see any reason to preclude such an implementation, or to leave
> it as a last resort. I think it should be part of our planning process
> if it's reasonably feasible.

Essentially I see this situation/request as conceptually the same as
O_DIRECT for read/write - O_DIRECT bypasses the kernel dirty range
tracking and, as such, has nasty cache coherency issues when you mix
it with buffered IO. Nor does it play well with mmap, it has
different semantics for every filesystem and the kernel code has
been optimised to the point of fragility.

And, of course, O_DIRECT requires applications to do exactly the
right things to extract performance gains and maintain data
integrity. If they get it right, they will be faster than using the
page cache, but we know that applications often get it very wrong.
And even when they get it right, data corruption can still occur
because some third party accessed the file in a different manner
(e.g. a backup) and triggered one of the known, fundamentally
unfixable coherency problems.

However, despite the fact we are stuck with O_DIRECT and its
deranged monkeys (which I am one of), we should not be ignoring the
problems that bypassing the kernel infrastructure has caused us and
continues to cause us. As such, we really need to think hard about
whether we should be repeating the development of such a bypass
feature.
If we do, we stand a very good chance of ending up in the same
place - a bunch of code that does not play well with others, and a
nightmare to test because it's expected to work and not corrupt
data...

We should try very hard not to repeat the biggest mistake O_DIRECT
made: we need to define and document exactly what behaviour we
guarantee, how it works and exactly what responsibilities the kernel
and userspace have in *great detail* /before/ we add the mechanism
to the kernel.

Think it through carefully - API changes and semantics are forever.
We don't want to add something that in a couple of years we are
wishing we never added....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx