On 03/14/2012 07:10 PM, Andy Lutomirski wrote:
> On 03/08/2012 08:43 AM, Sage Weil wrote:
>> On Thu, 8 Mar 2012, Ted Ts'o wrote:
>>> On Wed, Mar 07, 2012 at 10:27:43PM -0800, Sage Weil wrote:
>>>>
>>>> This avoids the problem for devices that don't need stable pages, but
>>>> doesn't help for those that do (btrfs, raid, iscsi, dif/dix, etc.). It
>>>> seems to me like a more elegant solution would be to COW the page in the
>>>> address_space so that you get stable writeback pages without blocking.
>>>> That's clearly more complex, and I'm sure there are a range of issues
>>>> involved in making that work, but I would hope that it would be doable
>>>> with generic MM infrastructure so that everyone would benefit.
>>>
>>> Well, even doing a COW (or anything that involves messing with page
>>> tables) is not free. So even if we can make the cost of stable
>>> writeback pages cheaper, if we can completely avoid the cost, this
>>> would be good. I'd also rather fix the performance regression sooner
>>> rather than later, and I suspect the COW solution is not something
>>> that could be prepared in time for the upcoming merge window.
>>
>> Definitely. This patch looks like a fine approach for your situation. I
>> just don't want the subject to come up without talking about a general
>> solution. And it's very interesting to hear about a (simple) workload
>> that is affected by the wait_on_page_writeback().
>
> I'll add a simple workload. I have a soft real-time program that has
> two threads. One of them fallocates some files, mmaps them, mlocks
> them, and touches all the pages to prefault them. (This thread has no
> real-time constraints -- it just needs to keep up.) The other thread
> writes to the files.
>
> On Windows, this works very well. On Linux without stable pages, it
> almost works. With stable pages, it's a complete disaster.
> No amount of minimizing the amount of time that pages under writeback
> can cause writers to sleep will help -- writers *must not wait for io*
> when writing mlocked, prefaulted pages for my code to work.
>

Right, this is Windows shit. If your goal is to never wait, do IO as fast
as possible, and use the least amount of CPU time, then this is exactly the
opposite of what you want to do. You want to do async direct IO. Also, as
Dave Chinner said: "Double/ring buffering with async IO dispatch".

BTW, even on Windows there are much better ways to do this. There, too,
the answer is ring buffering with async direct IO.

> (The other issue involves file_update_time. I'll send a fix eventually.)
>
> FWIW, it would be really nice if there was a way to lock a mapping so
> hard that accesses are guaranteed to not even cause soft faults. We're
> far from being able to do that now, though.
>
> --Andy

Cheers
Boaz
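
For illustration only, the "double/ring buffering with async IO dispatch" idea could be sketched roughly as below. This is not code from the thread. `RingWriter`, `submit`, and `close` are hypothetical names, and a plain background thread calling `os.pwrite` stands in for real async direct IO (O_DIRECT with kernel AIO), which would additionally need aligned buffers and is not shown here. The point is only that the producer copies into a preallocated buffer and returns immediately, so it never sleeps on writeback.

```python
import os
import queue
import threading


class RingWriter:
    """Hypothetical helper: fixed pool of buffers drained by a writer thread."""

    def __init__(self, path, nbufs=4, bufsize=4096):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        self.free = queue.Queue()   # buffers the producer may fill
        self.full = queue.Queue()   # filled buffers waiting to be written
        for _ in range(nbufs):
            self.free.put(bytearray(bufsize))
        self.offset = 0             # next file offset, owned by the producer
        self.thread = threading.Thread(target=self._drain, daemon=True)
        self.thread.start()

    def _drain(self):
        # Background side: the only place that waits for IO.
        while True:
            item = self.full.get()
            if item is None:        # sentinel queued by close()
                break
            buf, length, offset = item
            view = memoryview(buf)[:length]
            while view:             # handle short writes
                written = os.pwrite(self.fd, view, offset)
                view = view[written:]
                offset += written
            self.free.put(buf)      # recycle the buffer

    def submit(self, data):
        """Queue data for writing without blocking.

        Returns False if every buffer is still in flight, so the caller
        decides what to do instead of sleeping on IO.
        """
        try:
            buf = self.free.get_nowait()
        except queue.Empty:
            return False
        n = len(data)
        if n > len(buf):
            self.free.put(buf)
            raise ValueError("data larger than one ring buffer")
        buf[:n] = data
        self.full.put((buf, n, self.offset))
        self.offset += n
        return True

    def close(self):
        """Flush everything already queued, then stop the writer thread."""
        self.full.put(None)
        self.thread.join()
        os.close(self.fd)
```

In this sketch the real-time thread would call `submit()` and treat a `False` return as "the ring is too small for the current IO latency", which is a sizing problem rather than something to wait on.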