On Thu, Jun 20, 2013 at 08:20:16AM -0400, Mathieu Desnoyers wrote:
> * Rob van der Heij (rvdheij@xxxxxxxxx) wrote:
> > Wouldn't you batch the calls to drop the pages from cache rather
> > than drop one packet at a time?
>
> By default for kernel tracing, lttng's trace packets are 1MB, so I
> consider the call to fadvise to be already batched by applying it to
> 1MB packets rather than individual pages. Even there, it seems that
> the extra overhead added by the lru drain on each CPU is noticeable.
>
> Another reason for not batching this in larger chunks is to limit the
> impact of the tracer on the kernel page cache. LTTng limits itself to
> its own set of buffers, and uses the page cache only for what is
> absolutely needed to perform I/O, but no more.

I think you are doing it wrong. This is a poster child case for using
Direct IO and completely avoiding the page cache altogether....

> > Your effort to help Linux mm seems a bit overkill,
>
> Without performing this, I have a situation similar to yours, where
> LTTng fills up the page cache very quickly, until it gets to a point
> where memory pressure increases enough that the consumerd is blocked
> until some pages are reclaimed. I really don't care about making the
> consumerd "as fast as possible for a while" if it means its
> throughput will drop once the page cache is filled. I prefer a
> constant slower pace to a short burst followed by lower throughput.
>
> > and you don't want every application to do it like that itself.
>
> Indeed, tracing has always been slightly odd in the sense that it is
> not the workload the system is meant to run, but rather a tool that
> should have the smallest possible impact on the system's usual
> operation.
>
> > The fadvise will not even work when the page is still to be
> > flushed out. Without the patch that started the thread, it would
> > 'at random' not work due to an SMP race condition (not
> > multi-threading).
>
> This is why the lttng consumerd calls:
>
> sync_file_range with flags:
>   SYNC_FILE_RANGE_WAIT_BEFORE
>   SYNC_FILE_RANGE_WRITE
>   SYNC_FILE_RANGE_WAIT_AFTER
>
> on the page range. The purpose of this call is to flush the pages to
> disk before calling fadvise(POSIX_FADV_DONTNEED) on the page range.

Yup, you're emulating direct IO semantics with buffered IO. This seems
to be an emerging trend I've been seeing a lot of over the past few
months - I'm hearing about it because of all the weird corner case
behaviours it causes, because sync_file_range() doesn't provide data
integrity guarantees and fadvise(DONTNEED) can randomly issue lots of
IO, block for long periods of time, silently do nothing, remove pages
from the page cache, and/or some or all of the above.

Direct IO is a model of sanity compared to that mess....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
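
For concreteness, a minimal sketch of the two approaches discussed
above. The helper names, error handling and the 4096-byte alignment
are hypothetical and not taken from the LTTng consumerd sources; only
the syscall sequence (pwrite, sync_file_range on the packet's range,
posix_fadvise(POSIX_FADV_DONTNEED)) and the 1MB packet size come from
the thread. The second helper shows the O_DIRECT alternative Dave
recommends, which bypasses the page cache but requires aligned
buffers, offsets and sizes.

/*
 * Illustrative sketch only -- not LTTng code.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PACKET_SIZE (1024 * 1024)	/* 1MB trace packets */

/*
 * Buffered I/O emulation of direct I/O: write the packet, flush its
 * page range to disk, then ask the kernel to drop those pages from
 * the page cache. Note the caveats above: sync_file_range() gives no
 * data integrity guarantee, and POSIX_FADV_DONTNEED may issue I/O,
 * block, or silently do nothing.
 */
static int write_packet_buffered(int fd, const void *buf, off_t offset)
{
	if (pwrite(fd, buf, PACKET_SIZE, offset) != PACKET_SIZE)
		return -1;

	sync_file_range(fd, offset, PACKET_SIZE,
			SYNC_FILE_RANGE_WAIT_BEFORE |
			SYNC_FILE_RANGE_WRITE |
			SYNC_FILE_RANGE_WAIT_AFTER);

	posix_fadvise(fd, offset, PACKET_SIZE, POSIX_FADV_DONTNEED);
	return 0;
}

/*
 * Direct I/O: open with O_DIRECT so writes bypass the page cache
 * entirely. Buffer, offset and length must be suitably aligned
 * (typically to the logical block size of the underlying device).
 */
static int write_packet_direct(const char *path, const void *packet,
			       off_t offset)
{
	void *buf;
	int fd, ret = -1;

	fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0)
		return -1;

	/* O_DIRECT needs an aligned buffer; 4096 is assumed here. */
	if (posix_memalign(&buf, 4096, PACKET_SIZE))
		goto out_close;
	memcpy(buf, packet, PACKET_SIZE);

	if (pwrite(fd, buf, PACKET_SIZE, offset) == PACKET_SIZE)
		ret = 0;

	free(buf);
out_close:
	close(fd);
	return ret;
}

A real O_DIRECT path would query the device's logical block size
(e.g. via ioctl(BLKSSZGET)) rather than assuming 4096, and would reuse
a preallocated aligned buffer instead of allocating one per packet.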