On 12/12/19 2:45 PM, Martin Steigerwald wrote:
> Jens Axboe - 12.12.19, 16:16:31 CET:
>> On 12/12/19 3:44 AM, Martin Steigerwald wrote:
>>> Jens Axboe - 11.12.19, 16:29:38 CET:
>>>> Recently someone asked me how io_uring buffered IO compares to
>>>> mmapped IO in terms of performance. So I ran some tests with
>>>> buffered IO, and found the experience to be somewhat painful. The
>>>> test case is pretty basic: random reads over a dataset that's 10x
>>>> the size of RAM. Performance starts out fine, and then the page
>>>> cache fills up and we hit a throughput cliff. CPU usage of the IO
>>>> threads goes up, and we have kswapd spending 100% of a core trying
>>>> to keep up. Seeing that, I was reminded of the many complaints I
>>>> hear about buffered IO, and the fact that most of the folks
>>>> complaining will ultimately bite the bullet and move to O_DIRECT
>>>> just to get the kernel out of the way.
>>>>
>>>> But I don't think it needs to be like that. Switching to O_DIRECT
>>>> isn't always easily doable. The buffers have different lifetimes,
>>>> size and alignment constraints, etc. On top of that, mixing
>>>> buffered and O_DIRECT IO can be painful.
>>>>
>>>> Seems to me that we have an opportunity to provide something that
>>>> sits somewhere in between buffered and O_DIRECT, and this is where
>>>> RWF_UNCACHED enters the picture. If this flag is set on IO, we get
>>>> the following behavior:
>>>>
>>>> - If the data is in cache, it remains in cache and the copy (in or
>>>>   out) is served to/from that.
>>>>
>>>> - If the data is NOT in cache, we add it while performing the IO.
>>>>   When the IO is done, we remove it again.
>>>>
>>>> With this, I can do 100% smooth buffered reads or writes without
>>>> pushing the kernel to the state where kswapd is sweating bullets.
>>>> In fact it doesn't even register.
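For illustration, a minimal sketch of what this could look like from userspace via preadv2(2), which takes per-call RWF_* flags. It assumes the RWF_UNCACHED flag proposed in this series; released headers may not define it, so the sketch falls back to 0 (a plain buffered read) when the macro is missing, and a kernel that doesn't recognize the flag rejects the call with EOPNOTSUPP. read_uncached_at() and SKETCH_UNCACHED_FLAG are hypothetical names:

```c
#define _GNU_SOURCE /* preadv2() needs _GNU_SOURCE in glibc (2.26+) */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/uio.h>
#include <unistd.h>

/* ASSUMPTION: RWF_UNCACHED is the flag proposed in this series and may not
 * exist in your headers. Without it, fall back to 0, which makes preadv2()
 * an ordinary buffered read that leaves the pages in the page cache. */
#ifdef RWF_UNCACHED
#define SKETCH_UNCACHED_FLAG RWF_UNCACHED
#else
#define SKETCH_UNCACHED_FLAG 0
#endif

/* Hypothetical helper: read up to len bytes of path starting at off,
 * passing the flag on every preadv2() call. Returns the number of bytes
 * read (short only at EOF), or -1 with errno set. */
ssize_t read_uncached_at(const char *path, off_t off, void *buf, size_t len)
{
    size_t done = 0;
    int fd;

    assert(buf != NULL || len == 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while (done < len) {
        struct iovec iov = {
            .iov_base = (char *)buf + done,
            .iov_len = len - done,
        };
        ssize_t n = preadv2(fd, &iov, 1, off + (off_t)done,
                            SKETCH_UNCACHED_FLAG);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno; /* close() may clobber errno */
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break; /* EOF */
        done += (size_t)n;
    }
    close(fd);
    return (ssize_t)done;
}
```

The file descriptor and buffers stay ordinary buffered IO here; only the flag on the individual read changes, so there are no O_DIRECT alignment or lifetime constraints to meet.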
>>>
>>> A question from a user or Linux Performance trainer perspective:
>>>
>>> How does this compare with posix_fadvise() with POSIX_FADV_DONTNEED,
>>> which the nocache¹ command, for example, uses? Excerpt from the
>>> posix_fadvise(2) manpage:
>>>
>>>        POSIX_FADV_DONTNEED
>>>               The specified data will not be accessed in the near
>>>               future.
>>>
>>>               POSIX_FADV_DONTNEED attempts to free cached pages
>>>               associated with the specified region. This is useful,
>>>               for example, while streaming large files. A program
>>>               may periodically request the kernel to free cached
>>>               data that has already been used, so that more useful
>>>               cached pages are not discarded instead.
>>>
>>> [1] packaged in Debian as nocache or available here:
>>> https://github.com/Feh/nocache
>>>
>>> In any case, it would be nice to have some option in rsync… I still
>>> haven't changed my backup script to call rsync via nocache.
>>
>> I don't know the nocache tool, but I'm guessing it just does the
>> writes (or reads) and then uses FADV_DONTNEED to drop behind those
>> pages? That's fine for slower use cases, but it won't work very well
>> for fast IO. The write side currently works pretty much like that
>> internally, whereas the read side doesn't use the page cache at all.
>
> Yes, it does that. And yeah, I saw you changed the read side to bypass
> the cache entirely.
>
> Also, as I understand it, this is primarily for asynchronous use with
> io_uring?

Or via preadv2/pwritev2; they also allow passing in RWF_* flags.

-- 
Jens Axboe
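Roughly, the userspace drop-behind approach discussed above looks like this. It's a hedged C sketch, not nocache's actual code: read_drop_behind(), write_drop_behind() and the 1 MiB DROP_CHUNK are hypothetical. Reads advise POSIX_FADV_DONTNEED on each chunk once it has been consumed; writes first wait for writeback with sync_file_range(2), since DONTNEED won't drop pages that are still dirty or under writeback:

```c
#define _GNU_SOURCE /* sync_file_range() is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define DROP_CHUNK (1 << 20) /* hypothetical 1 MiB drop-behind window */

/* Read fd from offset 0 to EOF in buflen-sized chunks, advising the kernel
 * to drop each chunk right after it has been consumed. Returns total bytes
 * read, or -1 with errno set. */
long long read_drop_behind(int fd, char *buf, size_t buflen)
{
    long long total = 0;

    assert(buflen > 0);
    for (;;) {
        ssize_t n = pread(fd, buf, buflen, (off_t)total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return total;
        /* ... consume buf[0..n) here ... */
        /* Advisory only: posix_fadvise() returns an error number, ignored. */
        (void)posix_fadvise(fd, (off_t)total, (off_t)n, POSIX_FADV_DONTNEED);
        total += n;
    }
}

/* Write len bytes of buf at offset 0 in DROP_CHUNK pieces. Each piece is
 * written back synchronously before POSIX_FADV_DONTNEED, because the kernel
 * won't drop pages that are still dirty or under writeback. */
long long write_drop_behind(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t want = len - done < DROP_CHUNK ? len - done : DROP_CHUNK;
        ssize_t n = pwrite(fd, buf + done, want, (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {
            errno = EIO; /* no progress; avoid spinning forever */
            return -1;
        }
        if (sync_file_range(fd, (off_t)done, (off_t)n,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) < 0)
            return -1;
        (void)posix_fadvise(fd, (off_t)done, (off_t)n, POSIX_FADV_DONTNEED);
        done += (size_t)n;
    }
    return (long long)done;
}
```

Every chunk costs extra syscalls, and on the write side a synchronous writeback wait, which is plausibly part of why this is fine for slower use cases but doesn't keep up with fast IO.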