On Wed, Jun 08, 2016 at 11:58:12AM -0400, Johannes Weiner wrote: > On Wed, Jun 08, 2016 at 09:06:32AM +0900, Minchan Kim wrote: > > On Tue, Jun 07, 2016 at 10:18:18AM -0400, Johannes Weiner wrote: > > > On Tue, Jun 07, 2016 at 09:25:50AM +0900, Minchan Kim wrote: > > > > On Mon, Jun 06, 2016 at 03:48:27PM -0400, Johannes Weiner wrote: > > > > > --- a/Documentation/sysctl/vm.txt > > > > > +++ b/Documentation/sysctl/vm.txt > > > > > @@ -771,14 +771,20 @@ with no ill effects: errors and warnings on these stats are suppressed.) > > > > > > > > > > swappiness > > > > > > > > > > -This control is used to define how aggressive the kernel will swap > > > > > -memory pages. Higher values will increase agressiveness, lower values > > > > > -decrease the amount of swap. A value of 0 instructs the kernel not to > > > > > -initiate swap until the amount of free and file-backed pages is less > > > > > -than the high water mark in a zone. > > > > > +This control is used to define the relative IO cost of cache misses > > > > > +between the swap device and the filesystem as a value between 0 and > > > > > +200. At 100, the VM assumes equal IO cost and will thus apply memory > > > > > +pressure to the page cache and swap-backed pages equally. At 0, the > > > > > +kernel will not initiate swap until the amount of free and file-backed > > > > > +pages is less than the high watermark in a zone. > > > > > > > > Generally, I agree extending swappiness value good but not sure 200 is > > > > enough to represent speed gap between file and swap sotrage in every > > > > cases. - Just nitpick. > > > > > > How so? You can't give swap more weight than 100%. 200 is the maximum > > > possible value. > > > > In old, swappiness is how agressively reclaim anonymous pages in favour > > of page cache. But when I read your description and changes about > > swappiness in vm.txt, esp, *relative IO cost*, I feel you change swappiness > > define to represent relative IO cost between swap storage and file storage. > > Then, with that, we could balance anonymous and file LRU with the weight. > > > > For example, let's assume that in-memory swap storage is 10x times faster > > than slow thumb drive. In that case, IO cost of 5 anonymous pages > > swapping-in/out is equal to 1 file-backed page-discard/read. > > > > I thought it does make sense because that measuring the speed gab between > > those storages is easier than selecting vague swappiness tendency. > > > > In terms of such approach, I thought 200 is not enough to show the gab > > because the gap is started from 100. > > Isn't it your intention? If so, to me, the description was rather > > misleading. :( > > The way swappiness works never actually changed. > > The only thing that changed is that we used to look at referenced > pages (recent_rotated) and *assumed* they would likely cause IO when > reclaimed, whereas with my patches we actually know whether they are. > But swappiness has always been about relative IO cost of the LRUs. > > Swappiness defines relative IO cost between file and swap on a scale > from 0 to 200, where 100 is the point of equality. The scale factors > are calculated in get_scan_count() like this: > > anon_prio = swappiness > file_prio = 200 - swappiness > > and those are applied to the recorded cost/value ratios like this: > > ap = anon_prio * scanned / rotated > fp = file_prio * scanned / rotated > > That means if your swap device is 10 times faster than your filesystem > device, and you thus want anon to receive 10x the refaults when the > anon and file pages are used equally, you do this: > > x + 10x = 200 > x = 18 (ish) > > So your file priority is ~18 and your swap priority is the remainder > of the range, 200 - 18. You set swappiness to 182. > > Now fill in the numbers while assuming all pages on both lists have > been referenced before and will likely refault (or in the new model, > all pages are refaulting): > > fraction[anon] = ap = 182 * 1 / 1 = 182 > fraction[file] = fp = 18 * 1 / 1 = 18 > denominator = ap + fp = 182 + 18 = 200 > > and then calculate the scan target like this: > > scan[type] = (lru_size() >> priority) * fraction[type] / denominator > > This will scan and reclaim 9% of the file pages and 90% of the anon > pages. On refault, 9% of the IO will be from the filesystem and 90% > from the swap device. Thanks for the detail example. Then, let's change the example a little bit. A system has big HDD storage and SSD swap. HDD: 200 IOPS SSD: 100000 IOPS >From https://en.wikipedia.org/wiki/IOPS So, speed gap is 500x. x + 500x = 200 If we use PCIe-SSD, the gap will be larger. That's why I said 200 is enough to represent speed gap. Such system configuration is already non-sense so it is okay to ignore such usecases? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>