On 12/11/19 6:29 PM, Jens Axboe wrote:
> On 12/11/19 6:22 PM, Linus Torvalds wrote:
>> On Wed, Dec 11, 2019 at 5:11 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> 15K is likely too slow to really show an issue, I'm afraid. The 970
>>> is no slouch, but your crypt setup will likely hamper it a lot. You
>>> don't have a non-encrypted partition on it?
>>
>> No. I normally don't need all that much disk, so I've never upgraded
>> my ssd from the 512G size.
>>
>> Which means that it's actually half full or so, and I never felt like
>> "I should keep an unencrypted partition for IO testing", since I don't
>> generally _do_ any IO testing.
>>
>> I can get my load up with "numjobs=8" and get my iops up to the 100k
>> range, though.
>>
>> But kswapd doesn't much seem to care, the CPU percentage actually goes
>> _down_ to 0.39% when I try that. Probably simply because now my CPUs
>> are busy, so they are running at 4.7GHz instead of the 800MHz "mostly
>> idle" state ...
>>
>> I guess I should be happy. It does mean that the situation you see
>> isn't exactly the normal case. I understand why you want to do the
>> non-cached case, but the case I think is the worrisome one is the
>> regular buffered one, so that's what I'm testing (not even trying the
>> noaccess patches).
>>
>> So from your report I went "uhhuh, that sounds like a bug". And it
>> appears that it largely isn't - you're seeing it because you're pushing
>> the IO subsystem by another order of magnitude (and then I agree that
>> "under those kinds of IO loads, caching just won't help")
>
> I'd very much argue that it IS a bug; it just may not show on your
> system. My test box is a pretty standard 2-socket system, 24 cores / 48
> threads, 2 nodes. The last numbers I sent were 100K IOPS, so nothing
> crazy, and granted that's only 10% kswapd CPU time, but that still seems
> very high for those kinds of rates. I'm surprised you see essentially no
> kswapd time for the same data rate.
>
> We'll keep poking here, I know Johannes is spending some time looking
> into the reclaim side.

Out of curiosity, I just tried it on my laptop, which also has a Samsung
drive. Using 8 jobs, I get around 100K IOPS too, and this is my top
listing:

23308 axboe     20   0  623156   1304      8 D  10.3  0.0  0:03.81 fio
23309 axboe     20   0  623160   1304      8 D  10.3  0.0  0:03.81 fio
23311 axboe     20   0  623168   1304      8 D  10.3  0.0  0:03.82 fio
23313 axboe     20   0  623176   1304      8 D  10.3  0.0  0:03.82 fio
23314 axboe     20   0  623180   1304      8 D  10.3  0.0  0:03.81 fio
  162 root      20   0       0      0      0 S   9.9  0.0  0:12.97 kswapd0
23307 axboe     20   0  623152   1304      8 D   9.9  0.0  0:03.84 fio
23310 axboe     20   0  623164   1304      8 D   9.9  0.0  0:03.81 fio
23312 axboe     20   0  623172   1304      8 D   9.9  0.0  0:03.80 fio

kswapd is between 9-11% the whole time, and the profile looks very
similar to what I saw on my test box:

    35.79%  kswapd0  [kernel.vmlinux]  [k] xas_create
     9.97%  kswapd0  [kernel.vmlinux]  [k] free_pcppages_bulk
     9.94%  kswapd0  [kernel.vmlinux]  [k] isolate_lru_pages
     7.78%  kswapd0  [kernel.vmlinux]  [k] shrink_page_list
     3.78%  kswapd0  [kernel.vmlinux]  [k] xas_clear_mark
     3.08%  kswapd0  [kernel.vmlinux]  [k] workingset_eviction
     2.48%  kswapd0  [kernel.vmlinux]  [k] __isolate_lru_page
     2.06%  kswapd0  [kernel.vmlinux]  [k] page_mapping
     1.95%  kswapd0  [kernel.vmlinux]  [k] __remove_mapping

So now I'm even more puzzled why your (desktop?) machine doesn't show
it; it must be more potent than my x1 laptop. But for me, the laptop and
the 2-socket test box show EXACTLY the same behavior, the laptop is just
too slow to make it really pathological.

-- 
Jens Axboe
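
For anyone trying to reproduce the numbers above, a fio job along these
lines should approximate the 8-job buffered workload being discussed.
The thread itself never includes the actual job file, so the IO engine,
block size, file path, and sizes below are assumptions, not Jens's real
settings:

; buffered-read.fio -- sketch of the 8-job buffered workload from the
; thread; engine, path, size, and runtime are assumed, not confirmed
[global]
; plain buffered read(2), no O_DIRECT, so the page cache fills up and
; kswapd has reclaim work to do
ioengine=psync
rw=randread
bs=4k
direct=0
; per-job working set; pick something larger than free RAM so reclaim
; actually kicks in
size=10g
runtime=60
time_based

[reader]
; matches the "numjobs=8" runs mentioned in the thread
numjobs=8
; assumed location on the NVMe drive under test
filename=/data/fio-testfile

With the working set larger than free memory, the page cache fills and
kswapd starts reclaiming; watching top alongside something like
"perf top -p $(pgrep kswapd0)" while this runs should show whether
xas_create dominates the reclaim path as in the profile above.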