On Tue, May 30, 2023 at 9:20 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/05/30 8:58, Xiao Ni wrote:
> > On Mon, May 29, 2023 at 4:50 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/29 15:57, Xiao Ni wrote:
> >>> On Mon, May 29, 2023 at 11:18 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 2023/05/29 11:10, Xiao Ni wrote:
> >>>>> On Mon, May 29, 2023 at 10:20 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 2023/05/29 10:08, Xiao Ni wrote:
> >>>>>>> Hi Kuai
> >>>>>>>
> >>>>>>> There is a memory limit in your test. But in most
> >>>>>>> situations, customers should not set this. Can this change introduce a
> >>>>>>> performance regression in other situations?
> >>>>>>
> >>>>>> Note that this limit is only there to trigger writeback as soon as
> >>>>>> possible in the test, and I'm 100% sure real workloads can trigger
> >>>>>> asynchronous writeback of dirty pages and continue to produce new dirty
> >>>>>> pages.
> >>>>>
> >>>>> Hi
> >>>>>
> >>>>> I'm confused here. If we want to trigger write back quickly, we need
> >>>>> to set these two values to a smaller number, rather than 0 and 60.
> >>>>> Right?
> >>>>
> >>>> 60 is not required, I'll remove this setting.
> >>>>
> >>>> 0 just means write back if there are any dirty pages.
> >>>
> >>> Hi Kuai
> >>>
> >>> Does 0 mean disabling write back? I tried to find the doc that
> >>> describes the meaning of setting dirty_background_ratio to 0, but I
> >>> didn't find it.
> >>> https://www.kernel.org/doc/html/next/admin-guide/sysctl/vm.html
> >>> doesn't describe this. But it says something like this:
> >>>
> >>> Note:
> >>> dirty_background_bytes is the counterpart of dirty_background_ratio. Only
> >>> one of them may be specified at a time. When one sysctl is written it is
> >>> immediately taken into account to evaluate the dirty memory limits and the
> >>> other appears as 0 when read.
> >>>
> >>> Maybe you can set dirty_background_ratio to 1 if you want to
> >>> trigger write back ASAP.
> >>
> >> The purpose here is to trigger write back ASAP. I'm not an expert here,
> >> but based on the test results, 0 obviously doesn't mean write back is
> >> disabled.
> >>
> >> Setting dirty_background_bytes to a value also sets
> >> dirty_background_ratio to 0, which means dirty_background_ratio is
> >> disabled. However, changing dirty_background_ratio from its default
> >> value to 0 leaves both dirty_background_ratio and
> >> dirty_background_bytes at 0, and based on the following related code,
> >> I think 0 just means write back if there are any dirty pages.
> >>
> >> domain_dirty_limits:
> >>   bg_bytes = dirty_background_bytes -> 0
> >>   bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100 -> 0
> >>
> >>   if (bg_bytes)
> >>           bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
> >>   else
> >>           bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE; -> 0
> >>
> >>   dtc->bg_thresh = bg_thresh; -> 0
> >>
> >> balance_dirty_pages
> >>   nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
> >>   if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
> >>       !writeback_in_progress(wb))
> >>           wb_start_background_writeback(wb); -> writeback ASAP
> >>
> >> Thanks,
> >> Kuai
> >
> > Hi Kuai
> >
> > I'm not an expert on this either. Thanks for all your patches, I
> > can learn more from them too. But I still have some questions.
> >
> > I did a test in my environment, something like this:
> > modprobe brd rd_nr=4 rd_size=10485760
> > mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
> > echo 0 > /proc/sys/vm/dirty_background_ratio
> > fio -filename=/dev/md0 -ioengine=libaio -rw=write -thread -bs=1k-8k
> > -numjobs=1 -iodepth=128 --runtime=10 -name=xxx
> > It causes OOM and the system hangs.
>
> OOM means you triggered this problem...
> The plug holds lots of bios, which costs
> lots of memory; it's not that write back is disabled. You can verify
> this by monitoring md inflight. Note: don't use too much memory for
> the ramdisk (rd_nr * rd_size) in the test, so that OOM won't be triggered.
>
> Have you tried to test with this patchset?

Yes, I know I have reproduced this problem. I'll have the v3 patchset.

>
> >
> > modprobe brd rd_nr=4 rd_size=10485760
> > mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
> > echo 1 > /proc/sys/vm/dirty_background_ratio (THIS is the only difference)
> > fio -filename=/dev/md0 -ioengine=libaio -rw=write -thread -bs=1k-8k
> > -numjobs=1 -iodepth=128 --runtime=10 -name=xxx
> > It can finish successfully. Setting dirty_background_ratio to 1
> > here means it flushes ASAP.
>
> This really doesn't mean it flushes ASAP. Our tests reported this problem
> in a real test that doesn't modify dirty_background_ratio. I guess
> something triggers io_scheduler(), probably because the background thread
> thinks the dirty pages don't reach the threshold, but I'm not sure for now.

Thanks for notifying me of this.

Regards
Xiao

>
> Thanks,
> Kuai
> >
> > So your method works the opposite way from what you intended. The
> > memory can't all be flushed in time, so all memory is used up very
> > soon, the memory runs out, and the system hangs. The reason I'm looking
> > at the test is to ask whether we really need this change, because in
> > the real world most customers don't disable write back. Anyway, it
> > depends on Song's decision, and thanks for your patches again. I'll
> > review V3 and try to do some performance tests.
> >
> > Best Regards
> > Xiao
>
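
For reference, the threshold arithmetic Kuai quotes from domain_dirty_limits and balance_dirty_pages can be modeled in user space. This is only a minimal sketch, not kernel code. The function names model_bg_thresh and model_should_start_background and the 4096-byte MODEL_PAGE_SIZE are hypothetical, chosen for illustration, and the model leaves out the kernel's other adjustments (for example, clamping against the foreground dirty threshold). It only shows that with both sysctls at 0 the background threshold comes out as 0, so a single dirty page is enough to satisfy the "nr_reclaimable > bg_thresh" check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this model; real systems may differ. */
#define MODEL_PAGE_SIZE 4096ULL

/*
 * Hypothetical model of the background threshold (in pages), following
 * the arithmetic quoted in the thread:
 *   - if dirty_background_bytes is non-zero, round it up to pages;
 *   - otherwise scale the ratio (a percentage) against available pages.
 */
uint64_t model_bg_thresh(uint64_t bg_bytes, uint64_t bg_ratio_percent,
                         uint64_t available_pages)
{
    /* The ratio is converted to "per PAGE_SIZE" units for precision. */
    uint64_t bg_ratio = (bg_ratio_percent * MODEL_PAGE_SIZE) / 100;

    if (bg_bytes)
        return (bg_bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

    return (bg_ratio * available_pages) / MODEL_PAGE_SIZE;
}

/*
 * Hypothetical model of the check quoted from balance_dirty_pages:
 * background writeback starts when the dirty page count exceeds the
 * threshold, laptop mode is off, and no writeback is already running.
 */
bool model_should_start_background(uint64_t nr_dirty, uint64_t bg_thresh,
                                   bool laptop_mode,
                                   bool writeback_in_progress)
{
    return !laptop_mode && nr_dirty > bg_thresh && !writeback_in_progress;
}
```

In this model, model_bg_thresh(0, 0, n) is 0 for any n, so model_should_start_background(1, 0, false, false) is true. A ratio of 1 leaves a non-zero threshold that must be exceeded first.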