On Fri, Dec 3, 2021 at 12:28 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> > There is no chance for zram reading/writing to be counted in
> > PSI_IO_WAIT so far as zram will deal with the request just in current
> > context without invoking submit_bio and io_schedule.
>
> Hm, but you're also not waiting for a real io device - during which
> the CPU could be doing something else e.g. You're waiting for
> decompression. The thread also isn't in D-state during that time. What
> scenario would benefit from this accounting? How is IO pressure from
> comp/decomp paths actionable to you?
No. Block device related D-state will be counted in via
psi_dequeue(io_wait). What I am proposing here is do NOT ignore the
influence on non-productive time by huge numbers of in-context swap
in/out (zram like). This can help to make IO pressure more accurate
and coordinate with the number of PSWPIN/OUT. It is like counting the
IO time within filemap_fault->wait_on_page_bit_common into
psi_mem_stall, which introduces memory pressure high by IO.
>
> What about when you use zram with disk writeback enabled, and you see
> a mix of decompression and actual disk IO. Wouldn't you want to be
> able to tell the two apart, to see if you're short on CPU or short on
> IO bandwidth in this setup? Your patch would make that impossible.
OK. Is it better to start the IO counting from pageout? Both of the
bdev and ram backed swap would benefit from it.
>
> This needs a much more comprehensive changelog.
>
> > > @@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
> > >                                zram_get_element(zram, index),
> > >                                bio, partial_io);
> > >       }
> > > -
> > > +#ifdef CONFIG_PSI
> > > +     psi_task_change(current, 0, TSK_IOWAIT);
> > > +#endif
>
> Add psi_iostall_enter() and leave helpers that encapsulate the ifdefs.
OK.
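
To make the last review comment concrete, here is a rough userspace
sketch of what "helpers that encapsulate the ifdefs" could look like.
This is not kernel code and not the patch that was eventually posted.
Only the name psi_iostall_enter() comes from the review; the leave
counterpart's name, both signatures, struct mock_task and
mock_zram_read() are assumptions made up for illustration. The flags
argument is loosely modeled on the existing psi_memstall_enter(),
which saves the caller's state so nested sections do not clear an
outer stall.

```c
#include <assert.h>

/* Userspace mock: defining CONFIG_PSI compiles the accounting in. */
#define CONFIG_PSI 1

/* Hypothetical stand-in for per-task state; NOT the kernel's task_struct. */
struct mock_task {
	int in_iostall;        /* nonzero while the task is marked as IO-stalled */
	unsigned long stalls;  /* number of completed outermost enter/leave pairs */
};

#ifdef CONFIG_PSI
/* Mark the task as stalled on IO; remember whether it already was. */
static inline void psi_iostall_enter(struct mock_task *t, unsigned long *flags)
{
	*flags = t->in_iostall;
	if (*flags)
		return;        /* nested section: the outer one owns the state */
	t->in_iostall = 1;
}

/* Clear the stall state, but only for the outermost section. */
static inline void psi_iostall_leave(struct mock_task *t, unsigned long *flags)
{
	if (*flags)
		return;
	assert(t->in_iostall); /* enter/leave must be balanced */
	t->in_iostall = 0;
	t->stalls++;
}
#else
/* With PSI compiled out the helpers become no-ops, so callers need no ifdefs. */
static inline void psi_iostall_enter(struct mock_task *t, unsigned long *flags)
{
	(void)t;
	*flags = 0;
}

static inline void psi_iostall_leave(struct mock_task *t, unsigned long *flags)
{
	(void)t;
	(void)flags;
}
#endif

/* Hypothetical caller: the read path brackets its in-context work. */
static int mock_zram_read(struct mock_task *t)
{
	unsigned long flags;

	psi_iostall_enter(t, &flags);
	/* decompression of the page would happen here */
	psi_iostall_leave(t, &flags);
	return 0;
}
```

The point of the pattern is that the call site in the read path stays
free of #ifdef CONFIG_PSI; whether in-context decompression should be
accounted as IO stall at all is the open question in the thread above.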