On Tue, Apr 06, 2021 at 03:29:18PM +0300, Kirill A. Shutemov wrote:
> On Wed, Mar 31, 2021 at 07:47:02PM +0100, Matthew Wilcox (Oracle) wrote:
> > +/**
> > + * folio_next - Move to the next physical folio.
> > + * @folio: The folio we're currently operating on.
> > + *
> > + * If you have physically contiguous memory which may span more than
> > + * one folio (eg a &struct bio_vec), use this function to move from one
> > + * folio to the next.  Do not use it if the memory is only virtually
> > + * contiguous as the folios are almost certainly not adjacent to each
> > + * other.  This is the folio equivalent to writing ``page++``.
> > + *
> > + * Context: We assume that the folios are refcounted and/or locked at a
> > + * higher level and do not adjust the reference counts.
> > + * Return: The next struct folio.
> > + */
> > +static inline struct folio *folio_next(struct folio *folio)
> > +{
> > +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> > +	return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio));
> > +#else
> > +	return folio + folio_nr_pages(folio);
> > +#endif
>
> Do we really need the #if here?
>
> From quick look at nth_page() and memory_model.h, compiler should be able
> to simplify calculation for FLATMEM or SPARSEMEM_VMEMMAP to what you do in
> the #else. No?

No.
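
For anyone who wants to reproduce the comparison outside the kernel, the
two functions being compared can be modelled roughly as follows.  This is
a userspace sketch under stated assumptions, not kernel code: struct page
is a hypothetical 64-byte stand-in, vmemmap_base is an ordinary global,
and nth_page() is spelled out as pfn_to_page(page_to_pfn(page) + n), which
is assumed to be how it expands with SPARSEMEM_VMEMMAP.

```c
/* Hypothetical stand-in: 64 bytes, matching the shifts by 6 in the listing. */
struct page {
	unsigned char pad[64];
};

/*
 * A runtime variable rather than a constant, so the compiler has to
 * load it instead of folding the subtract and add together.
 */
unsigned long vmemmap_base;

#define vmemmap			((struct page *)vmemmap_base)
#define page_to_pfn(page)	((unsigned long)((page) - vmemmap))
#define pfn_to_page(pfn)	(vmemmap + (pfn))
#define nth_page(page, n)	pfn_to_page(page_to_pfn(page) + (n))

/* Goes via the pfn: subtract the base, add n, add the base back. */
struct page *a(struct page *p, unsigned long n)
{
	return nth_page(p, n);
}

/* Plain pointer arithmetic. */
struct page *b(struct page *p, unsigned long n)
{
	return p + n;
}
```

Both functions return the same pointer whenever p lies inside the array
that vmemmap_base points at; the difference is only in the code generated
to get there.
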

0000000000001180 <a>:
struct page *a(struct page *p, unsigned long n)
{
    1180:	e8 00 00 00 00       	callq  1185 <a+0x5>
			1181: R_X86_64_PLT32	__fentry__-0x4
    1185:	55                   	push   %rbp
	return nth_page(p, n);
    1186:	48 2b 3d 00 00 00 00 	sub    0x0(%rip),%rdi
			1189: R_X86_64_PC32	vmemmap_base-0x4
    118d:	48 c1 ff 06          	sar    $0x6,%rdi
    1191:	48 8d 04 37          	lea    (%rdi,%rsi,1),%rax
    1195:	48 89 e5             	mov    %rsp,%rbp
	return nth_page(p, n);
    1198:	48 c1 e0 06          	shl    $0x6,%rax
    119c:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax
			119f: R_X86_64_PC32	vmemmap_base-0x4
    11a3:	5d                   	pop    %rbp
    11a4:	c3                   	retq

vs

00000000000011b0 <b>:
struct page *b(struct page *p, unsigned long n)
{
    11b0:	e8 00 00 00 00       	callq  11b5 <b+0x5>
			11b1: R_X86_64_PLT32	__fentry__-0x4
    11b5:	55                   	push   %rbp
	return p + n;
    11b6:	48 c1 e6 06          	shl    $0x6,%rsi
    11ba:	48 8d 04 37          	lea    (%rdi,%rsi,1),%rax
    11be:	48 89 e5             	mov    %rsp,%rbp
    11c1:	5d                   	pop    %rbp
    11c2:	c3                   	retq

Now, maybe we should put this optimisation into the definition of nth_page?

> > +struct folio {
> > +	/* private: don't document the anon union */
> > +	union {
> > +		struct {
> > +	/* public: */
> > +			unsigned long flags;
> > +			struct list_head lru;
> > +			struct address_space *mapping;
> > +			pgoff_t index;
> > +			unsigned long private;
> > +			atomic_t _mapcount;
> > +			atomic_t _refcount;
> > +#ifdef CONFIG_MEMCG
> > +			unsigned long memcg_data;
> > +#endif
>
> As Christoph, I'm not a fan of this :/

What would you prefer?
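
On the nth_page() question above, moving the #if into the macro might look
roughly like the sketch below.  It is a userspace model, not a tested
kernel patch: struct page is a 64-byte stand-in, memmap_base and advance()
are hypothetical names, the CONFIG_* symbols are plain preprocessor
macros you would pass with -D, and the pfn conversion in the first branch
is simplified (the real SPARSEMEM version looks up the section).

```c
/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	unsigned char pad[64];
};

/* Hypothetical base of the memmap array for this model. */
struct page *memmap_base;

/* Simplified pfn conversions; the real SPARSEMEM ones are per-section. */
#define page_to_pfn(page)	((unsigned long)((page) - memmap_base))
#define pfn_to_page(pfn)	(memmap_base + (pfn))

#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
/* The memmap may be discontiguous between sections: go via the pfn. */
#define nth_page(page, n)	pfn_to_page(page_to_pfn(page) + (n))
#else
/* The memmap is virtually contiguous: plain pointer arithmetic. */
#define nth_page(page, n)	((page) + (n))
#endif

/* Callers use nth_page() unconditionally and need no #if of their own. */
static inline struct page *advance(struct page *p, unsigned long n)
{
	return nth_page(p, n);
}
```

With something along these lines, folio_next() could presumably call
nth_page() in both configurations and drop its own #if.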