Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > > On 5/31/24 02:19, Byungchul Park wrote: > .. > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index 0283cf366c2a..03683bf66031 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -2872,6 +2872,12 @@ static inline void file_end_write(struct file *file) > > if (!S_ISREG(file_inode(file)->i_mode)) > > return; > > sb_end_write(file_inode(file)->i_sb); > > + > > + /* > > + * XXX: If needed, can be optimized by avoiding luf_flush() if > > + * the address space of the file has never been involved by luf. > > + */ > > + luf_flush(); > > } > .. > > +void luf_flush(void) > > +{ > > + unsigned long flags; > > + unsigned short int ugen; > > + > > + /* > > + * Obtain the latest ugen number. > > + */ > > + spin_lock_irqsave(&luf_lock, flags); > > + ugen = luf_gen; > > + spin_unlock_irqrestore(&luf_lock, flags); > > + > > + check_luf_flush(ugen); > > +} > > Am I reading this right? There's now an unconditional global spinlock It looked *too much* to split the lock to several locks as rcu does until version 11. However, this code introduced in v11 looks problematic. > acquired in the sys_write() path? How can this possibly scale? I should find a better way. > So, yeah, I think an optimization is absolutely needed. But, on a more > fundamental level, I just don't believe these patches are being tested. > Even a simple microbenchmark should show a pretty nasty regression on > any decently large system: > > > https://github.com/antonblanchard/will-it-scale/blob/master/tests/write1.c > > Second, I was just pointing out sys_write() as an example of how the > page cache could change. Couldn't a separate, read/write mmap() of the > file do the same thing and *not* go through sb_end_write()? > > So: > > fd = open("foo"); > ptr1 = mmap(fd, PROT_READ); > ptr2 = mmap(fd, PROT_READ|PROT_WRITE); > > foo = *ptr1; // populate the page cache > ... page cache page is reclaimed and LUF'd > *ptr2 = bar; // new page cache page is allocated and written to I think this part would work but I'm not convinced. I will check again. > printk("*ptr1: %d\n", *ptr1); > > Doesn't the printk() see stale data? > > I think tglx would call all of this "tinkering". The approach to this > series is to "fix" narrow, specific cases that reviewers point out, make > it compile, then send it out again, hoping someone will apply it. Sorry for not perfect work and bothering you but you know what? I can see what is happening in this community too. Of course, I bet you would post better quality mm patches from the 1st version than me but might not in other subsystems. > So, for me, until the approach to this series changes: NAK, for x86. I understand why you got mad and feel sorry but I couldn't expect the regression you mentioned above. And I admit the patches have had problems I couldn't find in advance until you, Hildenbrand and Ying. I will do better. > Andrew, please don't take this series. Or, if you do, please drop the > patch enabling it on x86. I don't want to ask to merge either, if there are still issues. > I also have the feeling our VFS friends won't take kindly to having That is also what I thought it was. What should I do then? I don't believe you do not agree with the concept itself. Thing is the current version is not good enough. I will do my best by doing what I can do. > random luf_foo() hooks in their hot paths, optimized or not. I don't > see any of them on cc. Yes. I should've cc'd them. I will. Byungchul