Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU

On Fri, Mar 01, 2024 at 10:33:59AM +0700, James Bottomley wrote:
> On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> > Or maybe you just want the syscall to return an error instead of
> > blocking for an unbounded amount of time if userspace asks for
> > something silly.
> 
> Warn on allocation above a certain size without MAY_FAIL would seem to
> cover all those cases.  If there is a case for requiring instant
> allocation, you always have GFP_ATOMIC, and, I suppose, we could even
> do a bounded reclaim allocation where it tries for a certain time then
> fails.

Then you're baking in this weird constant into all your algorithms that
doesn't scale as machine memory sizes and working set sizes increase.

> > Honestly, relying on the OOM killer and saying that because that now
> > we don't have to write and test your error paths is a lazy cop out.
> 
> OOM Killer is the most extreme outcome.  Usually reclaim (hugely
> simplified) dumps clean cache first and tries the shrinkers then tries
> to write out dirty cache.  Only after that hasn't found anything after
> a few iterations will the oom killer get activated

So all your caches get dumped, the machine grinds to a halt, and then a
random process gets killed, instead of simply _failing the allocation_.

> > The same kind of thinking got us overcommit, where yes we got an
> > increase in efficiency, but the cost was that everyone started
> > assuming and relying on overcommit, so now it's impossible to run
> > without overcommit enabled except in highly controlled environments.
> 
> That might be true for your use case, but it certainly isn't true for a
> cheap hosting cloud using containers: overcommit is where you make your
> money, so it's absolutely standard operating procedure.  I wouldn't
> call cheap hosting a "highly controlled environment" they're just
> making a bet they won't get caught out too often.

Reading comprehension fail. Reread what I wrote.

> > And that means allocation failure as an effective signal is just
> > completely busted in userspace. If you want to write code in
> > userspace that uses as much memory as is available and no more, you
> > _can't_, because system behaviour goes to shit if you have overcommit
> > enabled or a bunch of memory gets wasted if overcommit is disabled
> > because everyone assumes that's just what you do.
> 
> OK, this seems to be specific to your use case again, because if you
> look at what the major user space processes like web browsers do, they
> allocate way over the physical memory available to them for cache and
> assume the kernel will take care of it.  Making failure a signal for
> being over the working set would cause all these applications to
> segfault almost immediately.

Again, reread what I wrote. You're restating what I wrote and completely
missing the point.

> > Let's _not_ go that route in the kernel. I have pointy sticks to
> > brandish at people who don't want to deal with properly handling
> > errors.
> 
> Error legs are the least exercised and most bug, and therefore exploit,
> prone pieces of code in C.  If we can get rid of them, we should.

Fuck no.

Having working error paths is _basic_, and learning how to test your
code is also basic. If you can't be bothered to do that you shouldn't be
writing kernel code.
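
To make "test your error paths" concrete, here is a rough userspace
sketch, not kernel code. Every name in it (test_alloc, fail_at, struct
pair, pair_init_failing_at) is made up for illustration. It wraps malloc()
in a hook that can fail the Nth allocation, then walks the failure point
through an init function so that each unwind label actually gets run. The
kernel's own fault injection framework (failslab, fail_page_alloc) applies
the same idea to real allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fault-injection knob: fail the Nth allocation (1-based), 0 = never. */
static unsigned long fail_at;
static unsigned long alloc_count;

static void *test_alloc(size_t size)
{
	if (fail_at && ++alloc_count == fail_at)
		return NULL;	/* injected failure */
	return malloc(size);
}

struct pair {
	char *a;
	char *b;
};

static void pair_destroy(struct pair *p)
{
	free(p->b);
	free(p->a);
	p->a = p->b = NULL;
}

/* Two allocations, so two distinct failure points to unwind from. */
static int pair_init(struct pair *p, size_t len)
{
	p->a = test_alloc(len);
	if (!p->a)
		goto err;

	p->b = test_alloc(len);
	if (!p->b)
		goto err_free_a;

	memset(p->a, 0, len);
	memset(p->b, 0, len);
	return 0;

err_free_a:
	free(p->a);
	p->a = NULL;
err:
	return -ENOMEM;
}

/* Run pair_init() with the nth allocation forced to fail; return its result. */
static int pair_init_failing_at(unsigned long n)
{
	struct pair p = { NULL, NULL };
	int ret;

	fail_at = n;
	alloc_count = 0;

	ret = pair_init(&p, 64);
	if (ret) {
		/* A failed init must not leave anything behind. */
		assert(!p.a && !p.b);
		return ret;
	}
	pair_destroy(&p);
	return 0;
}
```

Calling pair_init_failing_at(1) and pair_init_failing_at(2) exercises both
unwind paths; running the same loop under a leak checker would also catch
a missed free.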

We are giving up far too much by going down the route of "oh, just kill
stuff if we screwed the pooch and overcommitted".

I don't fucking care if it's what the big cloud providers want because
it's convenient for them, some of us actually do care about reliability.

By just saying "oh, the OOM killer will save us" what you're doing is
making it nearly impossible to fully utilize a machine without having
stuff randomly killed.

Fuck. That.



