On (04/26/18 11:42), Petr Mladek wrote: [..] > > Believe me, I could perfectly understand the desire to create perfect > defensive solutions that would never break anything. It is not easy > to decide when the best-effort solutions are worth the risk. Yes, but my point is - I don't think we clearly understand the root cause of the problem. See below [you can jump over the next section]. [..] > Honestly, I do not believe that console drivers are like Scheherazade. > They are not able to make up long interesting stories. Let's say that > lockdep splat has more than 100 lines but it can happen only once. > Let's say that WARNs have about 40 lines. I somehow doubt that we > could ever see 10 different WARN calls from one con->write() call. The problem here is that it takes a human being with IQ to tell what's repetitive, what's useless and what's not. vprintk(...) { if (!__ratelimit()) return; } has zero IQ to make such decisions. Sorry, the numbers don't work for me. "Console drivers added 100 lines in 1 hour" does not tell me that we had "an infinite console_unlock() loop". Dunno. Quite likely I'm wrong. Wouldn't be the first time ever. But it's unclear to me why we are out of options, without even looking at logs, but... May be. > > But we first need a real reason. Right now it looks to me like > > we have "a solution" to a problem which we have never witnessed. > > I am trying to find a "simple" and generic solution for the problem > reported by Tejun: [..] > 1. Console is IPMI emulated serial console. Super slow. Also > netconsole is in use. > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack > / driver tries to allocate memory and then fail, which in turn > triggers allocation failure or other warning messages. printk was > already flushing, so the messages are queued on the ring. > 5. OOM handler keeps flushing but 4 repeats and the queue is never > shrinking. Because OOM handler is trapped in printk flushing, it > never manages to free memory and no one else can enter OOM path > either, so the system is trapped in this state. > </paste> Yes, and that's why I want to take a look at the logs/backtraces. I had a very-very quick look at netconsole code, and some parts of net code [skb allocation, etc]. And I didn't manage to find that "every console driver call adds new messages to the logbuf". OK, suppose that at some point netcon or net stack does kmalloc(GFP_ATOMIC) or alloc_pages(GFP_ATOMIC). So if it was kmalloc() and we need to cache_grow_begin()->kmem_getpages() or ___slab_alloc()->new_slab(), then: - kmem_getpages() can slab_out_of_memory(). But slab_out_of_memory(): a) ratelimited to DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST b) enabled only `#if DEBUG' [probably out of consideration] - So we end up in __alloc_pages_slowpath(): alloc_pages() -> __alloc_pages_slowpath() or kmem_getpages()->__alloc_pages_slowpath() or ___slab_alloc()->new_slab()->__alloc_pages_slowpath() __alloc_pages_slowpath() can call warn_alloc(), yes. But warn_alloc() is also ratelimted to DEFAULT_RATELIMIT_INTERVAL and DEFAULT_RATELIMIT_BURST. That "triggers allocation failure or other warning messages" part is questionable. 1) Does the bug report actually say that - DEFAULT_RATELIMIT_INTERVAL and DEFAULT_RATELIMIT_BURST in MM are not good enough? If so, then why do we "fix" printk()? Can we just tweak the rate limiting in MM code? 2) If the bug report is saying that MM code has non-ratelimited warnings/printouts in allocation path - then we need to rate limit those; the same way it's done in slab_out_of_memory() / warn_alloc(). OK... So DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST allows up to 10 warn_alloc()-s in 5 seconds. Let's assume that allocation failure backtrace is around 50 lines [including registers print out]. Along with backtrace we have Mem-info print out. Let's assume that it's around 30 lines. So a single warn_alloc() can be around 80 lines. Mem-info lines are very long, so it would be reasonable to count the number of characters here, but let's just use lines. So we can have around 800 lines in just 10 seconds. Now, I think "Console is IPMI emulated serial console. Super slow." is a critically important bit of information. [And that's why we better count the number of characters in OOM report, not the lines]. We print to the consoles sequentially, hence netconsole becomes as "fast" as that super slow IPMI console. MM ratelimit, meanwhile, does not care. The slower IPMI is, the "sooner" netconsole gets its chance to add 10 more kmalloc()->warn_alloc() to the logbuf - another ~800 lines, or may be significantly more than that. Because of slow IPMI we don't call netconsole frequent enough to get advantage of the rate limiting in warn_alloc(). [And notice how "100/1000 lines per hour" is a bad news in this regard]. If this is the case, and if my assumptions are valid, then we really should not rate limit console_drivers()->printk(). Mem-info print outs don't seem to be "a repetitive garbage" that we can easily discard. Those are important bits of information, which show the OOM/reclaimer progress, and so on. So, Petr, let's slow down for a second. There are things that are not completely clear. Opinions? [..] > For me it is hard to believe that all these possible errors will be > cured just by offloading. Not to say that offloading is not trivial > and there is some resistance against it. Well, I didn't say this. All I said was that I really like that "the patch does not pretend to be smart and does not drop random printk() messages" part. -ss