Re: Issues with very large LUN count servers and booting becoming more and more of a problem

On 3/25/21 1:28 AM, Laurence Oberman wrote:
On Mon, 2021-03-22 at 17:02 -0400, Laurence Oberman wrote:
Hello
We have been struggling with this for years.
Systems are getting so large now that a system with multi-terabyte
memory and 1000s of device paths is becoming common.

For example, customers are seeing 16 paths per LUN, and with 1000 LUNs
that is 16000 device paths, each with multi-line console discovery
logging.

We end up in emergency mode and various incarnations of "can't boot"
due to console output slowdown that (while worse on serial consoles)
is still a huge overhead, and that can even require us to set
watchdog_thresh on the kernel command line to prevent the NMIs.

I started thinking about a new parameter for scsi_mod that could be
used by sd and the scsi_dh_alua probing / discovery messaging (that is
so noisy), to quieten it down.

Before I even put effort into this, I wanted to see if you folks have
an appetite for this.

We have been blacklisting HBA drivers and using various printk masks
etc. to overcome this, but a way to mask this within sd.c and
scsi_dh_alua.c I think could work better.
It would not be the default of course, but an option to be added for
these huge customers.
I would look to do the minimal logging for a device discovery, just so
some messaging is there for debug etc., and I think it will help.

If this is a crazy idea, let me know and I won't pursue it, but I
decided to just put it out there.

Best Regards
Laurence Oberman


Replying to my own thread with more information

RFE: Introduce two new macros to manage the crazy amount of boot
logging we get with the large LUN count systems

sd_printk_boot_control
sdev_printk_boot_control

These macros take an extra parameter, boot_log_enable; if it is left at
its default (1), the logs are printed.
Adding scsi_mod.scsi_alua_boot_logging=0 will quiet down the logging
for these huge systems.

With no parameter (default) nothing changes in the logging
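
For readers without the patch at hand, the gating idea described above
can be sketched in plain userspace C. This is only an illustration
under stated assumptions, not the RFE patch: fake_sdev_printk(),
discover(), the message texts and the choice of one always-on line per
path are all hypothetical, and the real sdev_printk() takes a log
level and a struct scsi_device pointer rather than a string.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for scsi_mod.scsi_alua_boot_logging:
 * 1 (default) prints everything, 0 suppresses the gated messages. */
static int boot_log_enable = 1;

/* Userspace stand-in for sdev_printk(); returns the number of lines
 * written (always 1) so callers can count console output. */
static int fake_sdev_printk(const char *dev, const char *fmt, ...)
{
    va_list ap;

    printf("sd %s: ", dev);
    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
    return 1;
}

/* Sketch of the proposed macro shape: print only when 'enable' is set.
 * Evaluates to the number of lines written (0 or 1). */
#define sdev_printk_boot_control(enable, dev, ...) \
    ((enable) ? fake_sdev_printk((dev), __VA_ARGS__) : 0)

/* Simulate discovery of nr_luns LUNs over nr_paths paths each and
 * return the number of log lines produced. */
static int discover(int nr_luns, int nr_paths)
{
    char dev[32];
    int lines = 0;

    assert(nr_luns >= 0 && nr_paths >= 0);
    for (int lun = 0; lun < nr_luns; lun++) {
        for (int path = 0; path < nr_paths; path++) {
            snprintf(dev, sizeof(dev), "%d:0:0:%d", path, lun);
            /* Noisy per-path messages, gated by the parameter. */
            lines += sdev_printk_boot_control(boot_log_enable, dev,
                                              "illustrative ALUA probe message\n");
            lines += sdev_printk_boot_control(boot_log_enable, dev,
                                              "illustrative capacity message\n");
            lines += sdev_printk_boot_control(boot_log_enable, dev,
                                              "illustrative cache message\n");
            /* Minimal logging that stays on for debugging. */
            lines += fake_sdev_printk(dev, "illustrative attach message\n");
        }
    }
    return lines;
}
```

In this toy model the line count scales with LUNs times paths, and
clearing boot_log_enable leaves one line per path instead of four.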

With boot log control and regular console
134s to boot and 1987 lines with 80 devices and 2 paths

With no boot control (default) and regular console
170s to boot and about 4000 lines of logging

The patch inline is not final, so I did not send it with git given this
is an RFE.
It is included to show the changes I was thinking about.

Well, _actually_ it's not just the SCSI drivers; it's just that the scsi driver exhibits these issues nicely.

The hope I had was that we can resolve this issue by making printk asynchronous, such that each call to printk() wouldn't block.

That really should give us most of what we want; the only issue is what to do with those messages which are spooled (but not printed). For graphical UI this probably doesn't matter as the user will end up with a graphical interface sooner or later.

For text console things become tricky; we will need the console to get our prompt, but it might still be busy printing out stuff.

Can't we have a 'low priority' output of these messages, and stop printing them to the console once 'getty' starts?

Thing is, once 'getty' is up and running the user _can_ log in, so he can do any debugging he likes from the system console; there the message log on the console is less important as the user can get the system log via other means. It only gets important once getty is _not_ up, but then it's less time critical as there's nothing the user _can_ do.
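
A rough userspace C sketch of that behaviour, purely to make the idea concrete: messages are always stored in a log buffer, a low-priority flusher prints a bounded number of them per call, and console printing stops once a flag says getty is up. Everything here (struct msg_log, log_store(), console_flush(), the getty_up flag, the buffer sizes) is a hypothetical illustration and not how printk is implemented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define LOG_SLOTS 8   /* illustrative ring size */
#define LOG_LEN   64  /* illustrative max message length */

struct msg_log {
    char   slot[LOG_SLOTS][LOG_LEN];
    size_t head;      /* total number of messages ever stored */
    size_t printed;   /* sequence number of next message to print */
    bool   getty_up;  /* set once a login prompt is available */
};

/* Store a message without touching the console, so the caller never
 * blocks on slow console output. Oldest entries are overwritten. */
static void log_store(struct msg_log *l, const char *text)
{
    snprintf(l->slot[l->head % LOG_SLOTS], LOG_LEN, "%s", text);
    l->head++;
}

/* Low-priority console output: print at most 'budget' pending messages
 * and return how many were written. Once getty is up, pending messages
 * are skipped on the console; they remain in the log buffer. */
static size_t console_flush(struct msg_log *l, size_t budget)
{
    size_t n = 0;

    if (l->getty_up) {
        l->printed = l->head;
        return 0;
    }
    /* Skip messages that were already overwritten in the ring. */
    if (l->head - l->printed > LOG_SLOTS)
        l->printed = l->head - LOG_SLOTS;
    while (l->printed < l->head && n < budget) {
        puts(l->slot[l->printed % LOG_SLOTS]);
        l->printed++;
        n++;
    }
    return n;
}
```

The messages skipped on the console are still in the buffer, which corresponds to the user fetching the system log by other means after logging in.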

Thoughts?

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


