Re: Isssues with very large LUN count servers and booting becoming more and more of a problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2021-03-26 at 16:08 +0100, Hannes Reinecke wrote:
> On 3/25/21 1:28 AM, Laurence Oberman wrote:
> > On Mon, 2021-03-22 at 17:02 -0400, Laurence Oberman wrote:
> > > Hello
> > > We have been struggling with this for years.
> > > Systems are getting so large now that a system with multi-
> > > terabyte
> > > memory and 1000's of device paths is becoming common.
> > > 
> > > For example, customers are seeing 16 paths and with a 1000 LUNS
> > > thats
> > > 16000 multiline console log discovery etc.
> > > 
> > > We land up in Emergency mode and various incatanations of "cant
> > > boot"
> > > due to console putput slowdown that (while worse on serial
> > > consoles)
> > > is
> > > still huge overhead that can even require us to use
> > > watchdog_thresh
> > > on
> > > the kernel line to prevent the NMI's
> > > 
> > > I started thinking about a new parameter for scsi_mod that could
> > > be
> > > used by sd and the scsi_dh_alua probing / discovery messaging
> > > (that
> > > is
> > > so noisy), to quieten it down.
> > > 
> > > Before I even put efort into this, I wanted to see if you folks
> > > have
> > > an
> > > appetite for this.
> > > 
> > > We have been blacklisting HBA drivers and using verious printk
> > > masks
> > > etc to overcome this but a way to mask this within sd.c and
> > > scsi_dh_alua.c I think could work better.
> > > It would not be the default of course but an option to be added
> > > for
> > > these huge customers.
> > > I would look do do the minimal logging for a device discovery,
> > > just
> > > so
> > > some messaging is there for debug etc and I think it will help.
> > > 
> > > If this is a crazy idea, let me know and I wont pursue it, but I
> > > decided to just put it out there.
> > > 
> > > Best Regards
> > > Laurence Oberman
> > > 
> > 
> > Replying to my own thread with more information
> > 
> > RFE: Introduce two new macros to manage the crazy amount of boot
> > logging we get with the large LUN count systems
> > 
> > sd_printk_boot_control
> > sdev_printk_boot_control
> > 
> > These macros have an extra parameter boot_log_enable and if its
> > default
> > (1) then logs are printed
> > adding scsi_mod.scsi_alua_boot_logging=0 will quiet down the
> > logging
> > for these huge systems
> > 
> > With no parameter (default) nothing changes in the logging
> > 
> > With boot log control and regular console
> > 134s to boot and 1987 lines with 80 devices and 2 paths
> > 
> > With no boot control (default) and regular console
> > 170s to boot and about 4000 lines of logging
> > 
> > The patch inline is not final so I did not send with git given this
> > is
> > an RFE.
> > t is included to show the changes I was thinking about.
> > 
> 
> Well, _actually_ it's not just the SCSI drivers; it's just that the
> scsi 
> driver exhibits these issues nicely.
> 
> The hope I had was that we can resolve this issue by making printk 
> asynchrounous, such that each call to printk() wouldn't block.
> 
> The really should give us most what we want; the only issue is what
> to 
> do with those messages which are spooled (but not printed).
> For graphical UI this probably doesn't matter as the user will end
> up 
> with a graphical interface sooner or later.
> 
> For text console things become tricky; we will need the console to
> get 
> our prompt, but it might still be busy printing out stuff.
> 
> Can't we have a 'low priority' output of these messages, and stop 
> printing them to the console once 'getty' starts?
> 
> Thing is, once 'getty' is up and running the user _can_ log in, so
> he 
> can any debugging he likes from the system console; there the
> message 
> log on the console is less important as the user can get the system
> log 
> via other means.
> It only gets important once getty is _not_ up, but then it's less
> time 
> critical as there's nothing the user _can_ do.
> 
> Thoughts?
> 
> Cheers,
> 
> Hannes

Hi Hannes

Indeed I use scsi as the example because for these Enterprise
Production environments, that is the single biggest consumer of the
messaging.
That and the alua messages.

We have two issues.
1. systems stopping in emergency mode because LVM times out during all
the discovery, and in cases device-mapper-multipath devices also not
coming ready.

2. Hard lockups due to both the serial and VGA slow consolw code where
the logging and discovery is writing to console. has a lock and then
the lock ownership access exceeds the watchdog_thresh 

If we could prevent the logging until all the switchroot stuff happens
and the systems have scanned and onlined the LVM and device-mapper
stuff etc., then changing the priority may work. 

The problem only delaying the output is that it wont solve the hard
lockup due to the console code slowing down tasks writing to it so they
exceeds the 10s default for watchdog_thresh. I had to use 30 or 60 in
certain cases even with a VGA console and still hit the hard lockup.
Using modprpbe after boot the systems are up  one at a time for say
qla2xxx and lpfc we seem to handle the logging without triggering the
watchdog.

Of course the question then becomes what are we missing for debugging
with the silence. In the patch I showed as an example, I was selective
about what I left out but I do log the disk being attached for example.

Its a tough problem to solve. The console code is not going to be fixed
to make it scale better for some time, although I know folks are
working on it.

I sent a patch some time back that we use quite often now to increase
the watchdog_thresh on the kernel line but my DL980 and these other
Giant HPE Superdomes etc. still see hardlockups not only during boot
but sometimes just in error recovery when rports leave and a giant log
storm happens when they recover.

I just figured what I am doing is the lowest risk and could be re-
enabled once up witha simple echo command so we dont miss anything
after boot.

I will think about this some more 
I do appreciate the comments and discussion.

Regards
Laurence




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux