> -----Original Message-----
> From: Lukas Kolbe [mailto:lkolbe@xxxxxxxxxxxxxxxxxxxxxxxx]
> Sent: Wednesday, December 01, 2010 3:10 PM
> To: Kai Makisara
> Cc: Boaz Harrosh; linux-scsi@xxxxxxxxxxxxxxx; Desai, Kashyap
> Subject: Re: After memory pressure: can't read from tape anymore
>
> On Tuesday, 30.11.2010, at 21:53 +0200, Kai Makisara wrote:
> > On Tue, 30 Nov 2010, Boaz Harrosh wrote:
>
> I'm Cc'ing Desai Kashyap from LSI, maybe he can comment on the
> hardware limitations of the SAS1068E?

Lukas,

No, it is not a hardware limitation that CONFIG_FUSION_MAX_SGE needs to
be 128, but our code is written in such a way that even if you set it
to more than 128, it falls back to 128 again. To change this value you
need to make the change below in mptbase.h:

--
-#define MPT_SCSI_SG_DEPTH	CONFIG_FUSION_MAX_SGE
+#define MPT_SCSI_SG_DEPTH	256
--

128 is a good number of scatter/gather elements, and it has long been
the standard value for MPT Fusion. The value is reflected in
sg_tablesize, and the Linux scatter/gather code uses it when building
the sg table for the HBA. See:

  cat /sys/class/scsi_host/host<x>/sg_tablesize

If a single I/O does not fit into sg_tablesize, the Linux
scatter/gather code splits it into multiple I/Os for the low-level
driver. So I do not see any problem with the CONFIG_FUSION_MAX_SGE
value. Our driver internally converts the sg list into the SGE format
understood by the LSI hardware.

Thanks,
Kashyap

> > > ...
> > > I looked at enlarge_buffer() and it looks fragile and broken. If
> > > you really need a pointer, e.g.:
> > >   STbuffer->b_data = page_address(STbuffer->reserved_pages[0]);
> >
> > If you think it is broken, please fix it.
> >
> > > Then why not use vmalloc() for buffers larger than PAGE_SIZE? But
> > > better yet, avoid it by keeping a pages array or an sg list and
> > > operating with aio-type operations.
> >
> > vmalloc() is not a solution here. Think about this from the HBA
> > side. Each s/g segment must be contiguous in the address space the
> > HBA uses. In many cases this is the physical memory address space.
> > Any solution must make sure that the HBA can perform the requested
> > data transfer.
> >
> > Kai
> >
> > > But I understand this is a lot of work on an old driver. Perhaps
> > > pre-allocate something big at startup, with the size specified by
> > > the user?
> >
> > This used to be possible at some time, and it could be made possible
> > again. But I don't like this option because it means that users must
> > explicitly set boot parameters.
> >
> > And it is difficult for me to believe that modern SAS HBAs only
> > support 128 s/g segments.
> >
> > Kai
>
> For reference, here's my original message with Kai's reply:
>
> > Hi,
> >
> > On our backup system (2 LTO4 drives/Tandberg library via an
> > LSISAS1068E, kernel 2.6.36 with the stock Fusion MPT SAS host driver
> > 3.04.17 on debian/squeeze), we see reproducible tape read and write
> > failures after the system has been under memory pressure:
> >
> > [342567.297152] st0: Can't allocate 2097152 byte tape buffer.
> > [342569.316099] st0: Can't allocate 2097152 byte tape buffer.
> > [342570.805164] st0: Can't allocate 2097152 byte tape buffer.
> > [342571.958331] st0: Can't allocate 2097152 byte tape buffer.
> > [342572.704264] st0: Can't allocate 2097152 byte tape buffer.
> > [342873.737130] st: from_buffer offset overflow.
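To make the failures above concrete, here is a minimal sketch, in
kernel-style C, of the kind of allocation st's enlarge_buffer() has to
perform here: building a 2 MB tape buffer out of physically contiguous
chunks, each of which later becomes one scatter/gather segment for the
HBA. This is illustrative, not the actual st source; everything except
alloc_pages(), get_order(), and the 128-segment limit is made up for
the example.

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm.h>

#define EXAMPLE_MAX_SEGS 128	/* the HBA's sg_tablesize */

/*
 * Grab up to 128 physically contiguous 16 kB chunks (order 2 with
 * 4 kB pages).  128 x 16 kB = 2097152 bytes, the buffer size in the
 * log above.  Under fragmentation, alloc_pages() fails for order > 0
 * long before memory actually runs out, so the whole buffer
 * allocation fails.
 */
static int example_enlarge_buffer(struct page **pages, int target_bytes)
{
	unsigned int order = get_order(16 * 1024);
	int got = 0, seg = 0;

	while (got < target_bytes && seg < EXAMPLE_MAX_SEGS) {
		pages[seg] = alloc_pages(GFP_KERNEL, order);
		if (!pages[seg])
			return -ENOMEM;	/* no contiguous chunk left */
		got += PAGE_SIZE << order;
		seg++;
	}
	return got >= target_bytes ? seg : -ENOMEM;
}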
> >
> > Bacula is spewing this message every time it tries to access the
> > tape drive:
> > 28-Nov 19:58 sd1.techfak JobId 2857: Error: block.c:1002 Read error
> > on fd=10 at file:blk 0:0 on device "drv2" (/dev/nst0).
> > ERR=Input/output error
> >
> > By memory pressure, I mean that the KVM processes containing the
> > postgres DB (~20 million files) and the bacula director had used all
> > available RAM; one of them used ~4 GiB of its 12 GiB swap for an
> > hour or so (by selecting a full restore, it seems that the whole
> > directory tree of the 15-million-file backup gets read into memory).
> > After this, I wasn't able to read from the second tape drive anymore
> > (/dev/st0), whereas the first tape drive was restoring the data
> > happily (it is currently about halfway through a 3 TiB restore from
> > 5 tapes).
> >
> > This same behaviour appears when we're doing a few incremental
> > backups; after a while, it just isn't possible to use the tape
> > drives anymore - every I/O operation gives an I/O error, even a
> > simple dd bs=64k count=10. After a restart, the system behaves
> > correctly until - seemingly - another memory pressure situation
> > occurs.
>
> This is predictable. The maximum number of scatter/gather segments
> seems to be 128. The st driver first tries to set up the transfer
> directly from the user buffer to the HBA. The user buffer is usually
> fragmented, so one scatter/gather segment is used for each page.
> Assuming a 4 kB page size, the maximum size of the direct transfer is
> 128 x 4 kB = 512 kB.
>
> When this fails, the driver tries to allocate a kernel buffer that
> consists of physically contiguous segments larger than 4 kB. Let's
> assume it can find 128 segments of 16 kB each. In this case the
> maximum block size is 2048 kB. Memory pressure results in memory
> fragmentation, the driver can't find large enough segments, and
> allocation fails. This is what you are seeing.
>
> So, one solution is to use a 512 kB block size. Another is to find
> out whether the 128-segment limit is a physical limitation or just a
> choice. In the latter case the mptsas driver could be modified to
> support larger block sizes even after memory fragmentation.
>
> Kai
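To double-check Kai's arithmetic, here is a trivial standalone C
program (userspace, illustrative values only; the 128 is the
sg_tablesize discussed above, the segment sizes are the two cases Kai
assumes):

#include <stdio.h>

int main(void)
{
	unsigned int sg_segments = 128;	/* sg_tablesize of the HBA */
	unsigned int page_kb = 4;	/* one page per segment (direct I/O) */
	unsigned int contig_kb = 16;	/* contiguous chunk st can allocate */

	/* direct transfer from a fragmented user buffer: 128 x 4 kB */
	printf("max direct block size:   %u kB\n", sg_segments * page_kb);

	/* kernel buffer built from 16 kB contiguous chunks: 128 x 16 kB */
	printf("max buffered block size: %u kB\n", sg_segments * contig_kb);
	return 0;
}

This prints 512 kB and 2048 kB, matching the two limits above.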