> -----Original Message-----
> From: Bernd Schubert [mailto:bs_lists@xxxxxxxxxxxxxxxxx]
> Sent: Tuesday, April 20, 2010 12:47 AM
> To: James Bottomley
> Cc: Mike Christie; Desai, Kashyap; linux-scsi@xxxxxxxxxxxxxxx; Bernd Schubert
> Subject: Re: SYNCHRONIZE_CACHE from sd_preppare_flush does not have retries.!
>
> On Monday 19 April 2010, James Bottomley wrote:
> > On Mon, 2010-04-19 at 20:14 +0200, Bernd Schubert wrote:
> > > On Monday 19 April 2010, Mike Christie wrote:
> > > > On 04/19/2010 06:32 AM, Desai, Kashyap wrote:
> > > > > I am facing one issue with the SCSI stack.
> > > > > Here is the background of my test.
> > > > >
> > > > > Mount an ext3 file system with journaling support, with barrier=1,
> > > > > commit=5. Now, with this setup the file system will do submit_bh with
> > > > > the WRITE_BARRIER flag set at an interval of 5 seconds. (This is a
> > > > > part of journaling.) Eventually it will call queue_flush(), which will
> > > > > generate a SCSI command with CDB SYNCHRONIZE_CACHE and insert it into
> > > > > the request queue. I observed that creation of SYNCHRONIZE_CACHE is a
> > > > > part of sd_prepare_flush(). Here we have the timeout set to SD_TIMEOUT
> > > > > but retries are not set. Because the retries of the request are not
> > > > > set, no retries are allowed for SYNCHRONIZE_CACHE at the mid layer.
> > > > >
> > > > > Because of zero retries for the SYNCHRONIZE_CACHE command at the
> > > > > mid layer, it is creating trouble for the file system. In the current
> > > > > situation, even though the LLD sends back commands with DID_RESET,
> > > > > SYNCHRONIZE_CACHE will fail immediately without going for any retries
> > > > > when the HBA is in recovery state. Eventually this information goes to
> > > > > the file system, which sees that SYNCHRONIZE_CACHE has failed and goes
> > > > > to read-only mode.
> > > > >
> > > > > My question is: can we add "rq->retries = X" in sd_prepare_flush(),
> > > > > with some reasonable retries value?
> > > >
> > > > I am not sure where we want it, but I think we want to be able to set
> > > > both the retries and timeout. I have seen where a sync cache can take
> > > > longer than the default 30 secs.
> > > >
> > > > Do you think we want the block layer to manage retries/timeouts for
> > > > all block device flushes or is this more device specific? I was
> > > > thinking that we may want to create a sysfs interface under the block
> > > > dirs and have blk-sysfs.c and blk-barrier.c handle this. queue_flush
> > > > could set the timeout and retries that are set by some new files under
> > > > /sys/block/sdX/queue/ ?

Thanks a lot for your comments. This is very close to my understanding. I feel
this belongs closer to the block layer, and I mostly agree with your idea.

I tried to understand why upstream does not set retries in
queue_flush()/sd_prepare_flush(). It looks like there is no specific reason.
If I am wrong, can someone explain whether there is a specific reason not to
set rq->retries in sd_prepare_flush()?

Thanks,
Kashyap

> > >
> > > Good that now also other people run into it. 30s is far too small for any
> > > hardware raid unit with SATA disks.
> >
> > It's far too short for just about any HW RAID since they all tend to
> > have multi-megabytes to gigabytes of cache (some of the high end have
> > terabytes). It has to be said that most arrays with battery backed
>
> For DDN storage 30s are actually sufficient, unless disk delays come up. But
> then we presently also only have a rather small cache (2GB) with lots of
> disks.
> Nowadays one can get a UPS protected DDN-9900 controller, but the firmware
> still properly handles the SYNC_CACHE command.
>
> > caches lie when asked to flush the cache, but we probably need to get
> > users into the habit of not using flush barriers with external arrays.
> >
> > > http://markmail.org/message/ewicheafcvgwm4p7
> > >
> > > I wrote this patch while having trouble with Infortrend Raids, but it
> > > also comes up with DDN storage if the write back cache is enabled.
> > > Shall I update the patch, add retries and then resend the entire series?
> >
> > rq->timeout is the timeout of the request triggering the flush ... it's
> > likely the wrong value since it's for a fast completing r/w operation,
> > whereas this is a slow drain operation.
>
> Hmm, in the past we had scsi_device->timeout, but I thought this was given up
> in favour of scsi_device->request_queue->rq_timeout? (somewhere around
> 2.6.27?)
>
>
> Thanks,
> Bernd
>
>
> --
> Bernd Schubert
> DataDirect Networks
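
To make the retry question in this thread concrete, here is a small user-space sketch of why a flush with zero retries fails on the first DID_RESET-style completion, while the same flush with a non-zero retry budget survives a short recovery window. This is not kernel code and not the proposed patch: the names flush_request, sim_device, sim_issue_flush, submit_flush, RESULT_OK and RESULT_RESET are hypothetical stand-ins, and the loop is only an assumption about how a retry budget would be consumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's types or values. */
enum host_result { RESULT_OK, RESULT_RESET };

/* Models only the retry budget of a flush request (compare rq->retries). */
struct flush_request {
    int retries;
};

/* Simulated device: completes the next `resets_pending` flushes with a reset. */
struct sim_device {
    int resets_pending;
};

/* Issue one flush attempt against the simulated device. */
static enum host_result sim_issue_flush(struct sim_device *dev)
{
    if (dev->resets_pending > 0) {
        dev->resets_pending--;
        return RESULT_RESET;
    }
    return RESULT_OK;
}

/*
 * Submit a flush and re-issue it after a reset until the retry budget is
 * used up. Returns true on success; *attempts_out receives the number of
 * attempts made, including the first one.
 */
static bool submit_flush(struct sim_device *dev,
                         const struct flush_request *rq,
                         int *attempts_out)
{
    int attempts = 0;

    assert(rq->retries >= 0);

    for (;;) {
        attempts++;
        if (sim_issue_flush(dev) == RESULT_OK) {
            *attempts_out = attempts;
            return true;
        }
        /* The first attempt is free; each further attempt costs one retry. */
        if (attempts > rq->retries) {
            *attempts_out = attempts;
            return false;
        }
    }
}
```

With a simulated device that reports two resets in a row, a request with retries = 0 fails after a single attempt, which corresponds to the case where the file system sees the error and goes read-only. The same device with retries = 5 succeeds on the third attempt. The sketch leaves out the timeout side of the discussion entirely.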