RE: SYNCHRONIZE_CACHE from sd_preppare_flush does not have retries.!

"Desai, Kashyap" <Kashyap.Desai@xxxxxxx> · Tue, 20 Apr 2010 10:35:41 +0530

> -----Original Message-----
> From: Bernd Schubert [mailto:bs_lists@xxxxxxxxxxxxxxxxx]
> Sent: Tuesday, April 20, 2010 12:47 AM
> To: James Bottomley
> Cc: Mike Christie; Desai, Kashyap; linux-scsi@xxxxxxxxxxxxxxx; Bernd
> Schubert
> Subject: Re: SYNCHRONIZE_CACHE from sd_preppare_flush does not have
> retries.!
> 
> On Monday 19 April 2010, James Bottomley wrote:
> > On Mon, 2010-04-19 at 20:14 +0200, Bernd Schubert wrote:
> > > On Monday 19 April 2010, Mike Christie wrote:
> > > > On 04/19/2010 06:32 AM, Desai, Kashyap wrote:
> > > > > I am facing one issue with scsi stack.
> > > > > Here is a background of my test.
> > > > >
> > > > > Mount ext3 file system with journaling support with barrier=1,
> > > > > commit=5 Now, with this setup file system will do submit_bh
> with
> > > > > WRITE_BARRIER flag set for interval of 5 seconds. (This is a
> part of
> > > > > journaling.) Eventually it will call queue_flush() which will
> > > > > generate SCSI command of CDB: SYNCHRONIZE_CACHE and insert it
> into
> > > > > the request queue. I observed that creation of
> SYNCHRONIZE_CACHE is a
> > > > > part of sd_prepare_flush(). Here we have timeout set to
> SD_TIMEOUT
> > > > > but retries are not set. Because of retries of the request is
> not
> > > > > set, there is no retries allowed for SYNCHRONIZE_CACHE at mid
> layer.
> > > > >
> > > > > Because of zero retries for SYNCHRONIZE_CACHE command at mid-
> layer,
> > > > > it is creating trouble for file system. In current situation,
> Even
> > > > > though LLD send back commands with DID_RESET, SYNCHRONIZE_CACHE
> will
> > > > > fail immediately without going for any retries, when HBA is in
> > > > > recovery state. Eventually this information goes to File system
> and
> > > > > it sees
> > > > > SYNCHRONIZE_CACHE is failed and file system goes to Read only
> mode.
> > > > >
> > > > > My question is "Can we add in sd_prepare_flush(), rq->retries =
> X"
> > > > > some reasonable retries value ?
> > > >
> > > > I am not sure where we want it, but I think we want to be able to
> set
> > > > both the retries and timeout. I have seen where a sync cache can
> take
> > > > longer than the default 30 secs.
> > > >
> > > > Do you think we want to the block layer to manage
> retries/timeouts for
> > > > all block device flushes or is this more device specific? I was
> > > > thinking that we may want to create a sysfs interface under the
> block
> > > > dirs and have blk-sysfs.c and blk-barrier.c handle this.
> queue_flush
> > > > could set the timeout and retries that is set by some new files
> under
> > > > /sys/block/sdX/queue/ ?
Thanks a lot for your comments.

This is very close to my understanding. I feel this belongs closer to the block layer, and I mostly agree with your suggestion.
I tried to understand why upstream does not set retries in queue_flush()/sd_prepare_flush(). It looks like there is no specific reason.
If I am wrong, can someone explain whether there is a specific reason not to set rq->retries in sd_prepare_flush()?
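
To make the question concrete, here is a rough userspace sketch of the change I have in mind. It is only an illustration, not a tested kernel patch: struct mock_request, the MOCK_* constants and both functions are hypothetical stand-ins for struct request, SD_TIMEOUT, SD_MAX_RETRIES, SYNCHRONIZE_CACHE and the mid-layer retry check, and HZ is assumed to be 1000.

```c
#include <string.h>

/* Hypothetical stand-ins for the kernel definitions (HZ assumed 1000). */
#define MOCK_HZ                 1000
#define MOCK_SD_TIMEOUT         (30 * MOCK_HZ)
#define MOCK_SD_MAX_RETRIES     5
#define MOCK_SYNCHRONIZE_CACHE  0x35    /* SCSI opcode for SYNCHRONIZE CACHE(10) */

/* Reduced model of the request fields that matter for this discussion. */
struct mock_request {
	unsigned char  cmd[16];   /* CDB bytes */
	unsigned short cmd_len;   /* CDB length */
	unsigned int   timeout;   /* in jiffies */
	int            retries;   /* retries the mid layer may attempt */
};

/* Builds the flush command and, unlike today, also sets retries. */
static void mock_prepare_flush(struct mock_request *rq)
{
	memset(rq->cmd, 0, sizeof(rq->cmd));
	rq->timeout = MOCK_SD_TIMEOUT;
	rq->retries = MOCK_SD_MAX_RETRIES;      /* the proposed addition */
	rq->cmd[0]  = MOCK_SYNCHRONIZE_CACHE;
	rq->cmd_len = 10;
}

/*
 * Simplified model of the retry decision: failures_so_far counts the
 * failure being handled now. Returns 1 if the command may be reissued.
 */
static int mock_may_retry(int failures_so_far, int allowed)
{
	return failures_so_far <= allowed;
}
```

With retries left at zero, mock_may_retry(1, 0) is false, which matches what I see today: the first DID_RESET completion fails the flush and the file system goes read-only. With retries set to 5, the first five failures would be reissued. The value 5 is only a placeholder, a tunable value may be better.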

Thanks,
Kashyap
> > >
> > > Good that now also other people run into it. 30s is far too small
> for any
> > > hardware raid unit with SATA disks.
> >
> > It's far too short for just about any HW RAID since they all tend to
> > have multi-megabytes to gigabytes of cache (some of the high end have
> > terabytes).  It has to be said that most arrays with battery backed
> 
> For DDN storage 30s are actually sufficient, unless disk delays come
> up. But
> then we presently also only have a rather small cache only (2GB) with
> lots of
> disks.
> Nowadays one can get an UPS protected DDN-9900 controller, but the
> firmware
> still properly handles the SYNC_CACHE command.
> 
> > caches lie when asked to flush the cache, but we probably need to get
> > users into the habit of not using flush barriers with external
> Arrays.
> >
> > > http://markmail.org/message/ewicheafcvgwm4p7
> > >
> > > I wrote this patch while having trouble with Infortrend Raids, but
> it
> > > also comes up with DDN storage if the write back cache is enabled.
> > > Shall I update the patch, add retries and then resend the entire
> series?
> >
> > rq->timeout is the timeout of the request triggering the flush ...
> it's
> > likely the wrong value since it's for a fast completing r/w
> operation,
> > whereas this is a slow drain operation.
> 
> Hmm, in the past we had scsi_device->timeout, but I thought this was
> given up
> in favour of scsi_device->request_queue->rq_timeout? (somewhere around
> 2.6.27?)
> 
> 
> Thanks,
> Bernd
> 
> 
> --
> Bernd Schubert
> DataDirect Networks