Re: SYNCHRONIZE_CACHE from sd_prepare_flush does not have retries!

Bernd Schubert <bs_lists@xxxxxxxxxxxxxxxxx> · Mon, 19 Apr 2010 21:17:15 +0200

On Monday 19 April 2010, James Bottomley wrote:
> On Mon, 2010-04-19 at 20:14 +0200, Bernd Schubert wrote:
> > On Monday 19 April 2010, Mike Christie wrote:
> > > On 04/19/2010 06:32 AM, Desai, Kashyap wrote:
> > > > I am facing an issue with the SCSI stack.
> > > > Here is the background of my test.
> > > >
> > > > I mount an ext3 file system with journaling, barrier=1 and
> > > > commit=5. With this setup the file system calls submit_bh with the
> > > > WRITE_BARRIER flag set every 5 seconds (this is part of
> > > > journaling). Eventually it calls queue_flush(), which
> > > > generates a SCSI command with CDB SYNCHRONIZE_CACHE and inserts it into
> > > > the request queue. I observed that the SYNCHRONIZE_CACHE command is
> > > > built in sd_prepare_flush(). There the timeout is set to SD_TIMEOUT,
> > > > but retries are not set. Because the request's retries field is not
> > > > set, no retries are allowed for SYNCHRONIZE_CACHE at the mid layer.
> > > >
> > > > Because SYNCHRONIZE_CACHE gets zero retries at the mid layer, it
> > > > causes trouble for the file system. Currently, even
> > > > though the LLD sends commands back with DID_RESET, SYNCHRONIZE_CACHE
> > > > fails immediately, without any retries, while the HBA is in a
> > > > recovery state. Eventually this error reaches the file system, which
> > > > sees that SYNCHRONIZE_CACHE failed and goes into read-only mode.
> > > >
> > > > My question is: can we add "rq->retries = X", with some reasonable
> > > > retries value, in sd_prepare_flush()?
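To make the failure mode concrete, here is a simplified userspace model of the mid-layer retry decision; it is not kernel code. Everything named `model_*` is hypothetical, and the model assumes that a DID_RESET completion is retried only while the number of retries already made is below the request's retries count, so with retries left at 0 the first DID_RESET is final. In sd_prepare_flush() itself the proposal amounts to one extra assignment next to the existing timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Proposed in the thread: in sd_prepare_flush(), add rq->retries = X;
 * everything below is a hypothetical userspace model, not kernel code. */

/* Outcome of one submission of the flush, as the LLD might report it. */
enum model_host_byte { MODEL_DID_OK, MODEL_DID_RESET, MODEL_DID_ERROR };

/*
 * Simplified mid-layer retry loop: a DID_RESET completion is resubmitted
 * only while the retries already made are fewer than 'retries'.
 * Returns true if the flush eventually succeeded; *attempts receives the
 * number of submissions made.
 */
static bool model_issue_flush(int retries,
                              const enum model_host_byte *outcomes,
                              int n_outcomes, int *attempts)
{
	assert(retries >= 0 && n_outcomes > 0);
	for (int i = 0; i < n_outcomes; i++) {
		*attempts = i + 1;
		if (outcomes[i] == MODEL_DID_OK)
			return true;
		/* Only DID_RESET is treated as retryable in this model. */
		if (outcomes[i] != MODEL_DID_RESET || i >= retries)
			return false;
	}
	return false; /* ran out of scripted outcomes */
}
```

With a scripted DID_RESET followed by success, `model_issue_flush(0, ...)` fails after a single attempt, which mirrors the reported behaviour, while any nonzero retries value lets the second attempt succeed.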
> > >
> > > I am not sure where we want it, but I think we want to be able to set
> > > both the retries and the timeout. I have seen cases where a sync cache
> > > takes longer than the default 30 secs.
> > >
> > > Do you think we want the block layer to manage retries/timeouts for
> > > all block device flushes, or is this more device specific? I was
> > > thinking that we may want to create a sysfs interface under the block
> > > dirs and have blk-sysfs.c and blk-barrier.c handle this. queue_flush
> > > could then use the timeout and retries set through some new files under
> > > /sys/block/sdX/queue/?
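Mike's sysfs idea could look roughly like the following userspace sketch. The attribute names `flush_timeout` and `flush_retries`, and all `model_*` identifiers, are assumptions for illustration, not existing block-layer interfaces; the point is only that each value is validated once when the administrator writes it and is then copied into every flush request.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-queue flush tunables, e.g. exposed as
 * /sys/block/sdX/queue/flush_timeout and .../flush_retries (names assumed). */
struct model_queue {
	unsigned int flush_timeout_secs;
	unsigned int flush_retries;
};

/* The flush request built by the (modelled) queue_flush(). */
struct model_flush_rq {
	unsigned int timeout_secs;
	unsigned int retries;
};

/* Parse a decimal value as a sysfs store handler might.
 * Returns 0 on success, -EINVAL on malformed or out-of-range input. */
static int model_store_uint(const char *buf, unsigned int max, unsigned int *out)
{
	char *end;

	errno = 0;
	unsigned long v = strtoul(buf, &end, 10);
	if (end == buf || errno == ERANGE || v > max)
		return -EINVAL;
	if (*end == '\n')	/* "echo 120 > ..." appends a newline */
		end++;
	if (*end != '\0')
		return -EINVAL;
	*out = (unsigned int)v;
	return 0;
}

/* Copy the tunables into the flush request instead of inheriting
 * the values of whichever request triggered the flush. */
static void model_prepare_flush(const struct model_queue *q,
                                struct model_flush_rq *rq)
{
	assert(q != NULL && rq != NULL);
	rq->timeout_secs = q->flush_timeout_secs;
	rq->retries = q->flush_retries;
}
```

Keeping the validation in the store path means the flush path itself only does two plain copies.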
> >
> > Good that other people are now running into it too. 30s is far too short
> > for any hardware RAID unit with SATA disks.
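A back-of-the-envelope way to see why a fixed 30s can be too short: if the whole write-back cache is dirty, draining it takes roughly the dirty bytes divided by the sustained destage rate. The figures used below (1 GiB dirty, 20 MB/s of random writes to SATA disks) are illustrative assumptions, not measurements of any particular array.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case time to drain a fully dirty write-back cache:
 *   seconds = dirty bytes / sustained destage rate (bytes per second).
 * Real arrays destage at very different rates; inputs are illustrative.
 */
static double flush_drain_seconds(uint64_t dirty_bytes,
                                  double destage_bytes_per_sec)
{
	assert(destage_bytes_per_sec > 0.0);
	return (double)dirty_bytes / destage_bytes_per_sec;
}
```

For example, `flush_drain_seconds(1ULL << 30, 20e6)` gives about 53.7 s, well over a 30 s timeout, and the estimate grows linearly with cache size.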
> 
> It's far too short for just about any HW RAID since they all tend to
> have multi-megabytes to gigabytes of cache (some of the high end have
> terabytes).  It has to be said that most arrays with battery backed

For DDN storage 30s is actually sufficient, unless disk delays come up. But
then, we presently only have a rather small cache (2GB) with lots of
disks.
Nowadays one can get a UPS-protected DDN-9900 controller, but the firmware
still properly handles the SYNC_CACHE command.

> caches lie when asked to flush the cache, but we probably need to get
> users into the habit of not using flush barriers with external Arrays.
> 
> > http://markmail.org/message/ewicheafcvgwm4p7
> >
> > I wrote this patch while having trouble with Infortrend RAIDs, but it
> > also comes up with DDN storage if the write-back cache is enabled.
> > Shall I update the patch, add retries and then resend the entire series?
> 
> rq->timeout is the timeout of the request triggering the flush ... it's
> likely the wrong value since it's for a fast completing r/w operation,
> whereas this is a slow drain operation.

Hmm, in the past we had scsi_device->timeout, but I thought this was given up
in favour of scsi_device->request_queue->rq_timeout (somewhere around
2.6.27)?

Thanks,
Bernd

-- 
Bernd Schubert
DataDirect Networks