On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote: > > -----Original Message----- > > From: James Bottomley [mailto:jejb@xxxxxxxxxxxxxxxxxx] > > Sent: Monday, October 17, 2016 10:50 PM > > To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; > > linux-scsi@xxxxxxxxxxxxxxx > > Cc: martin.petersen@xxxxxxxxxx; thenzl@xxxxxxxxxx; > > kashyap.desai@xxxxxxxxxxxx; Christoph Hellwig; Martin K. Petersen; > > Jeff > > Moyer; Gris Ge; Ewan Milne; Jens Axboe > > Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE > > command > > to firmware > > > > On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: > > > On 10/17/2016 12:20 PM, James Bottomley wrote: > > > > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: > > > > > On 10/17/2016 07:34 AM, Hannes Reinecke wrote: > > > > > > On 10/17/2016 12:24 PM, Sumit Saxena wrote: > > > > > > > megaraid_sas driver returns SYNCHRONIZE_CACHE command > > > > > > > back to > > > > > > > SCSI layer without sending it to firmware as firmware > > > > > > > takes > > > > > > > care of flushing cache. This patch will change the > > > > > > > driver > > > > > > > behavior wrt SYNCHRONIZE_CACHE handling. If underlying > > > > > > > firmware has support to handle the SYNCHORNIZE_CACHE, > > > > > > > driver > > > > > > > will send it for firmware otherwise complete it back to > > > > > > > SCSI > > > > > > > layer with SUCCESS immediately. If Firmware handle > > > > > > > SYNCHORNIZE_CACHE for both VD and JBOD > > > > > > > "canHandleSyncCache" > > > > > > > bit in scratch pad register(offset > > > > > > > 0x00B4) will be set. > > > > > > > > > > > > > > This behavior can be controlled via module parameter and > > > > > > > user > > > > > > > can fallback to old behavior of returning > > > > > > > SYNCHRONIZE_CACHE by > > > > > > > driver only without sending it to firmware. > > > > > > > > > > > > > > Signed-off-by: Sumit Saxena <sumit.saxena@xxxxxxxxxxxx> > > > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx> > > > > > > > --- > > > > > > > drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > > > > > > > drivers/scsi/megaraid/megaraid_sas_base.c | 14 > > > > > > > ++++++--- > > > > > > > ----- > > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > > > > > > > 4 files changed, 14 insertions(+), 8 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h > > > > > > > b/drivers/scsi/megaraid/megaraid_sas.h > > > > > > > index ca86c88..43fd14f 100644 > > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas.h > > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas.h > > > > > > > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > > > > > > > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > > > > > > > #define MR_MAX_MSIX_REG_ARRAY 16 > > > > > > > #define MR_RDPQ_MODE_OFFSET 0X0 > > > > > > > 0800 > > > > > > > 000 > > > > > > > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0 > > > > > > > X010 > > > > > > > 0000 > > > > > > > 0 > > > > > > > + > > > > > > > /* > > > > > > > * register set for both 1068 and 1078 controllers > > > > > > > * structure extended for 1078 registers @@ -2140,6 > > > > > > > +2142,7 > > > > > > > @@ struct megasas_instance { > > > > > > > u8 is_imr; > > > > > > > u8 is_rdpq; > > > > > > > bool dev_handle; > > > > > > > + bool fw_sync_cache_support; > > > > > > > }; > > > > > > > struct MR_LD_VF_MAP { > > > > > > > u32 size; > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > > b/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > > index 092387f..a4a8e2f 100644 > > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > > @@ -104,6 +104,10 @@ unsigned int scmd_timeout = > > > > > > > MEGASAS_DEFAULT_CMD_TIMEOUT; > > > > > > > module_param(scmd_timeout, int, S_IRUGO); > > > > > > > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout > > > > > > > (10 > > > > > > > -90s), default 90s. See megasas_reset_timer."); > > > > > > > > > > > > > > +bool block_sync_cache; > > > > > > > +module_param(block_sync_cache, bool, S_IRUGO); > > > > > > > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by > > > > > > > driver > > > > > > > Default: 0(Send it to firmware)"); > > > > > > > + > > > > > > > MODULE_LICENSE("GPL"); > > > > > > > MODULE_VERSION(MEGASAS_VERSION); > > > > > > > MODULE_AUTHOR("megaraidlinux.pdl@xxxxxxxxxxxxx"); > > > > > > > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct > > Scsi_Host > > > > > > > *shost, struct scsi_cmnd *scmd) > > > > > > > goto out_done; > > > > > > > } > > > > > > > > > > > > > > - switch (scmd->cmnd[0]) { > > > > > > > - case SYNCHRONIZE_CACHE: > > > > > > > - /* > > > > > > > - * FW takes care of flush cache on its > > > > > > > own > > > > > > > - * No need to send it down > > > > > > > - */ > > > > > > > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > > > > > > > + (!instance->fw_sync_cache_support)) { > > > > > > > scmd->result = DID_OK << 16; > > > > > > > goto out_done; > > > > > > > - default: > > > > > > > - break; > > > > > > > } > > > > > > > > > > > > > > return instance->instancet > > > > > > > ->build_and_issue_cmd(instance, scmd); > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > > index 2159f6a..8237580 100644 > > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct > > > > > > > megasas_instance *instance) > > > > > > > ret = 1; > > > > > > > goto fail_fw_init; > > > > > > > } > > > > > > > + if (!block_sync_cache) > > > > > > > + instance->fw_sync_cache_support = > > > > > > > (scratch_pad_2 > > > > > > > & > > > > > > > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) > > > > > > > ? 1 > > > > > > > : > > > > > > > 0; > > > > > > > > > > > > > > IOCInitMessage = > > > > > > > dma_alloc_coherent(&instance->pdev->dev, > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > > index e3bee04..2857154 100644 > > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > > @@ -72,6 +72,8 @@ > > > > > > > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET > > > > > > > (0x0000030C) > > > > > > > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00 > > > > > > > 0000 > > > > > > > 6C) > > > > > > > > > > > > > > +extern bool block_sync_cache; > > > > > > > + > > > > > > > /* > > > > > > > * Raid context flags > > > > > > > */ > > > > > > > > > > > > > Be extra careful with that. > > > > > > > > > > > > SYNCHRONIZE_CACHE has (potentially) a global scope, as > > > > > > there > > > > > > might be an array-wide cache, and a cache flush would > > > > > > affect the > > > > > > entire cache. Linux is using SYNCHRONIZE_CACHE as a per > > > > > > device > > > > > > setting, ie it assumes that the effects of a cache flush is > > > > > > restricted to the device in question. > > > > > > > > > > > > If this is _not_ the case I'd rather not enable it. > > > > > > Have you checked that enabling this functionality doesn't > > > > > > have > > > > > > negative performance impact? > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Hannes > > > > > This must go in - without this fix, there is no data > > > > > integrity for > > > > > any file system. > > > > That's not what I get from the change log. What it says to me > > > > is > > > > that the caches are currently firmware managed. Barring > > > > firmware > > > > bugs, that means that we currently don't have any integrity > > > > issues. > > > > > > Your understanding (or the change log) is incorrect. > > > > There's no current way in English of construing "as firmware takes > > care of > > flushing cache" to mean its inverse, so the changelog needs > > updating to > > explain > > that firmware does *not* take care of cache flushing, so > > effectively > > nothing > > does and we'll need a cc to stable if the problem is that nothing > > takes > > care of > > flushing the drive caches. > > > > James > > Sorry for confusion. More accurate to say - > > Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to > SCSI layer without sending it down to firmware as firmware supposed > to takes care of flushing cache for all Virtual Disk at the time of > system reboot/shutdown. Because of above FW rule driver coded wrongly > and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD > as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass > the same to the Firmware. ). We figure out that even for VD, why > driver should send back SYNC_CACHE command...let's have the same in > FW (giving all control in FW) > > New behavior described - > > IF application requests SYNCH_CACHE > IF any FW which supports new API bit called canHandleSyncCache > Driver sends SYNCH_CACHE command to the FW > IF 'VirtualDisk' > FW does not send SYNCH_CACHE to drives > FW returns SUCCESS > ELSE IF 'JBOD' > FW sends SYNCH_CACHE to drive > FW obtains status from drive and returns same status back to > driver > ENDIF > > Fixing this is robust if FW and driver changes are inline. See below > for more detail. > > Case - 1 > This patch is attempt to fix one level problem where Virtual Disks > are not managed correctly in MR FW. There are some MR FW (E.a Gen2 > and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not > reply correct for SYNCH_CACHE. This was handled in past and carry > forward (in driver + FW ) to all new generation products as well. We > tried to collect all possible details why it was handled such way to > provide better driver fix for this issue(mainly to avoid this FW > checks and module parameters etc..). But now it looks like to make > generic fix (Device ID based etc..)is even risky, so went with this > protective approach. It is very risky to find out which Device ID > and FW has capability to manage SYNC_CACHE, so we managed to figure > out which are the new FW using FW capability bit. > > E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported > command (this is a firmware bug) and immediately it will be failed to > SML. This will cause File system to go Read-only. That's a better description ... what you're saying is that the patch doesn't actually fix the bug Ric is worrying about. > Case -2 > One more thing which we are trying and known driver can send > "SYNC_CACHE" unconditionally to all generation of FW. For this we > are doing more unit testing on various controllers/FW and planning to > provide another patch to fix JBOD path for any FW. (It will not be > based on FW capability/module parameter). OK, but we really need this ASAP. As Ric said, data integrity is at risk until this is fixed. Can you also reference the commit that introduced the problem so we know how far to do the sable backports? > If we remove module parameter, we can ask customer to disable WCE on > drive to get similar impact. Thanks, James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html