Hi Tomas, SCSI command checking in queuecommand function of arcmsr can be removed safely. Now driver can pass all scsi command to controller firmware. diff -uprN a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c --- a/drivers/scsi/arcmsr/arcmsr_hba.c 2016-10-19 16:18:45.000000000 +0800 +++ b/drivers/scsi/arcmsr/arcmsr_hba.c 2016-10-19 16:31:59.665524699 +0800 @@ -2623,13 +2623,6 @@ static int arcmsr_queue_command_lck(stru cmd->scsi_done = done; cmd->host_scribble = NULL; cmd->result = 0; - if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == SEND_DIAGNOSTIC)){ - if(acb->devstate[target][lun] == ARECA_RAID_GONE) { - cmd->result = (DID_NO_CONNECT << 16); - } - cmd->scsi_done(cmd); - return 0; - } if (target == 16) { /* virtual device for iop message transfer */ arcmsr_handle_virtual_command(acb, cmd); Thanks Ching On Tue, 2016-10-18 at 15:56 +0200, Tomas Henzl wrote: > Hi, > > similar suspicious code path can be found in the queuecommand functions in other drivers too > these are - > pmcraid.c > arcmsr_hba.c > cc-ing maintainers - > (but both drivers seem to be unmaintained for a while - > I've added Ching for arcmsr and Raghava for pmcraid) > > please read this thread and verify that your driver and firmware is safe. > > It is expected that a raid card controls the disk write cache (switches it off) > but this might not be true for all modes of operation a raid adapter handles > - pass through/non-RAID etc. In such case when disk write cache is enabled > and your driver skips the SYNCHRONIZE_CACHE command a FS corruption > is very much possible during unexpected power loss or even a clean shutdown. > > tomash > > On 17.10.2016 19:51, James Bottomley wrote: > > On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote: > >>> -----Original Message----- > >>> From: James Bottomley [mailto:jejb@xxxxxxxxxxxxxxxxxx] > >>> Sent: Monday, October 17, 2016 10:50 PM > >>> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; > >>> linux-scsi@xxxxxxxxxxxxxxx > >>> Cc: martin.petersen@xxxxxxxxxx; thenzl@xxxxxxxxxx; > >>> kashyap.desai@xxxxxxxxxxxx; Christoph Hellwig; Martin K. Petersen; > >>> Jeff > >>> Moyer; Gris Ge; Ewan Milne; Jens Axboe > >>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE > >>> command > >>> to firmware > >>> > >>> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: > >>>> On 10/17/2016 12:20 PM, James Bottomley wrote: > >>>>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: > >>>>>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote: > >>>>>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote: > >>>>>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command > >>>>>>>> back to > >>>>>>>> SCSI layer without sending it to firmware as firmware > >>>>>>>> takes > >>>>>>>> care of flushing cache. This patch will change the > >>>>>>>> driver > >>>>>>>> behavior wrt SYNCHRONIZE_CACHE handling. If underlying > >>>>>>>> firmware has support to handle the SYNCHORNIZE_CACHE, > >>>>>>>> driver > >>>>>>>> will send it for firmware otherwise complete it back to > >>>>>>>> SCSI > >>>>>>>> layer with SUCCESS immediately. If Firmware handle > >>>>>>>> SYNCHORNIZE_CACHE for both VD and JBOD > >>>>>>>> "canHandleSyncCache" > >>>>>>>> bit in scratch pad register(offset > >>>>>>>> 0x00B4) will be set. > >>>>>>>> > >>>>>>>> This behavior can be controlled via module parameter and > >>>>>>>> user > >>>>>>>> can fallback to old behavior of returning > >>>>>>>> SYNCHRONIZE_CACHE by > >>>>>>>> driver only without sending it to firmware. > >>>>>>>> > >>>>>>>> Signed-off-by: Sumit Saxena <sumit.saxena@xxxxxxxxxxxx> > >>>>>>>> Signed-off-by: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx> > >>>>>>>> --- > >>>>>>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > >>>>>>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 > >>>>>>>> ++++++--- > >>>>>>>> ----- > >>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > >>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > >>>>>>>> 4 files changed, 14 insertions(+), 8 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h > >>>>>>>> b/drivers/scsi/megaraid/megaraid_sas.h > >>>>>>>> index ca86c88..43fd14f 100644 > >>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h > >>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h > >>>>>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > >>>>>>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > >>>>>>>> #define MR_MAX_MSIX_REG_ARRAY 16 > >>>>>>>> #define MR_RDPQ_MODE_OFFSET 0X0 > >>>>>>>> 0800 > >>>>>>>> 000 > >>>>>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0 > >>>>>>>> X010 > >>>>>>>> 0000 > >>>>>>>> 0 > >>>>>>>> + > >>>>>>>> /* > >>>>>>>> * register set for both 1068 and 1078 controllers > >>>>>>>> * structure extended for 1078 registers @@ -2140,6 > >>>>>>>> +2142,7 > >>>>>>>> @@ struct megasas_instance { > >>>>>>>> u8 is_imr; > >>>>>>>> u8 is_rdpq; > >>>>>>>> bool dev_handle; > >>>>>>>> + bool fw_sync_cache_support; > >>>>>>>> }; > >>>>>>>> struct MR_LD_VF_MAP { > >>>>>>>> u32 size; > >>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c > >>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c > >>>>>>>> index 092387f..a4a8e2f 100644 > >>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c > >>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > >>>>>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = > >>>>>>>> MEGASAS_DEFAULT_CMD_TIMEOUT; > >>>>>>>> module_param(scmd_timeout, int, S_IRUGO); > >>>>>>>> MODULE_PARM_DESC(scmd_timeout, "scsi command timeout > >>>>>>>> (10 > >>>>>>>> -90s), default 90s. See megasas_reset_timer."); > >>>>>>>> > >>>>>>>> +bool block_sync_cache; > >>>>>>>> +module_param(block_sync_cache, bool, S_IRUGO); > >>>>>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by > >>>>>>>> driver > >>>>>>>> Default: 0(Send it to firmware)"); > >>>>>>>> + > >>>>>>>> MODULE_LICENSE("GPL"); > >>>>>>>> MODULE_VERSION(MEGASAS_VERSION); > >>>>>>>> MODULE_AUTHOR("megaraidlinux.pdl@xxxxxxxxxxxxx"); > >>>>>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct > >>> Scsi_Host > >>>>>>>> *shost, struct scsi_cmnd *scmd) > >>>>>>>> goto out_done; > >>>>>>>> } > >>>>>>>> > >>>>>>>> - switch (scmd->cmnd[0]) { > >>>>>>>> - case SYNCHRONIZE_CACHE: > >>>>>>>> - /* > >>>>>>>> - * FW takes care of flush cache on its > >>>>>>>> own > >>>>>>>> - * No need to send it down > >>>>>>>> - */ > >>>>>>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > >>>>>>>> + (!instance->fw_sync_cache_support)) { > >>>>>>>> scmd->result = DID_OK << 16; > >>>>>>>> goto out_done; > >>>>>>>> - default: > >>>>>>>> - break; > >>>>>>>> } > >>>>>>>> > >>>>>>>> return instance->instancet > >>>>>>>> ->build_and_issue_cmd(instance, scmd); > >>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c > >>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c > >>>>>>>> index 2159f6a..8237580 100644 > >>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > >>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > >>>>>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct > >>>>>>>> megasas_instance *instance) > >>>>>>>> ret = 1; > >>>>>>>> goto fail_fw_init; > >>>>>>>> } > >>>>>>>> + if (!block_sync_cache) > >>>>>>>> + instance->fw_sync_cache_support = > >>>>>>>> (scratch_pad_2 > >>>>>>>> & > >>>>>>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) > >>>>>>>> ? 1 > >>>>>>>> : > >>>>>>>> 0; > >>>>>>>> > >>>>>>>> IOCInitMessage = > >>>>>>>> dma_alloc_coherent(&instance->pdev->dev, > >>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h > >>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h > >>>>>>>> index e3bee04..2857154 100644 > >>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > >>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > >>>>>>>> @@ -72,6 +72,8 @@ > >>>>>>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET > >>>>>>>> (0x0000030C) > >>>>>>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00 > >>>>>>>> 0000 > >>>>>>>> 6C) > >>>>>>>> > >>>>>>>> +extern bool block_sync_cache; > >>>>>>>> + > >>>>>>>> /* > >>>>>>>> * Raid context flags > >>>>>>>> */ > >>>>>>>> > >>>>>>> Be extra careful with that. > >>>>>>> > >>>>>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as > >>>>>>> there > >>>>>>> might be an array-wide cache, and a cache flush would > >>>>>>> affect the > >>>>>>> entire cache. Linux is using SYNCHRONIZE_CACHE as a per > >>>>>>> device > >>>>>>> setting, ie it assumes that the effects of a cache flush is > >>>>>>> restricted to the device in question. > >>>>>>> > >>>>>>> If this is _not_ the case I'd rather not enable it. > >>>>>>> Have you checked that enabling this functionality doesn't > >>>>>>> have > >>>>>>> negative performance impact? > >>>>>>> > >>>>>>> Cheers, > >>>>>>> > >>>>>>> Hannes > >>>>>> This must go in - without this fix, there is no data > >>>>>> integrity for > >>>>>> any file system. > >>>>> That's not what I get from the change log. What it says to me > >>>>> is > >>>>> that the caches are currently firmware managed. Barring > >>>>> firmware > >>>>> bugs, that means that we currently don't have any integrity > >>>>> issues. > >>>> Your understanding (or the change log) is incorrect. > >>> There's no current way in English of construing "as firmware takes > >>> care of > >>> flushing cache" to mean its inverse, so the changelog needs > >>> updating to > >>> explain > >>> that firmware does *not* take care of cache flushing, so > >>> effectively > >>> nothing > >>> does and we'll need a cc to stable if the problem is that nothing > >>> takes > >>> care of > >>> flushing the drive caches. > >>> > >>> James > >> Sorry for confusion. More accurate to say - > >> > >> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to > >> SCSI layer without sending it down to firmware as firmware supposed > >> to takes care of flushing cache for all Virtual Disk at the time of > >> system reboot/shutdown. Because of above FW rule driver coded wrongly > >> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD > >> as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass > >> the same to the Firmware. ). We figure out that even for VD, why > >> driver should send back SYNC_CACHE command...let's have the same in > >> FW (giving all control in FW) > >> > >> New behavior described - > >> > >> IF application requests SYNCH_CACHE > >> IF any FW which supports new API bit called canHandleSyncCache > >> Driver sends SYNCH_CACHE command to the FW > >> IF 'VirtualDisk' > >> FW does not send SYNCH_CACHE to drives > >> FW returns SUCCESS > >> ELSE IF 'JBOD' > >> FW sends SYNCH_CACHE to drive > >> FW obtains status from drive and returns same status back to > >> driver > >> ENDIF > >> > >> Fixing this is robust if FW and driver changes are inline. See below > >> for more detail. > >> > >> Case - 1 > >> This patch is attempt to fix one level problem where Virtual Disks > >> are not managed correctly in MR FW. There are some MR FW (E.a Gen2 > >> and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not > >> reply correct for SYNCH_CACHE. This was handled in past and carry > >> forward (in driver + FW ) to all new generation products as well. We > >> tried to collect all possible details why it was handled such way to > >> provide better driver fix for this issue(mainly to avoid this FW > >> checks and module parameters etc..). But now it looks like to make > >> generic fix (Device ID based etc..)is even risky, so went with this > >> protective approach. It is very risky to find out which Device ID > >> and FW has capability to manage SYNC_CACHE, so we managed to figure > >> out which are the new FW using FW capability bit. > >> > >> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported > >> command (this is a firmware bug) and immediately it will be failed to > >> SML. This will cause File system to go Read-only. > > That's a better description ... what you're saying is that the patch > > doesn't actually fix the bug Ric is worrying about. > > > >> Case -2 > >> One more thing which we are trying and known driver can send > >> "SYNC_CACHE" unconditionally to all generation of FW. For this we > >> are doing more unit testing on various controllers/FW and planning to > >> provide another patch to fix JBOD path for any FW. (It will not be > >> based on FW capability/module parameter). > > OK, but we really need this ASAP. As Ric said, data integrity is at > > risk until this is fixed. Can you also reference the commit that > > introduced the problem so we know how far to do the sable backports? > > > >> If we remove module parameter, we can ask customer to disable WCE on > >> drive to get similar impact. > > Thanks, > > > > James > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html