On Mon, Aug 6, 2018 at 11:41 PM, Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote: > On Sat, 2018-08-04 at 18:56 +0530, Sreekanth Reddy wrote: >> No Bart, their is no race condition here. Since chain lookup table >> entry is uniquely accessed using smid value. And this smid (which is >> scmd->request->tag +1) is unique for each IO request. And >> _base_get_chain_buffer_tracker() function is called only during IO >> submission time (i.e. before submitting the IO to the IOC firmware) >> and mpt3sas_base_clear_st() function is called in the ISR (i.e during >> IO completion) path. So for a particular IO these two functions called >> at two different instances of time. > > Hello Sreekanth, > > In mpt3sas_base_get_smid_scsiio() I found the following code: > > struct scsiio_tracker *request = scsi_cmd_priv(scmd); > u16 smid = scmd->request->tag + 1; > request->smid = smid; > > That confirms what you wrote about smid being unique for each I/O request. > However, this also raises new questions. As far as I can see all code in > the I/O path that accesses the chain_lookup[] array uses index smid - 1. Our IOC firmware always requires smid to be >=1, zero is illegal smid for the IOC firmware. Hence while framing MPT SCSI IO request driver always make sure that smid to be >=1, so it uses smid as tag + 1. where as chain_lookup[] starts with index zero, So driver is accessing this chain_lookup[] with smid-1 index. > The patch at the start of this e-mail thread modifies two functions, namely > mpt3sas_remove_dead_ioc_func() and mpt3sas_base_clear_st(). This patch modifies in two functions namely '_base_get_chain_buffer_tracker()' & 'mpt3sas_base_clear_st()' not in the mpt3sas_remove_dead_ioc_func(), not sure why 'git format-patch' is showing 'mpt3sas_remove_dead_ioc_func' function name in the patch. This function _base_get_chain_buffer_tracker() is called only during the IO submission time and mpt3sas_base_clear_st() is called during IO completion time and hence I don't see any race condition here. Currently this patch is very much needed, so please consider this patch. May be we can start new email thread to discuss on the possible race conditions you are observing and I can clear your quires or I can fix them as on we can find the real race conditions. Thanks, Sreekanth Since the > chain_lookup[] array is indexed with the smid and hence contains request- > private data, how is it even possible that these functions race against > each other? Is there perhaps code that accesses the chain_lookup[] array > and that is called from another context than the I/O completion path? > > Is the patch at the start of this e-mail thread the result of a theoretical > concern or of a real failure? In the latter case, which sequence of actions > triggered the failure? The answer to this question should already have been > mentioned in the description of v1 of this patch. > > Thanks, > > Bart.