On 08/01/2014 06:40 PM, Webb Scales wrote: > On 7/31/14 9:56 AM, Tomas Henzl wrote: >>>> In cmd_tagged_alloc "Thus, there should never be a collision here between >>>> two requests" if this is true you don't need the refcount and just a simple >>>> flag were sufficient for your other needs. (And maybe you get to ~971k IOPS..) >>> :-) The code previously had the reference count in there to close >>> certain race conditions' asynchronous accesses to the command block >>> (e.g., between an abort and a completing command). We hope that, using >>> the block layer tags, those races no longer occur, but there didn't seem >>> to be any point in removing the reference count until we'd had more >>> experience with the code...and, in a fit of healthy paranoia I added the >>> check to which you refer. Unfortunately, in some of Rob Elliott's >>> recent tests, he managed to drive a case where the check triggers. So, >>> until we reconcile that, I'm inclined to leave the check in place >>> (hopefully it's not costing a full 1k IOPS :-) ). >> Let us assume that the block layer never sends duplicate tags to the driver, >> I think there are some weaknesses in the way how it's implemented, >> for example here - from fail_all_outstanding_cmds: >> refcount = atomic_inc_return(&c->refcount); >> if (refcount > 1) { >> ... >> finish_cmd(c); - this finishes the command >> and the tag might be now reused, >> when that happens you'll see a race >> atomic_dec(&h->commands_outstanding); >> failcount++; >> } >> else {what happens when right now comes a call from block layer?} > The next call from the block layer should find that the controller has > been marked as locked-up and return immediately with a no-connect error > (even before trying to allocate an HPSA command block), so I don't think > there is a race or conflict. OK, the 'locked-up' logic and when it's activated - I mean I don't understand every part of your driver. I have just seen that in your current implementation of cmd_tagged_alloc that you only print a debug message and continue with the same block. That might bring some issues later. > > >> When references are used it usually means, that when refcount==0 that >> a release function frees the resources, in this case it is the tag number. >> In your implementation you should ensure that when you signal >> to the upper layer that the tag is free, that nothing else holds the reference. > Actually, in this case, the resource is the HPSA command block which > corresponds to the tag number. But, you are correct that trying to > manage them separately is apt to cause problems. Having the > cmd_[tagged_]free() routine make the call to scsi_done() -- only when > the reference count is dropping to zero -- is an interesting > suggestion...but I'll have to do considerable investigation before I can > proceed with that. I'm not sure if this is doable (calling scsi_done when refcount drops to zero) I just see a weakness of the code that you can have called the scsi_done and still having a refcount > 0. There is work in progress on the code and I have found no more issues (better said maybe issues), so thanks for sharing the code now. --tomash > > >> If this is true the conclusion is that you don't need this kind of references >> and a simple flag just to debug not yet fixed races is enough. >> I'm maybe wrong because I don't understand what you want to protect >> in the above example, so that makes me to have some doubts if I understand >> the code properly. > Absent the block layer support, the HPSA code has to be able to defend > against various races where there may be more than two interested > parties looking at the command block -- so a simple flag will not > suffice. On the other hand, with the block layer support in place, > we're hoping that we can get rid of the reference count altogether, but > we're not there, yet. > > > Webb > > > >>>> On 07/25/2014 09:28 PM, scameron@xxxxxxxxxxxxxxxxxx wrote: >>>>> hpsa: Work In Progress: "lockless monster" patches >>>>> >>>>> To be clear, I am not suggesting that these patches be merged at this time. >>>>> >>>>> This patchset is vs. Christoph Hellwig's scsi-mq.4 branch which >>>>> may be found here: git://git.infradead.org/users/hch/scsi.git >>>>> >>>>> We've been working for a long time on a patchset for hpsa to remove >>>>> all the locks from the main i/o path in pursuit of high IOPS. Some >>>>> of that work is already upstream, but a lot more of it is not quite >>>>> yet ready to be merged. However, I think we've "gone dark" for a bit >>>>> too long on this, and even though the patches aren't really ready to >>>>> be merged just yet, I thought I should let other people who might be >>>>> interested have a look anyway, as things are starting to be at least >>>>> more stable than unstable. Be warned though, there are still some >>>>> problems, esp. around error recovery. >>>>> >>>>> That being said, with the right hardware (fast SAS SSDs, recent Smart >>>>> Arrays e.g. P430, with up-to-date firmware, attached to recent disk enclosures) >>>>> with these patches and the scsi-mq patches, it is possible to get around >>>>> ~970k IOPS from a single logical drive on a single controller. >>>>> >>>>> There are about 150 patches in this set. Rather than bomb the list >>>>> with that, here is a link to a tarball of the patches in the form of >>>>> an stgit patch series: >>>>> https://github.com/smcameron/hpsa-lockless-patches-work-in-progress/blob/master/hpsa-lockless-vs-hch-scsi-mq.4-2014-07-25-1415CDT.tar.bz2?raw=true >>>>> >>>>> In some cases, I have probably erred on the side of having too many too >>>>> small patches, on the theory that it is easier to bake a cake than to >>>>> unbake a cake. Before these are submitted "for reals", there will be >>>>> some squashing of patches, along with other cleaning up. >>>>> >>>>> There are some patches in this set which are already upstream in >>>>> James's tree which do not happen to be in Christoph's tree >>>>> (most of which are named "*_sent_to_james"). There are also >>>>> quite a few patches which are strictly for debugging and are not >>>>> ever intended to be merged. >>>>> >>>>> >>>>> Here is a list of all the patches sorted by author: >>>>> >>>>> Arnd Bergmann (1): >>>>> [SCSI] hpsa: fix non-x86 builds >>>>> >>>>> Christoph Hellwig (1): >>>>> reserve block tags in scsi host >>>>> >>>>> Joe Handzik (9): >>>>> hpsa: add masked physical devices into h->dev[] array >>>>> hpsa: add hpsa_bmic_id_physical_device function >>>>> hpsa: get queue depth for physical devices >>>>> hpsa: use ioaccel2 path to submit IOs to physical drives in HBA mode. >>>>> hpsa: Get queue depth from identify physical bmic for physical disks. >>>>> hpsa: add ioaccel sg chaining for the ioaccel2 path >>>>> Set the phys_disk value for a CommandList structure that is >>>>> submitted. Squash >>>>> hpsa: unmap ioaccel2 commands before, not after adding to resubmit >>>>> workqueue >>>>> hpsa: add more ioaccel2 error handling, including underrun statuses. >>>>> >>>>> Nicholas Bellinger (1): >>>>> hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation >>>>> >>>>> Robert Elliott (44): >>>>> Change scsi.c scsi_log_completion() to print strings for QUEUED, >>>>> Set scsi_logging_level to be more verbose to get better messages >>>>> hpsa: do not unconditionally copy sense data >>>>> hpsa: optimize cmd_alloc function by remembering last allocation >>>>> From a154642aeed291c7cfe4b9ea9da932156030f7d1 Mon Sep 17 00:00:00 >>>>> 2001 >>>>> hpsa: Clean up host, channel, target, lun prints >>>>> hpsa: do not check cmd_alloc return value - it cannnot return NULL >>>>> hpsa: return -1 rather than -ENOMEM in hpsa_get_raid_map if fill_cmd >>>>> fails >>>>> hpsa: return -ENOMEM not 1 from ioaccel2_alloc_cmds_and_bft on >>>>> allocation failure >>>>> hpsa: return -ENOMEM not 1 from hpsa_alloc_ioaccel_cmd_and_bft on >>>>> allocation failure >>>>> hpsa: return -ENOMEM not -EFAULT from hpsa_passthru_ioctl on kmalloc >>>>> failure >>>>> hpsa: pass error from pci_set_consistent_dma_mask intact from >>>>> hpsa_message >>>>> hpsa: detect and report failures changing controller transport modes >>>>> hpsa: add hpsa_disable_interrupt_mode >>>>> hpsa: rename free_irqs to hpsa_free_irqs and move before >>>>> hpsa_request_irq >>>>> hpsa: rename hpsa_request_irq to hpsa_request_irqs >>>>> hpsa: on failure of request_irq, free the irqs that we did get >>>>> hpsa: make hpsa_pci_init disable interrupts and pci_disable_device on >>>>> critical failures >>>>> hpsa: fix a string constant broken into two lines >>>>> hpsa: avoid unneccesary calls to resource freeing functions in >>>>> hpsa_init_one >>>>> hpsa: add hpsa_free_cfgtables function to undo what >>>>> hpsa_find_cfgtables does >>>>> hpsa: report failure to ioremap config table >>>>> hpsa: add hpsa_free_pci_init function >>>>> hpsa: clean up error handling in hpsa_pci_init >>>>> hpsa: make hpsa_remove_one use hpsa_pci_free_init >>>>> hpsa: rename hpsa_alloc_ioaccel_cmd_and_bft to >>>>> hpsa_alloc_ioaccel1_cmd_and_bft >>>>> hpsa: rename hpsa_allocate_cmd_pool to hpsa_alloc_cmd_pool >>>>> hpsa: rename ioaccel2_alloc_cmds_and_bft to >>>>> hpsa_alloc_ioaccel2_cmd_and_bft >>>>> hpsa: fix memory leak in hpsa_alloc_cmd_pool >>>>> hpsa: return status not void from hpsa_put_ctlr_into_performant_mode >>>>> hpsa: clean up error handling in hpsa_init_one >>>>> hpsa: report allocation failures while allocating SG chain blocks >>>>> hpsa: rename hpsa_allocate_sg_chain_blocks to >>>>> hpsa_alloc_sg_chain_blocks >>>>> hpsa: improve allocation failure messages from hpsa_init_one >>>>> hpsa: fix minor if-statement style issue in hpsa_init_one >>>>> hpsa: fix wrong indent level in hpsa_init_one >>>>> Add hpsa_free_performant_mode and call it from hpsa_init_one >>>>> hpsa: shorten the wait for the CISS doorbell mode change >>>>> acknowledgement >>>>> hpsa: clean up some error reporting output in abort handler >>>>> hpsa: try to detect controller lockup in abort handler >>>>> hpsa: Return DID_NO_CONNECT rather than DID_ERR on lockup >>>>> hpsa: check for lockup on all simple_cmd submissions >>>>> hpsa: Support 64-bit lun values in printks >>>>> hpsa: do not print ioaccel2 warning messages about unusual >>>>> completions. >>>>> >>>>> Stephen M. Cameron (95): >>>>> scsi: break from queue depth adjusting loops when device found >>>>> hpsa: remove online devices from offline device list >>>>> hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl >>>>> hpsa: make hpsa_init_one return -ENOMEM if allocation of >>>>> h->lockup_detected fails >>>>> hpsa: fix 6-byte READ/WRITE with 0 length data xfer >>>>> hpsa: remove spin lock around command allocation >>>>> hpsa: use atomics for commands_outstanding instead of spin locks >>>>> hpsa: reserve some commands for use by driver >>>>> hpsa: get rid of cmd_special_alloc and cmd_special_free >>>>> hpsa: do not queue commands internally in driver >>>>> hpsa: remove unused interrupt handler code >>>>> hpsa: remove direct lookup bit from command tags >>>>> hpsa: fix race between abort handler and main i/o path >>>>> hpsa: allow lockup detector to be turned off >>>>> hpsa: do not request device rescan on every ioaccel path error >>>>> hpsa: factor out hpsa_ciss_submit function >>>>> hpsa: use workqueue to resubmit failed ioaccel commands >>>>> hpsa: use driver instead of system work queue for command >>>>> resubmission >>>>> hpsa: remove 'action required' phrasing >>>>> hpsa: Count passthru cmds with atomics, not a spin locked int >>>>> hpsa: remove unused fifo_full function >>>>> hpsa: slightly optimize SA5_performant_completed >>>>> hpsa: do not check for msi(x) in interrupt_pending >>>>> hpsa: fix fail_all_outstanding_cmds to use reference counting >>>>> hpsa: fix endianness issue with scatter gather elements >>>>> hpsa: get rid of type/attribute/direction bit field where possible >>>>> hpsa: do not use function pointers in fast path command submission >>>>> hpsa: do not wait for aborted commands to complete after abort >>>>> hpsa: fix hpsa_drain_accel_commands to use reference counting instead >>>>> of bitmap >>>>> hpsa: fix race when updating raid map data. >>>>> hpsa: allow commands to be completed on specified reply queue >>>>> hpsa: fix completion race in abort handler >>>>> hpsa: fix aborting of reused commands >>>>> hpsa: lay groundwork for handling driver initiated command timeouts >>>>> hpsa: do not free command twice in hpsa_get_raid_map >>>>> hpsa: do not send aborts to logical devices that do not support >>>>> aborts >>>>> hpsa: remember whether devices support aborts across rescans >>>>> hpsa: do not send two aborts with swizzled tags. >>>>> hpsa: decrement h->commands_outstanding in fail_all_outstanding_cmds >>>>> hpsa: flush work queue hpsa_wq in fail_all_outstanding_cmds >>>>> hpsa: use per controller not per driver work queue for command >>>>> resubmission >>>>> hpsa: fix allocation sizes for CISS_REPORT_LUNs commands >>>>> hpsa: always use extended report physical LUNs for physical devices >>>>> hpsa: prevent manually adding masked physical devices >>>>> hpsa: fix values for expose status >>>>> hpsa: avoid unnecessary io in hpsa_get_pdisk_of_ioaccel2() >>>>> hpsa: do not be so noisy about check conditions >>>>> hpsa: handle descriptor format sense data where needed >>>>> hpsa: print CDBs instead of kernel virtual addresses for uncommon >>>>> errors >>>>> hpsa: handle Command Status TMF function status >>>>> hpsa: do not use a void pointer for scsi_cmd field of struct >>>>> CommandList >>>>> hpsa: return FAILED from abort handler if controller locked up. >>>>> hpsa: if controller locked up return failed from device reset handler >>>>> hpsa: check for ctlr lockup after command allocation in main io path >>>>> hpsa: mark masked devices as masked in output >>>>> hpsa: limit number of concurrent abort requests >>>>> hpsa: separate lockup detection and device rescanning >>>>> hpsa: factor out hpsa_init_cmd function >>>>> hpsa: reinitialize commands on resubmitting failed ioaccel commands >>>>> hpsa: fully initialize commands once at driver load not every time >>>>> hpsa: change driver version to 3.4.6 >>>>> hpsa: allow lockup detected to be viewed via sysfs >>>>> hpsa: do not ignore return value of hpsa_register_scsi >>>>> hpsa: DEBUG ONLY - allow triggering command resubmitting via sysfs >>>>> hpsa: try resubmitting down raid path on task set full >>>>> hpsa: factor out hpsa_ioaccel_submit function >>>>> hpsa: try resubmitting down ioaccelerated path on task set full >>>>> hpsa: honor queue depth of physical devices >>>>> hpsa: use cancel_delayed_work_sync where needed >>>>> hpsa: clean up queue depth getting code >>>>> hpsa: try to fix ioaccel command accounting code >>>>> hpsa: fix hpsa_figure_phys_disk_ptrs to handle layout_map_count other >>>>> than 1 >>>>> hpsa: close race in ioaccel_cmds_out accounting >>>>> hpsa: guard against overflowing raid map array >>>>> hpsa: test ioaccel_cmds_out vs. queue depth earlier to avoid wasting >>>>> time >>>>> hpsa: DEBUG ONLY - fix missing atomic-dec of ioaccel_cmds_out in >>>>> debug command resubmission code >>>>> hpsa: fix missing atomic_dec of dev->ioaccel_cmds_out >>>>> hpsa: do not ack controller events on controllers that do not support >>>>> it >>>>> hpsa: fix code that updates h->dev[]->phys_disk[] >>>>> hpsa: remove bug-on that triggers in hba mode by mistake >>>>> hpsa: remove unused temp64 variable from hpsa_cmd_init >>>>> hpsa bump driver version again >>>>> hpsa break hpsa_free_irqs_and_disable_msix into two functions >>>>> separate hpsa_free_cmd_pool into three separate function >>>>> hpsa: fix missing return in hpsa_command_resubmit_worker >>>>> hpsa: fix command leaks in hpsa_resubmit_command_worker >>>>> hpsa: fix bad interpretations of hpsa_ioaccel_submit return value >>>>> hpsa: do not touch phys_disk[] for ioaccel enabled logical drives >>>>> during rescan >>>>> hpsa: add change_queue_type function >>>>> hpsa: make hpsa_change_queue_depth handle reason SCSI_QDEPTH_QFULL >>>>> hpsa: initialize queue depth of logical drives reasonably >>>>> hpsa: attempt to fix queue depth calculations for logical drives >>>>> hpsa: add support sending aborts to physical devices via the ioaccel2 >>>>> path >>>>> hpsa: fix problem with abort support checking happening too early >>>>> hpsa: remove incorrect bug on in hpsa_scsi_ioaccel_raid_map >>>>> >>>>> Webb Scales (3): >>>>> Initialize reference counts to zero on initial command entry >>>>> allocation >>>>> hpsa: make hpsa_send_reset_as_abort_ioaccel2() use the specified >>>>> reply queue. >>>>> hpsa: use block layer tag for command allocation >>>>> >>>>> drivers/scsi/hpsa.c | 3642 ++++++++++++++++++++++++++++----------------- >>>>> drivers/scsi/hpsa.h | 103 +- >>>>> drivers/scsi/hpsa_cmd.h | 222 +++- >>>>> drivers/scsi/scsi.c | 64 +- >>>>> drivers/scsi/scsi_error.c | 2 + >>>>> drivers/scsi/scsi_lib.c | 1 + >>>>> include/scsi/scsi_host.h | 9 + >>>>> 7 files changed, 2561 insertions(+), 1482 deletions(-) >>>>> >>>>> -- steve > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html