On Thu, Aug 27, 2015 at 7:11 AM, <ygardi@xxxxxxxxxxxxxx> wrote:
>> On Tue, Aug 25, 2015 at 7:36 AM, <ygardi@xxxxxxxxxxxxxx> wrote:
>>>> On Aug 21, 2015 3:10 PM, "Yaniv Gardi" <ygardi@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Add a write memory barrier to make sure the prepared descriptors are
>>>>> actually written to memory before ringing the doorbell. We have also
>>>>> added a write memory barrier after ringing the doorbell register so
>>>>> that the controller sees the new request immediately.
>>>>>
>>>>> Signed-off-by: Yaniv Gardi <ygardi@xxxxxxxxxxxxxx>
>>>>>
>>>>> ---
>>>>>  drivers/scsi/ufs/ufshcd.c | 6 ++++++
>>>>>  1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>>>>> index fef0660..876148b 100644
>>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>>> @@ -833,6 +833,8 @@ void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
>>>>>  	ufshcd_clk_scaling_start_busy(hba);
>>>>>  	__set_bit(task_tag, &hba->outstanding_reqs);
>>>>>  	ufshcd_writel(hba, 1 << task_tag, REG_UTP_TRANSFER_REQ_DOOR_BELL);
>>>>> +	/* Make sure that doorbell is committed immediately */
>>>>> +	wmb();
>>>>
>>>> Is this really necessary? Is there a measurable difference?
>>>
>>> I'm not sure if there is a measurable difference, but since the
>>> Door-Bell register is the one actually responsible for the HW execution
>>> of the requests, it's recommended that its value be written to memory
>>> immediately.
>>
>> A barrier doesn't guarantee speed, only ordering. Unless you can
>> measure the difference, you should not have it.
>
> Rob,
> let me give an example:
> context#1 updates the outstanding_reqs variable and writes DOOR_BELL.
> context#2, upon the interrupt for a request completion, reports
> completion for each of the bits in:
>     outstanding_reqs ^ read(DOOR_BELL);
>
> 0. let's assume DOOR_BELL = 0x1 (1 active request, in slot 0)
> 1. context#1: updates DOOR_BELL to 0x3 (2 active requests, in slots
>    0 and 1)
> 2. the new value 0x3 has not yet been written to the DOOR_BELL register,
>    so DOOR_BELL is still 0x1, but outstanding_reqs is already 0x3
> 3. the request in slot 0 completes and an interrupt fires, so DOOR_BELL
>    is now 0 (the request in slot 0 completed)
> 4. context#2: outstanding_reqs ^ read(DOOR_BELL) = 0x3 ^ 0x0 = 0x3 =>
>    wrong conclusion, since the request in slot 1 never completed, and in
>    fact never started.

Barriers alone will never solve this problem. They may possibly narrow
the window, but the problem is still there. What you need is a spinlock
around all accesses to both outstanding_reqs and the doorbell register.
And guess what, spinlocks have appropriate barriers to ensure visibility
of what they protect.

Or perhaps the h/w provides another way to signal which slots have
completed. Using the same register for doorbell and completion status is
not ideal.

Rob