Re: [PATCH rdma-core 1/1] mlx5: Account for at least 1 medium bfreg in low_lat_uuars check

On Mon, Apr 30, 2018 at 4:14 AM, Yishai Hadas
<yishaih@xxxxxxxxxxxxxxxxxx> wrote:
> On 4/29/2018 8:34 PM, Rohit Zambre wrote:
>>
>> On Sun, Apr 29, 2018 at 11:12 AM, Yishai Hadas
>> <yishaih@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>
>>> On 4/23/2018 8:58 PM, Rohit Zambre wrote:
>>>>
>>>>
>>>> The calculation of need_uuar_lock rightly accounts for at least 1 medium
>>>> bfreg.
>>>
>>>
>>>
>>> The code should support even 0 medium bfregs, as in the suggested
>>> patch below.
>>
>>
>> So when the user sets MLX5_TOTAL_UUARS=16 and
>> MLX5_NUM_LOW_LAT_UUARS=15, the first 15 QPs will map to the 15 low
>> latency bfregs. To which bfreg will the 16th QP map?
>
>
> The 16th QP expects to get bfreg index 0 which is returned from the kernel
> once there are *no* low latency high class nor medium bfregs available. In
> this case DB is used in rdma-core instead of BF and no lock is needed. (see
> usage of bfs[bfi].buf_size)

OK, so all the QPs after the 15th one will map to bfreg_0. I am trying
to understand why a lock would not be needed on bfreg_0. I know that
the DB write is a 64-bit atomic. Is there an implicit flush/sfence with
that atomic? When CPU_A posts on the 16th QP and CPU_B simultaneously
posts on the 17th QP, without the lock, what guarantees that the device
will see both DBs?

> However, looking in the kernel code, it seems that there is some bug in
> 'first_med_bfreg()' which had to handle this non-default case when there are
> no medium bfregs.
>
> Below patch expects to fix that. Can you please give it a try and confirm
> the fix? This should come in parallel with the candidate patch that I
> sent yesterday for rdma-core's need_uuar_lock().

Both the rdma-core and the kernel patch will fix the problem, given
that the device will see all the doorbells written simultaneously to
bfreg_0 without a lock.
In the rdma-core patch, the mapping of the uuarn index to i is quite
specific to the values defined in the MLX5 enums. I would do it the
following way, but the final decision is up to you.

 static int need_uuar_lock(struct mlx5_context *ctx, int uuarn)
 {
+       int i;
+
        if (uuarn == 0 || mlx5_single_threaded)
                return 0;

-       if (uuarn >= (ctx->tot_uuars - ctx->low_lat_uuars) * 2)
+       i = (uuarn / NUM_BFREGS_PER_UAR * MLX5_NUM_NON_FP_BFREGS_PER_UAR) +
+           (uuarn % NUM_BFREGS_PER_UAR);
+       if (i >= ctx->tot_uuars - ctx->low_lat_uuars)
                return 0;

Thanks,
Rohit


