Re: [for-next 1/5] RDMA/bnxt_re: Enable RoCE on virtual functions

Devesh Sharma <devesh.sharma@xxxxxxxxxxxx> · Tue, 9 Jan 2018 22:18:52 +0530

On Tue, Jan 9, 2018 at 8:36 PM, Doug Ledford <dledford@xxxxxxxxxx> wrote:
> On Tue, 2018-01-09 at 19:37 +0530, Devesh Sharma wrote:
>> On Tue, Jan 9, 2018 at 3:42 AM, Doug Ledford <dledford@xxxxxxxxxx> wrote:
>> > On Fri, 2018-01-05 at 06:40 -0500, Devesh Sharma wrote:
>> > > From: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>
>> > >
>> > > Currently, fifty percent of the total available resources
>> > > are reserved for PF and remaining are equally divided among
>> > > active VFs.
>> > >
>> > > +/*
>> > > + * Percentage of resources of each type reserved for PF.
>> > > + * Remaining resources are divided equally among VFs.
>> > > + * [0, 100]
>> > > + */
>> > > +#define BNXT_RE_PCT_RSVD_FOR_PF         50
>> >
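To make the split above concrete, here is a minimal sketch of the arithmetic for one resource type, assuming a pool of `total` entries. `split_resources()` and `struct rsvd_split` are hypothetical names used only for illustration, not the bnxt_re code; only the `BNXT_RE_PCT_RSVD_FOR_PF` value comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Value taken from the patch: percentage of each resource type kept for the PF. */
#define BNXT_RE_PCT_RSVD_FOR_PF 50

/* Hypothetical result type: how one resource type is split. */
struct rsvd_split {
	uint32_t pf;     /* entries reserved for the PF */
	uint32_t per_vf; /* entries reserved for each VF */
};

/*
 * Hypothetical helper (not the driver's code): reserve pct_pf percent of
 * `total` for the PF and divide the remainder equally among num_vfs VFs.
 * Any remainder of the integer division is simply left unassigned.
 */
static struct rsvd_split split_resources(uint32_t total, uint32_t pct_pf,
					 uint32_t num_vfs)
{
	struct rsvd_split s;

	assert(pct_pf <= 100); /* the patch documents the range as [0, 100] */

	/* Widen to 64 bits so total * pct_pf cannot overflow. */
	s.pf = (uint32_t)((uint64_t)total * pct_pf / 100);
	s.per_vf = num_vfs ? (total - s.pf) / num_vfs : 0;
	return s;
}
```

With a hypothetical pool of 128K entries and 4 VFs, this gives 64K to the PF and 16K to each VF. It only computes the reservation; it does not cap what the PF may actually use.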
>> > This is a separate comment from the patch review itself.  But, are you
>> > sure this is a good idea?  And especially are you sure that it should be
>> > a compile time constant and not a runtime parameter?
>> >
>>
>> Keeping a compile-time constant is indeed not a good idea, and I
>> completely understand that a knob there would have been much better
>> and more flexible.
>> For this submission we wanted to avoid using a module parameter or a
>> configfs interface, so as a workaround a hard-coded compile-time
>> constant is used.
>> Eventually, a more flexible scheme will be provided to change this.
>
> Ok.
>
>> > I ask because it seems to me that usage of this stuff falls into one of
>> > two categories:
>> >
>> > 1) All bare metal usage
>> > 2) SRIOV usage (in which case the bare metal OS does relatively little,
>> > the SRIOV using clients do most of the work)
>> >
>> > I guess I'm finding it hard to imagine a scenario where, when you do
>> > have SRIOV VFs, that you don't want the majority of all resources being
>> > used there.
>> >
>> > I might suggest that you simply don't split resources at all.  Maybe do
>> > something like filesystems do.  Let anyone at all take a resource until
>> > you hit 95% utilization, then only root can write to the filesystem.
>> > In this case it would be: let both PFs and VFs use resources at will
>> > until you hit the 95% utilization threshold, and then restrict
>> > resource use to the PF.  That would make much more sense to me.
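A minimal sketch of that threshold scheme, assuming a single tracker that sees allocations from every function (something the current firmware does not provide). `struct res_pool`, `res_pool_alloc()` and `res_pool_free()` are hypothetical names, and 95 is only the example threshold from the suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker shared by the PF and all of its VFs. */
struct res_pool {
	uint32_t total;       /* entries of one resource type */
	uint32_t used;        /* entries currently allocated by any function */
	uint32_t pf_only_pct; /* above this utilization, only the PF may allocate */
};

/* Allocate one entry; returns false if the caller is not allowed to. */
static bool res_pool_alloc(struct res_pool *p, bool is_pf)
{
	uint64_t limit;

	assert(p->pf_only_pct <= 100);
	/* Threshold in entries; 64-bit math avoids overflow of total * pct. */
	limit = (uint64_t)p->total * p->pf_only_pct / 100;

	if (p->used >= p->total)
		return false; /* pool exhausted for everyone */
	if (!is_pf && p->used >= limit)
		return false; /* past the threshold: VFs are refused */
	p->used++;
	return true;
}

/* Release one entry previously obtained from res_pool_alloc(). */
static void res_pool_free(struct res_pool *p)
{
	assert(p->used > 0);
	p->used--;
}
```

There is no fixed split here; the PF simply keeps the last 5% to itself. The catch is that it depends on a global view of utilization across functions.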
>>
>> This is indeed an excellent suggestion to optimize resource utilization
>> between PFs and VFs; however, I have a couple of facts to put forward:
>>
>> - If I have understood it correctly, this would require an independent
>>   entity that keeps track of what percentage of resources has been
>>   utilized at any given point in time by all the functions (the PF and
>>   its VFs). Currently, we do not have such an implementation in
>>   firmware, and the PF driver cannot track resource utilization across
>>   functions.
>
> Fair enough.
>
>> - In the current implementation, hard-coding 50% does not hard-limit the
>>   PF to 50%; it can still over-subscribe up to the max limit even when
>>   the maximum number of VFs is active.
>
> OK, but that means a PF can starve VFs rather easily I take it?
Yes, it could. However, for now this 50% means 64K resources are reserved
for the PF, so I am not too worried. In deployments where 64K is not
sufficient for the PF, VFs could starve.
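The starvation case can be modelled with a small sketch, assuming one shared pool and a PF whose 50% is a reservation rather than a cap. `struct shared_pool`, `pf_alloc()` and `vf_alloc()` are hypothetical; this is not how the firmware accounts resources, only an illustration of the behaviour discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one pool shared by the PF and its VFs. */
struct shared_pool {
	uint32_t total;  /* entries of one resource type */
	uint32_t used;   /* entries in use by all functions */
	uint32_t per_vf; /* equal share reserved for each VF */
};

/* PF: its reservation is not a cap, so it may take anything left in the pool. */
static bool pf_alloc(struct shared_pool *p, uint32_t n)
{
	if (n > p->total - p->used)
		return false;
	p->used += n;
	return true;
}

/*
 * VF: limited to its own share (tracked by the caller in *vf_used), and it
 * still fails if the pool itself has run dry.
 */
static bool vf_alloc(struct shared_pool *p, uint32_t *vf_used, uint32_t n)
{
	assert(*vf_used <= p->per_vf);
	if (n > p->per_vf - *vf_used || n > p->total - p->used)
		return false;
	*vf_used += n;
	p->used += n;
	return true;
}
```

For example, with 128K entries and a 16K per-VF share, once the PF has over-subscribed into everything beyond its 64K, a VF that has not allocated anything yet fails even a one-entry request.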

>
>> - With the equal distribution of the remaining resources among VFs, we
>>   are trying to provide a minimum guaranteed amount of resources to the
>>   maximum possible number of VFs on a given PF. We want to avoid the
>>   case where the number of usable VFs depends on the resources already
>>   consumed by active VFs.
>
> And this then is the opposite of the PF in that VFs aren't *really*
> guaranteed this minimum amount, since the PF can starve the VFs out, but
> it at least guarantees other VFs don't starve any specific VF out.

Yes, true; I should rephrase the previous bullet: it prevents VFs from
being starved by other VFs.

>
> That's fine if that's how you want things setup for now.  I think I
> would work on a firmware update to implement the resource tracker as the
> long term solution ;-).

I will take this feedback to the concerned people on our f/w team and see
whether it is possible to implement such an entity.
>
> --
> Doug Ledford <dledford@xxxxxxxxxx>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD