Re: [PATCH 02/10] IB/uverbs: Build the specs into a radix tree at runtime

Jason Gunthorpe <jgg@xxxxxxxxxxxx> · Mon, 13 Aug 2018 10:03:10 -0600

On Mon, Aug 13, 2018 at 08:52:35AM -0600, Guy Levi(SW) wrote:
> Well, better late than never...
> SB my comments :)
> 
> > From: Jason Gunthorpe [mailto:jgg@xxxxxxxx]
> > Sent: Friday, August 03, 2018 10:32 PM
> > To: linux-rdma@xxxxxxxxxxxxxxx; Leon Romanovsky <leonro@xxxxxxxxxxxx>; Guy Levi(SW) <guyle@xxxxxxxxxxxx>; Yishai Hadas
> > <yishaih@xxxxxxxxxxxx>; Ruhl, Michael J <michael.j.ruhl@xxxxxxxxx>
> > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > Subject: [PATCH 02/10] IB/uverbs: Build the specs into a radix tree at runtime
> > 
> > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > 
> > This radix tree datastructure is intended to replace the 'hash' structure
> > used today for parsing ioctl methods during system calls. This first
> > commit introduces the structure and builds it from the existing .rodata
> > descriptions.
> > 
> > The so-called hash arrangement is actually a 5 level open coded radix tree.
> > This new version uses a 3 level radix tree built using the radix tree
> > library.
> > 
> > Overall this is much less code and much easier to build as the radix tree
> > API allows for dynamic modification during the building. There is a small
> > memory penalty to pay for this, but since the radix tree is allocated on
> > a per device basis, a few kb of RAM seems immaterial considering the
> > gained simplicity.
> > 
> > The radix tree is similar to the existing tree, but also has a 'attr_bkey'
> 
> Isn't the attr_bkey concept the same as the bitmask context was
> before (you refer it as a new concept)?

Not really. The prior code used bitmaps, but they were placed inside a
'radix tree' (called hash), so there were mutliple bitmaps to consult.

In this implementation any valid attribute number can be directly
converted into a single bitmap key.

> > +/*
> > + * This is the runtime description of the uverbs API, used by the syscall
> > + * machinery to validate and dispatch calls.
> > + */
> > +
> > +/*
> > + * Depending on ID the slot pointer in the radix tree points at one of these
> > + * structs.
> > + */
> > +struct uverbs_api_object {
> > +	const struct uverbs_obj_type *type_attrs;
> > +	const struct uverbs_obj_type_class *type_class;
> 
> Why is it critical to duplicate the type_class (it is already inside type_attrs)?

type_attrs is rodata memory inside the driver module, and type_class
is rodata memory inside the uverbs module.

uverbs require's type_class after we unload the driver module, so it
must be copied here as we cannot access it via type_attrs after driver
module unload.

> > +	method_elm = uapi_add_elm(uapi, method_key, sizeof(*method_elm));
> > +	if (IS_ERR(method_elm)) {
> > +		if (method_elm != ERR_PTR(-EEXIST))
> > +			return PTR_ERR(method_elm);
> > +
> > +		/*
> > +		 * This occurs when a driver uses ADD_UVERBS_ATTRIBUTES_SIMPLE
> > +		 */
> > +		if (WARN_ON(method->handler))
> > +			return -EINVAL;
> > +		method_elm = radix_tree_lookup(&uapi->radix, method_key);
> > +		if (WARN_ON(!method_elm))
> > +			return -EINVAL;
> 
> Why do you need to lookup? Isn't it implicit from EEXIST check?
> Anyway, don't you want to return an error in any case?

No, if the method already exists the code does the 'merge' flow, where
the method description under processing is added to the existing
method.

uapi_add_elm returns EEXISTS, but does not give the method_elm
pointer, which is necessary to revise the method..

> > +	} else {
> > +		WARN_ON(!method->handler);
> > +		method_elm->handler = method->handler;
> > +		if (method->handler != uverbs_destroy_def_handler)
> > +			method_elm->driver_method = is_driver;
> > +	}
> > +
> > +	for (i = 0; i != method->num_attrs; i++) {
> 
> Why do you use '!=' instead of '<' ?

I always use != when iterating over a fixed range, the is a common
idiom, and I find it clearer compared to <, which I reserve for
'tricky' iteration cases.

> > +		const struct uverbs_attr_def *attr = (*method->attrs)[i];
> > +		struct uverbs_api_attr *attr_slot;
> > +
> > +		if (!attr)
> > +			continue;
> > +
> > +		/*
> > +		 * ENUM_IN contains the 'ids' pointer to the driver's .rodata,
> 
> Can you specify what kind of data in driver is included under .rodata?
> Can you elaborate about the comment? I dint understand your point...

It says: the 'ids' member of struct uverbs_attr_def -> uverbs_attr_spec -> u2:

        /* This weird split of the enum lets us remove some padding */
        union {
                struct {
                        /*
                         * The enum attribute can select one of the attributes
                         * contained in the ids array. Currently only PTR_IN
                         * attributes are supported in the ids array.
                         */
                        const struct uverbs_attr_spec *ids;
                } enum_def;
        } u2;

> > +		 * so if it is specified by a driver then it always makes this
> > +		 * into a driver method.
> > +		 */
> > +		if (attr->attr.type == UVERBS_ATTR_TYPE_ENUM_IN)
> > +			method_elm->driver_method |= is_driver;
> 
> Couldn't above be a simple assignment (w/o the 'or')?

No, refer to the explanation I gave to Michael last week

> > +
> > +		attr_slot =
> > +			uapi_add_elm(uapi, method_key | uapi_key_attr(attr->id),
> > +				     sizeof(*attr_slot));
> > +		/* Attributes are not allowed to be modified by drivers */
> 
> Why is this comment located here?

It is explaining what the condition for hitting the below IS_ERR would
be - a driver attempting to override a core attribute is the most
likely situation.

> > +		if (IS_ERR(attr_slot))
> > +			return PTR_ERR(attr_slot);
> > +
> > +		attr_slot->spec = attr->attr;
> > +	}

> > +	for (i = 0; i != tree->num_objects; i++) {
> > +		const struct uverbs_object_def *obj = (*tree->objects)[i];
> > +		struct uverbs_api_object *obj_elm;
> > +		u32 obj_key;
> > +
> > +		if (!obj)
> > +			continue;
> > +
> > +
> > +		obj_key = uapi_key_obj(obj->id);
> > +		obj_elm = uapi_add_elm(uapi, obj_key, sizeof(*obj_elm));
> 
> Doesn't the radix_tree_insert may change the previous tree structure
> for optimizations which will create more than 3 tree levels?
> I.e. insert object key '1100' creates a node '1100'. Then, insert
> object key '1101' may modify the tree so it will have now one parent
> '11' and 2 children '00' and '01' (So it is actually 2 levels to
> store the objects).  If yes, does your design take it into account
> (you remark in the header file that this is 3 level tree)?

It actually doesn't matter to this code what the internal construction
of the tree is.

However, linux radix trees are always minimum height, so as long as
they are 64 entries per chunk it is not possible to exceed 3 levels.

> > +		if (IS_ERR(obj_elm)) {
> > +			if (obj_elm != ERR_PTR(-EEXIST))
> > +				return PTR_ERR(obj_elm);
> > +
> > +			/* This occurs when a driver uses ADD_UVERBS_METHODS */
> > +			if (WARN_ON(obj->type_attrs))
> 
> Can you elaborate what this check for?

When a driver wants to add methods to a core object it is not allowed
to re-specify the type_attrs, they must be left as NULL

> > +				return -EINVAL;
> > +			obj_elm = radix_tree_lookup(&uapi->radix, obj_key);
> > +			if (WARN_ON(!obj_elm))
> 
> Isn't it implicit w/o this lookup? The code checks return error -EEXIST first.
> Anyway, I guess you want to final with 'return' regardless the 'if' result...

Same as above for method_elm, we need the obj_elm to modify it to
include the changes from the driver tree.

> > +				return -EINVAL;
> > +		} else {
> > +			obj_elm->type_attrs = obj->type_attrs;
> > +			if (obj->type_attrs) {
> 
> Can be an obj w/o type_attrs? 
> Does the DSL let it?
> How would it be used and for what purpose?

This flow is used by the DEVICE object, and other objects that cannot
be created. The DSL looks like this:

DECLARE_UVERBS_GLOBAL_METHODS(UVERBS_OBJECT_DEVICE);

> > +				obj_elm->type_class =
> > +					obj->type_attrs->type_class;
> > +				/*
> > +				 * Today drivers are only permitted to use
> > +				 * idr_class types. They cannot use FD types
> > +				 * because we currently have no way to revoke
> > +				 * the fops pointer after device
> > +				 * disassociation.
> > +				 */
> 
> Can't it forced by compilation in the spec declaration?

Maybe with lots of complication.. I think this is good enough to
prevent this mis-use from existing as all drivers have to be tested
and this is guarenteed to blow up.

> > +				if (WARN_ON(is_driver &&
> > +					    obj->type_attrs->type_class !=
> > +						    &uverbs_idr_class))
> > +					return -EINVAL;
> > +			}
> > +		}
> > +
> > +		if (!obj->methods)
> > +			continue;
> 
> How can be an object w/o methods?
> Do you want to insert such object in the tree?
> Shouldn't it return an error?

Currently we have lots of object defined with no ioctl methods because
the kabi is not complete. The methods exist on the write side.

We still have to create the infrastructure though as we now require
the struct uverbs_obj_type to exist for all object. So yes, we must
inster the empty object in to the tree.

> > +static int
> > +uapi_finalize_ioctl_method(struct uverbs_api *uapi,
> > +			   struct uverbs_api_ioctl_method *method_elm,
> > +			   u32 method_key)
> > +{
> > +	struct radix_tree_iter iter;
> > +	unsigned int max_bkey = 0;
> > +	bool single_uobj = false;
> > +	void __rcu **slot;
> > +
> > +	method_elm->destroy_bkey = UVERBS_API_ATTR_BKEY_LEN;
> > +	radix_tree_for_each_slot (slot, &uapi->radix, &iter,
> > +				  uapi_key_attrs_start(method_key)) {
> 
> I don't understand how it goes over all method attributes if you
> give specific attribute key... can you explain how it works?
> Intuitively, I would say that you should give the method's key so it
> go over all attributes (children)...

uapi_key_attrs_start() takes the method key and computes the first
radix tree index for the method's associated attributes.

The attributes are organized in a linear block of radix tree keys
within the range

[uapi_key_attrs_start(), method_key | UVERBS_API_ATTR_KEY_MASK]

> > +		struct uverbs_api_attr *elm =
> > +			rcu_dereference_protected(*slot, true);
> > +		u32 attr_key = iter.index & UVERBS_API_ATTR_KEY_MASK;
> 
> Will be nice to have a comment what is the bkey represent.
> e.g.  "A mask which each bit represents match attribute's key"

That comment is here:

/*
 * This returns a value in the range [0 to UVERBS_API_ATTR_BKEY_LEN),
 * basically it undoes the reservation of 0 in the ID numbering. attr_key
 * must already be masked with UVERBS_API_ATTR_KEY_MASK, or be the output of
 * uapi_key_attr().
 */
static inline __attribute_const__ u32 uapi_bkey_attr(u32 attr_key)

> > +static int uapi_finalize(struct uverbs_api *uapi)
> > +{
> > +	struct radix_tree_iter iter;
> > +	void __rcu **slot;
> > +	int rc;
> > +
> > +	radix_tree_for_each_slot (slot, &uapi->radix, &iter, 0) {
> 
> A redundant space.

The linux coding style has ' ' after for, eg

   for (i = 0; ....) {

Since radix_tree_for_each_slot is a 'for' loop not a function call, it
is correct to place the space, despite checkpatch's incorrect
complaining. clang-format gets this right automatically.

> > +struct uverbs_api *uverbs_alloc_api(
> > +	const struct uverbs_object_tree_def *const *driver_specs,
> > +	enum rdma_driver_id driver_id)
> > +{
> > +	struct uverbs_api *uapi;
> > +	int rc;
> > +
> > +	uapi = kzalloc(sizeof(*uapi), GFP_KERNEL);
> > +	if (!uapi)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	INIT_RADIX_TREE(&uapi->radix, GFP_KERNEL);
> > +	uapi->driver_id = driver_id;
> > +
> > +	rc = uapi_merge_tree(uapi, uverbs_default_get_objects(), false);
> > +	if (rc)
> > +		goto err;
> > +
> > +	for (; *driver_specs; driver_specs++) {
> > +		rc = uapi_merge_tree(uapi, *driver_specs, true);
> > +		if (rc)
> > +			goto err;
> > +	}
> 
> Just to make it clear (?): 
> a. the generic API which is not supported by driver will be blocked inside the driver handler
> b. we abandoned the vision which the parsing tree will imply the driver capabilities.

No, that hasn't changed. The next step is to include more control over
the parsing tree, ie the future flow will basically be

- Build up a radix tree that has 100% of all possiblities
- Prune possibilites that do not match current capabilities

Today it is just a confusing mess with checks sprinkled all over the
place.

> > +/*
> > + * Information about the API is loaded into a radix tree. For IOCTL we start
> > + * with a tuple of:
> > + *  object_id, attr_id, method_id
> > + *
> > + * Which is a 48 bit value, with most of the bits guaranteed to be zero. Based
> > + * on the current kernel support this is compressed into 16 bit key for the
> > + * radix tree. Since this compression is entirely internal to the kernel the
> > + * below limits can be revised if the kernel gains additional data.
> 
> Does the code have any protection (by comments or compilation time
> warning) against adding additional specs which exceeds the radix
> capabilities?  AFAIU, the IDs are picked w/o a reference to any of
> this radix limitation.

A compile time warning is a bit hard to create, but it will fail at
run time while building the uapi, as packing a too big ID in any
situation cases UVERBS_API_KEY_ERR to be returned and uapi_add_elm
always fails with EOVERLFOW in that case..

The code is carefully organized so that the uapi_add_elm is always
done before using an ID for the first time.

Also, the radix tree contruction OR's the object, method, and attrs
keys so if any layer returns UVERBS_API_KEY_ERR then the OR will also
equal UVERBS_API_KEY_ERR, and that is what is checked in
uapi_add_elm()

In other cases, like lookup, the same is true. The radix tree will
search on UVERBS_API_KEY_ERR and immediately fail. No particularly
special checking is needed.

> > + *
> > + * With 64 leafs per node this is a 3 level radix tree.
> > + *
> > + * The tree encodes multiple types, and uses a scheme where OBJ_ID,0,0 returns
> > + * the object slot, and OBJ_ID,METH_ID,0 and returns the method slot.
> > + */
> > +enum uapi_radix_data {
> > +	UVERBS_API_NS_FLAG = 1U << UVERBS_ID_NS_SHIFT,
> 
> Why do you define it? Why 'UVERBS_UDATA_DRIVER_DATA_FLAG' isn't appropriate?
> Anyway, will be more explicit to name it ' UVERBS_API_DRIVER_NS_FLAG'.

Ah, that would probably be better, this is one of those enduring
messes it seems as we have the << version all over the place.

Let's make a folloup patch to drop all uses of UVERBS_ID_NS_SHIFT tree
wide..

Jason