>> RFC: Verbs Counters
>>
>>
>> There is a constant demand to know more about the connections used in verbs
>> (and about other aspects as well).
>> Some vendors have long offered hardware counters through sysfs.
>> Those counters, however, are not available per connection; they cover the
>> whole system.
>>
>> One option is for each vendor to offer its own vendor-specific counters.
>> Such counters would probably not be generic, since each vendor may implement
>> counters differently, so the resulting verbs interface would not be generic
>> across vendors.
>>
>>
>> We present a generic interface for using counters in verbs.
>> Some definitions before going into details:
>>
>> Object: an existing verbs structure that describes a physical entity,
>>         e.g. QP, flow, device.
>> Counter: a single attribute used to count events/statistics on an object.
>> Counter-set: a set of counters that belongs to one specific object.
>>
>>
>> A generic interface with the following functionality is presented:
>>
>>
>> 1. A way to list all the available counter-sets in the device.
>>    For each counter-set:
>>    - What do the counters within the set measure? A QP? A flow? Something else?
>>    - A unique identifier per counter-set.
>>    - A list of names for all the counters within the counter-set.
>>      Since each vendor has its own counters/stats, each vendor can use its
>>      own names for a counter. This proposal aims to replace vendor-specific
>>      APIs with predefined enums/names for each counter/stat.
>>    - Additional metadata about the counter-set (for example, is it cached?).
>>
>> 2. Operations available per counter-set:
>>    2.1 Bind and unbind:
>>        A counter-set has to be attached to an object in order for any
>>        counter within it to count. The attaching action is referred to as
>>        'bind' and the opposite action as 'unbind'.
>>        Rather than adding a specific generic operation for bind and unbind,
>>        I chose to use existing verbs methods.
>>        The existing methods could be modified with small changes (such as
>>        adding a new flag) to bind (or unbind) a counter-set to an object.
>>
>>
>>    2.2 Counter-set may be created: a counter-set instance is allocated and
>>        created on an ibv context and belongs to that context.
>>
>>    2.3 Counter-set may be destroyed: the counter-set instance is destroyed
>>        and de-allocated. If the counter-set is bound to an object, the
>>        driver is responsible either for unbinding it prior to hardware
>>        de-allocation, or for notifying the user that it is unable to destroy
>>        the counter-set, in which case it is the user's responsibility to
>>        unbind prior to destruction.
>>
>>    2.4 Counter-set may be queried: the user supplies a counter-set instance
>>        and an output address. The hardware queries the counter-set and
>>        writes the output to that address as an array of uint64_t, where
>>        each entry represents a single counter.
>>
>>
>> The user is expected to query the device on startup and find which
>> counter-sets are supported and to which objects each counter-set may be
>> bound. During this scan the user also finds out which counters are
>> supported for which object.
>>
>> Example of a way to list all the available counter-sets in the device:
>>
>>
>> We modify the method int query_device_ex() by adding a new flag to
>> enum ibv_device_attr_mask:
>>
>>
>> + IBV_DEVICE_ATTR_COUNTER_SET = 1 << 1
>>
>>
>> When this flag is used, the device responds with a struct
>> ibv_device_attr_ex that contains a new attribute:
>>
>>
>> + uint64_t max_supported_counter_sets;
>>
>>
>> A user can then use a new API to get the description of each counter-set.
>> Each counter-set is identified by a counter-set id, a number from 0 up to
>> (but not including) max_supported_counter_sets, i.e. the number returned
>> from the query_device_ex() call.
>>
>>
>> int ibv_query_counter_set_description(struct ibv_context *context,
>>                                       uint64_t counter_set_id,
>>                                       struct ibv_counter_set_description *out);
>> - returns 0 on success
>> - returns -1 when counter_set_id is invalid.
>>
>>
>> The API writes the following structure to out:
>>
>>
>> struct ibv_counter_set_description {
>>         // Which object type does this set refer to?
>>         // Value is taken from enum ibv_counter_set_counted_type
>>         uint8_t counted_type;
>>         // Number of instances of this counter-set available in the hardware
>>         uint64_t number_of_counter_sets;
>>         // Attributes of the set (bit mask)
>>         // Value is taken from enum ibv_counter_set_attributes
>>         uint32_t attributes;
>>         // Number of entries
>>         uint8_t entries_count;
>>         // List of entries
>>         struct ibv_counter_entry entry[256];
>> };
>>
>>
>> Where:
>> struct ibv_counter_entry {
>>         // Name of the entry; the entry after the last one has an empty name
>>         char name[32];
>> };
>> ===========================
>>
>>
>> Brief explanation of the fields inside struct ibv_counter_set_description:
>>
>>
>> counted_type - the id of the object type this counter_set relates to.
>> The id is a value from enum ibv_counter_set_counted_type (see below).
>> Each counter-set relates to a verbs object, which is the object this
>> counter-set aims to count (i.e. measure), such as a QP or a flow.
>>
>>
>> enum ibv_counter_set_counted_type {
>>         IBV_COUNTER_IBV_QP = 0,
>>         IBV_COUNTER_IBV_FLOW,
>>         ...
>> };
>>
>> number_of_counter_sets - how many instances of this counter-set does the
>> device support? Note that this value can be interpreted in more than one
>> way: either how many counter_sets are currently available, or the total
>> (max) number of counter_sets the device supports. It is seen as the upper
>> limit on the number of counter-sets the process is allowed to create.
>>
>>
>> attributes - special attributes this counter-set might have, either in
>> software or in hardware.
>> For example, a counter-set may be cached.
>> This means that every query for that set is read from the cache, unless a
>> read of the values from the hardware is explicitly requested.
>>
>>
>> enum ibv_counter_set_attributes {
>>         // the counter-set value is cached by default
>>         IBV_COUNTER_ATTR_CACHED = 1 << 1
>> };
>>
>>
>> entries_count - number of entries in the counter_set
>>
>> struct ibv_counter_entry entry[256] - an ordered list of counter names,
>> where the name following the last counter is empty ("")
>>
>> ====
>>
>>
>> Example:
>>
>>
>> ibv_counter_set_description is a struct that describes other structs.
>> For example, take the following struct:
>>
>>
>> struct guy_counter {
>>         uint64_t apples_kg;
>>         uint64_t apples_count;
>> };
>>
>>
>> - note that all counters are 64-bit entries
>>
>> The matching ibv_counter_set_description looks like this:
>>
>> struct ibv_counter_set_description desc = {
>>         // according to enum ibv_counter_set_counted_type
>>         .counted_type = IBV_COUNTER_IBV_GUYGUY,
>>         .attributes = 0,
>>         .number_of_counter_sets = 1000,
>>         .entries_count = 2,
>>         .entry = {
>>                 [0] = { .name = "apples [Kg]" },
>>                 [1] = { .name = "apples [Count]" },
>>                 [2] = { .name = "" },   // terminating empty name
>>         },
>> };
>>
>> ====
>>
>>
>> How to fill the internal tables?
>>
>>
>> On initialization, the driver should query the device capabilities to see
>> how many counter-sets are supported. For each supported counter-set the
>> driver acts as follows:
>> 1. Allocate a counter_set_id.
>> 2. Register the counter_set_id with a pointer to the data structures that
>>    list its counters.
>> 3. Finally, the driver returns the number of supported counter_sets in
>>    ibv_query_device_ex().
>>
>>
>> Operations available for each counter-set:
>> Each counter-set instance is represented by the following structure:
>>
>>
>> struct ibv_counter_set {
>>         struct ibv_context *context;
>>         uint64_t handle;
>> };
>>
>>
>> THE NEW API
>>
>>
>> struct ibv_counter_set *ibv_create_counter_set(struct ibv_context *context,
>>                                                uint64_t counter_set_id);
>>
>>
>> The method returns a struct ibv_counter_set, which contains context+handle.
>> Actions: the method allocates memory for struct ibv_counter_set and then
>> calls the driver to allocate the actual hardware counter-set.
>> If successful, the method returns a pointer to a heap-allocated struct
>> ibv_counter_set containing context+handle.
>> If unsuccessful, the method returns NULL and sets errno accordingly.
>>
>>
>> int ibv_destroy_counter_set(struct ibv_counter_set *counter_set);
>>
>>
>> The method destroys the input counter_set and frees the allocated memory.
>> Actions: the method attempts to remove the hardware counter-set and then
>> releases (deletes) the input struct. In the kernel, the code checks whether
>> the caller is allowed to destroy the counter_set (by comparing pids) and
>> then releases the hardware resource.
>>
>>
>> If unsuccessful, the method returns -1 and sets errno accordingly.
>> If successful, the method returns 0.
>>
>>
>> int ibv_query_counter_set(struct ibv_query_counter_set_attr attr,
>>                           uint64_t *out);
>>
>> The method receives a query structure and an output address, queries the
>> hardware and writes the output to the uint64_t *out address.
>> Actions: the method receives struct ibv_query_counter_set_attr, parses the
>> query and sends it to the kernel for execution.
>> In the kernel, the code checks whether the caller is allowed to query
>> the hardware, executes the query and then writes to *out.
>> If unsuccessful, the method returns -1 and sets errno accordingly.
>> If successful, the method returns 0.
>>
>>
>> Where:
>>
>>
>> struct ibv_query_counter_set_attr {
>>         uint32_t comp_mask;
>>         struct ibv_counter_set *counter_set;
>>         // bit mask of enum ibv_query_counter_set_attr_params
>>         uint32_t query_params;
>> };
>>
>>
>> enum ibv_query_counter_set_attr_params {
>>         // force a hardware query instead of returning the cached value
>>         IBV_COUNTER_FORCE_UPDATE = 1 << 1
>> };
>>
>>
>> int ibv_query_counter_set_description(struct ibv_context *context,
>>                                       uint64_t counter_set_id,
>>                                       struct ibv_counter_set_description *out);
>>
>> The method writes out a struct ibv_counter_set_description containing a
>> description of a counter-set.
>> The user should allocate sizeof(struct ibv_counter_set_description) for *out.
>>
>>
>> - returns 0 on success
>> - returns -1 when counter_set_id is invalid.
>>
>>
>> Example of using counter_sets:
>>
>>
>> #include <inttypes.h>
>> #include <stdio.h>
>> #include <infiniband/verbs.h>
>>
>> void foo(struct ibv_context *context, uint64_t counter_set_id)
>> {
>>         // container for the counter-set values, one uint64_t per counter
>>         uint64_t my_counter[256];
>>         struct ibv_counter_set_description my_description;
>>         struct ibv_counter_set *my_counter_set;
>>         struct ibv_query_counter_set_attr my_query;
>>
>>         if (ibv_query_counter_set_description(context, counter_set_id,
>>                                               &my_description)) {
>>                 fprintf(stderr, "invalid counter_set_id\n");
>>                 return;
>>         }
>>
>>         my_counter_set = ibv_create_counter_set(context, counter_set_id);
>>         if (!my_counter_set) {
>>                 perror("ibv_create_counter_set");
>>                 return;
>>         }
>>
>>         // let's define the query object
>>         my_query.comp_mask = 0;
>>         my_query.counter_set = my_counter_set;
>>         my_query.query_params = 0;
>>
>>         // finally, do the query
>>         if (ibv_query_counter_set(my_query, my_counter) == -1) {
>>                 perror("ibv_query_counter_set");
>>         } else {
>>                 for (int i = 0; i < my_description.entries_count; ++i)
>>                         printf("%s = %" PRIu64 "\n",
>>                                my_description.entry[i].name, my_counter[i]);
>>         }
>>
>>         ibv_destroy_counter_set(my_counter_set);
>> }
>>
>>I like this solution, even though I was aiming at a different problem :-).
>>
>>Regarding this solution: it allows polling statistics from the driver/hardware
>>on demand in a generic way.
>>Generic, in this case, refers to the number, the scope and the meaning of
>>the statistics. I understand that it assumes that the resources to count
>>these statistics are limited, and hence they should be used only after
>>allocation by the "user".
>>It isn't clear to me whether the user is the application or libibverbs.
>>Either way, cleanup should be considered to avoid driver/HW resources being
>>wasted due to improper behavior. If the NIC's HW supports this, the
>>performance impact is small. If it doesn't, it is up to the vendor to decide
>>whether to support it and to what extent.
>>The vendor should be able to decide what scope it allows/supports.
>>I'm not sure what the added value of caching statistics is.
>>
>>I'm not sure how this would relate to "rdmatool - tool for RDMA users" [1].
>>Is this an alternative or a parallel solution? Will the code be reusable?
>>
>>Regarding the issue I was planning to focus on, I will comment in Alex's e-mail.
>>
>>Thanks,
>>Ram
>>
>>[1] https://www.spinics.net/lists/linux-rdma/msg45250.html

Hi Ram,

I'm glad you like this solution. However, I believe we were aiming for the
same thing: knowing more about the QP/connection. What I offered in this RFC
is a generic infrastructure that vendors can use to implement a solution that
suits your needs.

As for your concerns, the user is the programmer who uses the newly
suggested verbs interface to learn more about the system.

Cleanup is supported as an integral part of the API (ibv_destroy_counter_set),
and it is part of the user's responsibility to free resources.
It is up to the vendor to make sure resources are not wasted. For example,
if a user attaches (binds) a counter_set to a QP and the program terminates,
I expect the vendor to make sure its driver releases the hardware resources
of both the QP and the attached counter_set. If the vendor fails to release
any of these resources, that is a resource leak.

The tool you mentioned in [1] could possibly use the interface I suggested
to supply more information and/or statistics about the system. For example,
the tool you mentioned describes the following actions:

[1] Query RDMA device capabilities - capabilities are constant
    values/attributes, hence counter_sets do not fit this option.
[2] Query RDMA device status and current open resources - for the status and
    the list of open resources, I could think of a counter_set that is bound
    to the ibv_context and reports values such as the number of open QPs and
    other resources.
[3] Fetch RDMA statistics - the suggested API fits RDMA statistics very well
    and is capable of fulfilling these needs.
[4] Configure RDMA device - sorry, the API suggested here is 'read-only'.
    Maybe someone else will offer a generic API for configuration; however,
    that is out of the scope of the suggested API.

re
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html