RFC: Verbs Counters

There is a constant demand to know more about the connections used in verbs
(and other aspects). Some vendors have been offering hardware counters
through sysfs for a long time. Those counters, however, are not available per
connection but only for the whole system.

One way to address this is for each vendor to offer its own vendor-specific
counters. Since each vendor could have its own implementation of counters,
such a verbs interface would not be generic for the rest of the vendors.

We present a generic interface for using counters in verbs.

Let's have some definitions before going into details:

Object:      an existing structure in verbs which describes a physical
             entity, e.g. QP/FLOW/DEVICE.
Counter:     a single attribute which is used to count events/statistics on
             an object.
Counter-set: a set of counters that belongs to one specific object.

A generic interface with the following functionality is presented:

1. A way to list all the available counter-sets in the device.
   For each counter-set:
   - What do the counters within the set measure? Is it a QP? A flow? Other?
   - A unique identifier per counter-set.
   - A list of names for all the counters within the counter-set.
     Each vendor has its own counters/stats and could use its own names for
     a counter. This suggestion aims to replace vendor-specific APIs with
     predefined enums/names for each counter/stat.
   - Additional metadata about a counter-set (for example - is it cached?)

2. Operations available per counter-set:

2.1 Bind and unbind: a counter-set has to be attached to an object in order
    for any counter within the counter-set to count. The attaching action is
    referred to as 'bind' and the opposite action is referred to as 'unbind'.
    Rather than having dedicated generic operations for bind and unbind, I
    chose to use existing verbs methods. The existing methods could be
    modified with small changes (like adding a new flag) to bind (or unbind)
    a counter-set to an object.

2.2 Counter-set may be created: a counter-set instance is allocated and
    created on an ibv context and belongs to that context.

2.3 Counter-set may be destroyed: the counter-set instance is destroyed and
    de-allocated. If the counter-set is bound to an object, it is the
    responsibility of the driver either to unbind it prior to hardware
    de-allocation, or to notify the user that the driver is unable to
    destroy the counter-set, in which case it is the user's responsibility
    to unbind it prior to destruction.

2.4 Counter-set may be queried: the user supplies a counter-set instance and
    an output address. The hardware is queried for the counter-set and the
    output is written to the address as an array of uint64_t. Each entry in
    the uint64_t array represents a single counter.

The user is expected to query the device on startup, find which counter-sets
are supported and to which objects each counter-set may be bound. During
this scan the user also finds out which counters are supported for which
object.

Example of a way to list all the available counter-sets in the device.

We modify the method int query_device_ex() by adding a new flag to the
enum ibv_device_attr_mask:

+    IBV_DEVICE_ATTR_COUNTER_SET = 1 << 1

When using this flag, the device will respond with struct ibv_device_attr_ex
containing a new attribute:

+    uint64_t max_supported_counter_sets;

The user can then use a new API to get the description of each counter-set.
A counter-set is identified by a counter-set id, which is a number from 0 to
max_supported_counter_sets, that is, the number returned from the
query_device_ex() call.

    int ibv_query_counter_set_description(struct ibv_context *context,
                                          uint64_t counter_set_id,
                                          struct ibv_counter_set_description *out)

- returns 0 on success
- returns -1 when counter_set_id is invalid.

The API writes the following structure to out:

    struct ibv_counter_set_description {
        // Which type of object does this set refer to?
        // Value is taken from enum ibv_counter_set_counted_type
        uint8_t counted_type;
        // Number of instances of this counter-set available in the hardware
        uint64_t number_of_counter_sets;
        // Attributes of the set (bit mask)
        // Value is taken from enum ibv_counter_set_attributes
        uint32_t attributes;
        // Number of entries
        uint8_t entries_count;
        // List of entries
        struct ibv_counter_entry entry[256];
    };

Where:

    struct ibv_counter_entry {
        // Name of the entry. The last entry contains an empty (NULL) name
        char name[32];
    };

===========================
Brief explanation of the fields inside struct ibv_counter_set_description:

counted_type - the type of object this counter-set relates to. The value is
taken from enum ibv_counter_set_counted_type (see below). Each counter-set
relates to a verbs object, which is the object this counter-set aims to
count (i.e. measure), such as a QP or a flow.

    enum ibv_counter_set_counted_type {
        IBV_COUNTER_IBV_QP = 0,
        IBV_COUNTER_IBV_FLOW,
        ...
    };

number_of_counter_sets - how many instances of this counter-set does the
device support? Note that this value can be interpreted in more than one
way: either how many counter-sets are currently available, or the total
(max) number of counter-sets the device supports. Here it is treated as the
maximum number of counter-sets which the process is allowed to create.

attributes - special attributes which this counter-set might have, either in
software or in hardware. For example, a counter-set can be cached, which
means that every query for that set is served from the cache unless a read
from the hardware is explicitly requested.

    enum ibv_counter_set_attributes {
        // The counter-set value is cached by default
        IBV_COUNTER_ATTR_CACHED = 1 << 1
    };

entries_count - number of entries in the counter-set.

struct ibv_counter_entry entry[256] - an ordered list of counter names,
where the last name in the array is empty (NULL).

====
Example:
ibv_counter_set_description is a struct to describe other structs.
For example, we have the following private struct:

    struct guy_counter {
        uint64_t apples_kg;
        uint64_t apples_count;
    };

- note that all counters are 64-bit entries.

The matching ibv_counter_set_desc looks like this:

    // according to the ibv_counter_set_counted_type enum
    ibv_counter_set_desc.counted_type = IBV_COUNTER_IBV_GUYGUY
    ibv_counter_set_desc.attributes = 0;
    ibv_counter_set_desc.number_of_counter_sets = 1000;
    ibv_counter_set_desc.entries_count = 2;
    ibv_counter_set_desc.entry[0].name = "apples [Kg]"
    ibv_counter_set_desc.entry[1].name = "apples [Count]"
    ibv_counter_set_desc.entry[2].name = ""   // empty name ends the list

====
How to fill the internal tables?

On initialization the driver should query the device capabilities to see how
many counter-sets are supported. For each supported counter-set the driver
acts as follows:

1. Allocate a counter_set_id.
2. Register the counter_set_id with a pointer to the data structures holding
   the list of counters.
3. Finally, the driver returns the number of counter-sets supported in
   ibv_query_device_ex().

Operations available for each counter-set:

Each counter-set instance is represented by the following structure:

    struct ibv_counter_set {
        struct ibv_context *context;
        uint64_t handle;
    };

THE NEW API

    struct ibv_counter_set *ibv_create_counter_set(struct ibv_context *context,
                                                   uint16_t counter_set_id)

Returns a struct ibv_counter_set which contains context+handle.
Actions: the method allocates memory for struct ibv_counter_set and then
calls the driver to allocate the actual hardware counter-set.
If successful, the method returns a pointer to a heap-allocated
struct ibv_counter_set which contains context+handle.
If unsuccessful, the method returns NULL and sets errno accordingly.

    int ibv_destroy_counter_set(struct ibv_counter_set *counter_set)

Destroys the input counter_set and frees the allocated memory.
Actions: the method attempts to remove the hardware counter-set and then the
input struct is released (deleted). In the kernel, the code checks whether
the caller is allowed to destroy the counter_set (by comparing pid) and then
releases the hardware resource.
If unsuccessful, the method returns -1 and sets errno accordingly.
If successful, the method returns 0.

    int ibv_query_counter_set(struct ibv_query_counter_set_attr,
                              uint64_t *out)

Receives a query structure and an output address, then queries the hardware
and writes the output to the uint64_t *out address.
Actions: the method receives struct ibv_query_counter_set_attr, parses the
query and then sends it for execution in the kernel. In the kernel, the code
checks whether the caller is allowed to query the hardware, executes the
query and then writes to *out.
If unsuccessful, the method returns -1 and sets errno accordingly.
If successful, the method returns 0.

Where:

    struct ibv_query_counter_set_attr {
        uint32_t comp_mask;
        struct ibv_counter_set *counter_set;
        enum ibv_query_counter_set_attr_params *query_params;
    };

    enum ibv_query_counter_set_attr_params {
        // Force a hardware query instead of returning the cached value
        IBV_COUNTER_FORCE_UPDATE = 1 << 1
    };

    int ibv_query_counter_set_description(struct ibv_context *context,
                                          uint64_t counter_set_id,
                                          struct ibv_counter_set_description *out)

Writes to out a struct ibv_counter_set_description which contains a
description of a counter-set. The user should allocate
sizeof(struct ibv_counter_set_description) bytes for *out.

- returns 0 on success
- returns -1 when counter_set_id is invalid.

Example of using counter-sets:

    void foo(struct ibv_context *context, int counter_set_id)
    {
        // Container for the counter-set values, one uint64_t per counter
        uint64_t my_counter[256];
        struct ibv_counter_set_description my_description;
        struct ibv_counter_set *my_counter_set;
        struct ibv_query_counter_set_attr my_query;
        int i;

        if (ibv_query_counter_set_description(context, counter_set_id,
                                              &my_description)) {
            printf("query description failed\n");
            return;
        }

        my_counter_set = ibv_create_counter_set(context, counter_set_id);
        if (!my_counter_set) {
            printf("create failed\n");
            return;
        }

        // Let's define the query object
        my_query.comp_mask = 0;
        my_query.counter_set = my_counter_set;
        my_query.query_params = NULL;

        // Finally - do the query
        if (-1 == ibv_query_counter_set(my_query, my_counter)) {
            printf("query failed\n");
        } else {
            for (i = 0; i < my_description.entries_count; ++i) {
                printf("%s = %llu\n", my_description.entry[i].name,
                       (unsigned long long)my_counter[i]);
            }
        }

        ibv_destroy_counter_set(my_counter_set);
    }
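
The startup scan described earlier (walk every counter-set id, fetch its
description, read the counter names) can be tried without hardware. The
following is only a rough sketch under stated assumptions: the two structs
are local copies of the ones proposed in this RFC (they are not in any
released libibverbs), mock_query_counter_set_description() is a hypothetical
stand-in for the driver that exposes a single counter-set modelled on the
guy_counter example, and valid ids are assumed to be 0 up to but not
including the maximum.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the structures proposed in this RFC, repeated here only
 * so that the sketch compiles on its own. */
struct ibv_counter_entry {
    char name[32];
};

struct ibv_counter_set_description {
    uint8_t counted_type;
    uint64_t number_of_counter_sets;
    uint32_t attributes;
    uint8_t entries_count;
    struct ibv_counter_entry entry[256];
};

/* Hypothetical: stands in for max_supported_counter_sets. */
#define MOCK_MAX_SUPPORTED_COUNTER_SETS 1

/* Hypothetical stand-in for ibv_query_counter_set_description(): a fake
 * driver table with one counter-set (id 0). */
int mock_query_counter_set_description(uint64_t counter_set_id,
                                       struct ibv_counter_set_description *out)
{
    if (counter_set_id >= MOCK_MAX_SUPPORTED_COUNTER_SETS)
        return -1; /* invalid id (assumes ids run from 0 to max - 1) */

    memset(out, 0, sizeof(*out)); /* also leaves the terminating name empty */
    out->counted_type = 0;
    out->number_of_counter_sets = 1000;
    out->entries_count = 2;
    strcpy(out->entry[0].name, "apples [Kg]");
    strcpy(out->entry[1].name, "apples [Count]");
    return 0;
}

/* Count names by walking the ordered list until the empty terminator. */
int count_named_entries(const struct ibv_counter_set_description *desc)
{
    int n = 0;

    while (n < 256 && desc->entry[n].name[0] != '\0')
        n++;
    return n;
}

/* Startup scan: visit every id, print its counter names and return how
 * many counter-sets were described successfully. */
int scan_counter_sets(void)
{
    /* static: the description is about 8 KB, keep it off the stack */
    static struct ibv_counter_set_description desc;
    uint64_t id;
    int found = 0;

    for (id = 0; id < MOCK_MAX_SUPPORTED_COUNTER_SETS; id++) {
        if (mock_query_counter_set_description(id, &desc))
            continue;
        /* entries_count and the empty-name terminator should agree */
        assert(count_named_entries(&desc) == desc.entries_count);
        for (int i = 0; i < desc.entries_count; i++)
            printf("set %llu: counter %d is \"%s\"\n",
                   (unsigned long long)id, i, desc.entry[i].name);
        found++;
    }
    return found;
}
```

Calling scan_counter_sets() from a main() prints one line per counter name.
The index of each name is also the index of its value in the uint64_t array
that a query of the counter-set would fill in, which is why the order of the
entry list matters.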