Re: [PATCH for-next V2 0/9] Add completion timestamping support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2015-05-31 at 15:14 +0300, Or Gerlitz wrote:
> Hi Doug,
> 
> This patchset adds completion timestamping supports for verbs consumers. 
> 
> Reviewing the weekend threads, we've changed the flag time to reflect
> that this is completion time-stamp and folded the mlx4 actual support 
> into one patch.
> 
> Regarding the related user-space support, it's possible to add what you
> were suggesting, ibv_get_raw_cqe_timestamp() -- returns ts in cycles and 
> ibv_get_cqe_timestamp() -- returns ts in ns, this makes the value returned
> by the poll cq verb an opaque one that must go through one of  the convertors.
> 
> We would to go for one helper ibv_get_timestamp(uint64_t raw_time, flag) which 
> could get the raw time-stamp and one of the following flags: RAW_TIME, RAW_NS_TIME.

I'm theoretically OK with something similar to the above.  However, the
NS time should not be raw.  It should be cooked and should be able to be
valid to compare between different adapters.  Right now, the cycle
counter that you are exposing is only useful for ordering between
packets received on a single adapter where the cycle counter is the same
on all packets.  Throw in a different vendor's card, or two of your own
cards, and the issue gets much more complex.  The cooked value should be
an actual, real time that can be used across these more complex
environments.  Because of that, it really shouldn't be called RAW.

So, if you want a single entry point, I would suggest something like
this:

enum ib_timestamp_flags {
	IB_TIMESTAMP_COMPLETION = (1 << 0), // specify on create_cq
	IB_TIMESTAMP_WQE_BEGIN =  (1 << 1), // specify on create qp?
	IB_TIMESTAMP_WQE_END =    (1 << 2), // specify on create qp?
	IB_TIMESTAMP_RAW =        (1 << 31)
};

enum ib_cq_creation_flags {
	IB_CQ_FLAGS_TIMESTAMP_COMPLETION = (1 << 0)
};

/**
 * ibv_get_timestamp - Return the requested timestamp for the given wc
 * @wc - work completion to get timestamp results from
 * @ts - struct timespec to return timestamp in
 * @flags - which timestamp to return and in what form
 *
 * Depending on the flags used to create the queue pair/completion
 * queue, different timestamps might be available.  Callers should
 * specify which timestamp they are interested in using the flags
 * element, and if they wish either a cooked or raw timestamp.  A
 * raw timestamp is implementation defined and will be passed back
 * in the tv_nsec portion of the struct timespec.  A raw timestamp
 * can not be relied upon to have any ordering value between more
 * than one HCA or driver.  A cooked timestamp will return a valid
 * struct timespec normalized as closely as possible to the return
 * value for CLOCK_MONOTONIC of clock_gettime at the time of the
 * timestamp.
 */
int ibv_get_timestamp(struct ibv_wc *wc, struct timespec *ts, int
flags);

> We think this would address the reviewer comments for the kernel submission.
> 
> The user-space code is in (still uses IB_CQ_FLAGS_TIMESTAMP and miss the 
> conversion functions) 
> 
>  https://github.com/matanb10/libibverbs timestamp-v1
>  https://github.com/matanb10/libmlx4 timestamp-v1
> 
> Timestamping is used by applications in order to know when a WQE was 
> received/transmitted by the HW. The value is given is HCA hardware cycles,
> but could be easily converted as the hardware's core clock frequecny is 
> available through extension of query device. 
> 
> Moreover, we add an ability to read the HCA's current clock. This could be 
> useful on order to synchronize events to the wall clock.
> 
> This functionality is achieved by adding/extending the following verbs:
> 
> create_cq - create_cq is extended in order to allow passing creation flags
> to the CQ creation function. We change IB/core --> vendors API
> to be easily extendible by passing a struct which contains
> comp_vectors, cqe and the new flags parameter. In order to create
> CQ which supports timestamping, IB_CQ_FLAGS_TIMESTAMP_COMPLETION should be given.
> 
> query_device - We extend query_device uverb further by giving the hardware's
> clock frequency and the timestamp mask (the number of timestamp
> bits which are supported). If timestamp isn't supported, 0 is returned.
> 
> In order to read the timestamp in the WQE, the user needs to query the device 
> for support, create an appropriate CQ (using the extanded uverb with
> IB_CQ_FLAGS_TIMESTAMP_COMPLETION) and poll the CQ with an extended poll_cq verb (currently,
> only implemented in user-space).
> 
> In mlx4, allowing the user to read the core clock efficiently involves mapping
> this area of the hardware to user-space (being done by using a mmap command)
> and reading the clock from the correct offset of the page. 
> 
> This offset is returned in the vendor's specific data from mlx4's kernel driver 
> to the mlx4's user-space driver. query_device is modified in order to support
> passing this vendor specific data. A user-space application could use a new
> verb in order to read the hardware's clock.
> 
> Translating the hardware's clock into ms could be done by dividing this
> value by hca_core_clock (which is returned by the extended version of
> query_device uverb).
> 
> A user-space application could get the current HW's clock by executing
> 
> ibv_query_values_ex(struct ibv_context *context, uint32_t q_values,
>                     struct ibv_values_ex *values)
> 
> The function gets a mask of the values to query and return their values.
> Vendors could either implement this as a uverb command or use their 
> user-space driver to return those values directly from the HW (the mlx4 way).
> 
> Matan and Or.
> 
> Changes from V1:
>  (1) fixed lustre IB's code build
>  (2) squashed mlx4 V1 9-11 patches into one
>  (3) changed IB_CQ_FLAGS_TIMESTAMP --> IB_CQ_FLAGS_TIMESTAMP_COMPLETION
> 
> Changes from V0:
> (1) Pass ib_cq_init_attr instead of cqe and comp_vector.
> (2) Fix unneeded indentation.
> (3) Change flags to u32.
> (4) Add const to create_cq's ib_cq_init_attr argument in vendor implementation.
> 
> Matan Barak (9):
>   IB/core: Change provider's API of create_cq to be extendible
>   IB/core: Change ib_create_cq to use struct ib_cq_init_attr
>   IB/core: Add CQ creation time-stamping flag
>   IB/core: Extend ib_uverbs_create_cq
>   IB/core: Add timestamp_mask and hca_core_clock to query_device
>   IB/core: Pass hardware specific data in query_device
>   IB/mlx4: Add mmap call to map the hardware clock
>   IB/mlx4: Support extended create_cq and query_device uverbs
>   IB/mlx4: Add support for CQ time-stamping
> 
>  drivers/infiniband/core/device.c                   |    6 +-
>  drivers/infiniband/core/mad.c                      |    5 +-
>  drivers/infiniband/core/uverbs.h                   |    1 +
>  drivers/infiniband/core/uverbs_cmd.c               |  188 ++++++++++++++++----
>  drivers/infiniband/core/uverbs_main.c              |    1 +
>  drivers/infiniband/core/verbs.c                    |    4 +-
>  drivers/infiniband/hw/amso1100/c2_provider.c       |   14 ++-
>  drivers/infiniband/hw/cxgb3/iwch_provider.c        |   19 ++-
>  drivers/infiniband/hw/cxgb4/cq.c                   |    9 +-
>  drivers/infiniband/hw/cxgb4/iw_cxgb4.h             |    8 +-
>  drivers/infiniband/hw/cxgb4/provider.c             |    8 +-
>  drivers/infiniband/hw/ehca/ehca_cq.c               |    7 +-
>  drivers/infiniband/hw/ehca/ehca_hca.c              |    6 +-
>  drivers/infiniband/hw/ehca/ehca_iverbs.h           |    6 +-
>  drivers/infiniband/hw/ehca/ehca_main.c             |    6 +-
>  drivers/infiniband/hw/ipath/ipath_cq.c             |    9 +-
>  drivers/infiniband/hw/ipath/ipath_verbs.c          |    7 +-
>  drivers/infiniband/hw/ipath/ipath_verbs.h          |    3 +-
>  drivers/infiniband/hw/mlx4/cq.c                    |   13 ++-
>  drivers/infiniband/hw/mlx4/mad.c                   |    5 +-
>  drivers/infiniband/hw/mlx4/main.c                  |   67 +++++++-
>  drivers/infiniband/hw/mlx4/mlx4_ib.h               |   19 ++-
>  drivers/infiniband/hw/mlx5/cq.c                    |   10 +-
>  drivers/infiniband/hw/mlx5/main.c                  |   19 ++-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h               |    5 +-
>  drivers/infiniband/hw/mthca/mthca_provider.c       |   15 ++-
>  drivers/infiniband/hw/nes/nes_verbs.c              |   17 ++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c        |   13 ++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h        |    9 +-
>  drivers/infiniband/hw/qib/qib_cq.c                 |   11 +-
>  drivers/infiniband/hw/qib/qib_verbs.c              |    6 +-
>  drivers/infiniband/hw/qib/qib_verbs.h              |    5 +-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c       |   16 ++-
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.h       |   10 +-
>  drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |    9 +-
>  drivers/infiniband/ulp/iser/iser_verbs.c           |    6 +-
>  drivers/infiniband/ulp/isert/ib_isert.c            |    6 +-
>  drivers/infiniband/ulp/srp/ib_srp.c                |   10 +-
>  drivers/infiniband/ulp/srpt/ib_srpt.c              |    5 +-
>  drivers/net/ethernet/mellanox/mlx4/main.c          |   19 ++
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    7 +-
>  include/linux/mlx4/device.h                        |    9 +
>  include/rdma/ib_verbs.h                            |   25 ++-
>  include/uapi/rdma/ib_user_verbs.h                  |   19 ++
>  net/9p/trans_rdma.c                                |    5 +-
>  net/rds/ib_cm.c                                    |    8 +-
>  net/rds/iw_cm.c                                    |    8 +-
>  net/sunrpc/xprtrdma/svc_rdma_transport.c           |   10 +-
>  net/sunrpc/xprtrdma/verbs.c                        |   10 +-
>  49 files changed, 564 insertions(+), 139 deletions(-)
> 


-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: 0E572FDD

Attachment: signature.asc
Description: This is a digitally signed message part


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux