Re: [PATCH v2 for-next 2/3] IB/{hfi1, rdmavt, qib}: Fit completions into single aligned cache-line


On Mon, Sep 10, 2018 at 09:49:19AM -0700, Dennis Dalessandro wrote:
> From: Sebastian Sanchez <sebastian.sanchez@xxxxxxxxx>
> 
> The struct ib_wc uses two cache-lines per completion, and it is
> unaligned. This structure used to fit within one cacheline, but it was
> expanded by fields added in the following patches:

Like Parav says, that statement seems to be nonsense; pahole shows
struct ib_wc still fits in a single 64-byte cache line:

struct ib_wc {
        union {
                u64                wr_id;                /*           8 */
                struct ib_cqe *    wr_cqe;               /*           8 */
        };                                               /*     0     8 */
        enum ib_wc_status          status;               /*     8     4 */
        enum ib_wc_opcode          opcode;               /*    12     4 */
        u32                        vendor_err;           /*    16     4 */
        u32                        byte_len;             /*    20     4 */
        struct ib_qp *             qp;                   /*    24     8 */
        union {
                __be32             imm_data;             /*           4 */
                u32                invalidate_rkey;      /*           4 */
        } ex;                                            /*    32     4 */
        u32                        src_qp;               /*    36     4 */
        u32                        slid;                 /*    40     4 */
        int                        wc_flags;             /*    44     4 */
        u16                        pkey_index;           /*    48     2 */
        u8                         sl;                   /*    50     1 */
        u8                         dlid_path_bits;       /*    51     1 */
        u8                         port_num;             /*    52     1 */
        u8                         smac[6];              /*    53     6 */

        /* XXX 1 byte hole, try to pack */

        u16                        vlan_id;              /*    60     2 */
        u8                         network_hdr_type;     /*    62     1 */

        /* size: 64, cachelines: 1, members: 17 */
        /* sum members: 62, holes: 1, sum holes: 1 */
        /* padding: 1 */
};

> Create a kernel only rvt_wc structure that is a single aligned
> cache-line. This reduces the cache lines used per completion and
> eliminates any cache line push-pull by aligning the size to a
> cache-line.

Not at all sure it is even a good idea to cache-align this. Most of the
usages here are singletons on-stack, and we can reasonably expect the
stack to be hot in the cache. Wasting stack space sounds like a
performance negative.

So I'm not taking this. Resend with an accurate commit message and some
performance numbers to try again.

Jason
