On Fri, May 15, 2020 at 12:06 AM Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote: > >> + u16 req_id; > >> + u8 seq_id; > >> + u8 cmd; > >> + u32 resv; > >> + u64 src_addr; > >> + u64 dest_addr; > >> + u32 len; > >> + u32 resv2; > >> + u64 db_addr; /* doorbell address */ > >> + u8 db_len; /* not a raw value, special encoding */ > >> + u8 resv3; > >> + u16 resv4; > >> + u32 db_data; > >> + u32 sem_cmd0; > >> + u32 sem_cmd1; > >> + u32 sem_cmd2; > >> + u32 sem_cmd3; > >> +} __packed; > > > > All members are naturally aligned, so better drop the __packed > > annotation get better code, unless the structure itself is > > at an unaligned offset in memory. > > I'm going to have to disagree. While most "sane" compilers would not > add extra padding, I've debugged enough issues in the past when > sending/receiving data with foreign environments to never trust anything > that isn't "packed". > > Unless I missed something in the C spec that requires naturally aligned > structures to have an identical layout in memory, I'll take safety and > functional correctness over performance. The C standard does not mandate the layout and individual ABIs can be different, but Linux does only runs on ABIs that have the correct layout for the structure above The problem is that the compiler will split up word accesses into bytes on some architectures such as older ARM, to avoid unaligned loads and stores. It should be fine if you add both __packed and __aligned(8) to make the structure aligned again. > >> + init_completion(&mem->xfer_done); > >> + list_add_tail(&mem->list, &dbc->xfer_list); > >> + tail = (tail + mem->nents) % dbc->nelem; > >> + __raw_writel(cpu_to_le32(tail), dbc->dbc_base + REQTP_OFF); > > > > What is this __raw accessor for? This generally results in non-portable > > code that should be replaced with writel() or something specific to > > the bus on the architecture you deal with. > > The barrier(s) that comes with writel are unnecessary in this case. > Since this is part of our critical path, we are sensitive to its > performance. > > What are the portability issues around the __raw variant? The access may not be atomic on all architectures but get either split up or combined with adjacent accesses. There is no guarantee about endianess: while most architectures treat __raw access as a simple load/store, some might have an implied byteswap. __raw accesses might be completely reordered around other mmio accesses or spinlocks. If you want to avoid the barrier, there is the writel_relaxed()/readl_relaxed() that skips most barriers, so they can be reordered against MSI interrupts and DMA accesses on architectures that allow (e.g. ARM or PowerPC) but not against each other. If you do use them, I'd still expect a comment that explains why this instance is performance critical and how it is synchronized against DMA or MSI if necessary. Arnd