> -----Original Message----- > From: Arnd Bergmann <arnd@xxxxxxxx> > Sent: Tuesday, October 5, 2021 4:42 PM > To: Kalle Valo <kvalo@xxxxxxxxxxxxxx> > Cc: Pkshih <pkshih@xxxxxxxxxxx>; Arnd Bergmann <arnd@xxxxxxxx>; linux-wireless@xxxxxxxxxxxxxxx > Subject: Re: [PATCH v6 03/24] rtw89: add core and trx files > > On Tue, Oct 5, 2021 at 9:46 AM Kalle Valo <kvalo@xxxxxxxxxxxxxx> wrote: > > Pkshih <pkshih@xxxxxxxxxxx> writes: > > >> From: kvalo=codeaurora.org@xxxxxxxxxxxxxxxxx > > >> > > >> > +static __always_inline void RTW89_SET_TXWD(u8 *txdesc, u32 val, > > >> > u8 offset, u32 mask) > > >> > +{ > > >> > + u32 *txd32 = (u32 *)txdesc; > > >> > + > > >> > + le32p_replace_bits((__le32 *)(txd32 + offset), val, mask); > > >> > +} > > >> > > >> I'm not convinced about this either, please just use inline. > > > > > > This is because 'mask' argument of le32p_replace_bits() must be constant > > > only. If I use inline and build this driver with ccflags-y += -Os, > > > compiler reports errors: > > > > > > In function 'field_multiplier', > > > inlined from 'le32_encode_bits' at ./include/linux/bitfield.h:154:1, > > > inlined from 'le32p_replace_bits' at ./include/linux/bitfield.h:154:1, > > > inlined from 'RTW89_SET_FWCMD_UA32.constprop' at > /work/git-root/rtwlan/rtw89/fw.h:1397:2: > > > ./include/linux/bitfield.h:119:3: error: call to '__bad_mask' declared with attribute error: > bad bitfield mask > > > 119 | __bad_mask(); > > > | ^~~~~~~~~~~~ > > > > > > I check the implement of le32p_replace_bits(), it looks like > > > > > > static __always_inline void type##p_replace_bits(__##type *p, \ > > > base val, base field) \ > > > { \ > > > *p = (*p & ~to(field)) | type##_encode_bits(val, field); \ > > > } > > > > > > So, I imitate the function to use __always_inline, and then it works. > > > > > > Do you think I don't need to consider the case of Os? > > > But, -Os seems a standard option of Linux kernel. > > > > > > ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE > > > KBUILD_CFLAGS += -O2 > > > else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 > > > KBUILD_CFLAGS += -O3 > > > else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE > > > KBUILD_CFLAGS += -Os > > > endif > > > > Yeah, we need to support -Os. > > > > Arnd, what do you suggest? Is __always_inline good solution for this? I > > think we should at least add a comment explaining why it's needed. > > __always_inline can make sense to force the compiler to behave > sanely if it doesn't work it out by itself, and I think that is how this > function was meant to be used: the __compiletime_error in bitfield.h > is intended to find any callers that have a non-constant argument, > because that would result in horrible code. > > I would suggest looking at the object code that you get with -Os after > the added __always_inline, just to make sure that this isn't also > horrible. I check the function rtw89_core_fill_txdesc() which uses these macros. With inline, the object code size is 0x1AF. With __always_inline and -Os, the size is 0x1A4. (x86-64 platform) Compare the object codes side-by-side, they are almost the same except to some instructions. I think this is because the inline function I apply __always_inline contains only a simple statement. > > Looking at the driver code, as in > > +#define RTW89_SET_TXWD_BODY_WP_OFFSET(txdesc, val) \ > + RTW89_SET_TXWD(txdesc, val, 0x00, GENMASK(31, 24)) > +#define RTW89_SET_TXWD_BODY_MORE_DATA(txdesc, val) \ > + RTW89_SET_TXWD(txdesc, val, 0x00, BIT(23)) > +#define RTW89_SET_TXWD_BODY_WD_INFO_EN(txdesc, val) \ > + RTW89_SET_TXWD(txdesc, val, 0x00, BIT(22)) > +#define RTW89_SET_TXWD_BODY_FW_DL(txdesc, val) \ > + RTW89_SET_TXWD(txdesc, val, 0x00, BIT(20)) > > I would personally write this without the wrappers, instead defining the > bitmask macros as the masks and then open-coding the > le32p_replace_bits() calls instead, which I would find more > intuitive while it avoids the problem with the bitmasks. Use these macros can address offset and bit fields quickly. How about I use macro instead of inline function? Like, #define RTW89_SET_TXWD (txdesc, val, offset, mask) \ do { \ u32 *txd32 = (u32 *)txdesc; \ le32p_replace_bits((__le32 *)(txd32 + offset), val, mask); \ } while (0) > Going back one more step, I see that that rtw89_core_fill_txdesc() > manipulates the descriptor fields in-memory, which also seems > like a bad idea: The descriptor is mapped as cache-coherent, > so on machines with no coherent DMA (i.e. most ARM or MIPS > machines), that is uncached memory, and writing the descriptor > using a series of read-modify-write cycles on uncached memory > will be awfully slow. Maybe the answer is to just completely > replace the descriptor access. I'll think if we can use chached memory with single_map/unmap for descriptor. That would improve the performance. -- Ping-Ke