On Mon, Sep 05, 2022 at 04:30:25PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 05, 2022 at 04:44:57PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Sep 05, 2022 at 10:35:44AM +0530, Bharata B Rao wrote:
> > > Hi Kirill,
> > >
> > > On 9/4/2022 6:30 AM, Kirill A. Shutemov wrote:
> > > > On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> > > >> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > > >> 64-bit linear addresses, allowing software to use of the untranslated
> > > >> address bits for metadata.
> > > >>
> > > >> The patchset brings support for LAM for userspace addresses. Only LAM_U57 at
> > > >> this time.
> > > >>
> > > >> Please review and consider applying.
> > > >>
> > > >> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam
> > > >
> > > > +Bharata, Ananth.
> > > >
> > > > Do you folks have any feedback on the patchset?
> > > >
> > > > Looks like AMD version of the tagged pointers feature does not get
> > > > traction as of now, but I want to be sure that the interface introduced
> > > > here can be suitable for your future plans.
> > > >
> > > > Do you see anything in the interface that can prevent it to be extended to
> > > > the AMD feature?
> > >
> > > The arch_prctl() extensions is generic enough that it should be good.
> > >
> > > The untagged_addr() macro looks like this from one of the callers:
> > >
> > >         start = untagged_addr(mm, start);
> > > ffffffff814d39bb:       48 8b 8d 40 ff ff ff    mov    -0xc0(%rbp),%rcx
> > > ffffffff814d39c2:       48 89 f2                mov    %rsi,%rdx
> > > ffffffff814d39c5:       48 c1 fa 3f             sar    $0x3f,%rdx
> > > ffffffff814d39c9:       48 0b 91 50 03 00 00    or     0x350(%rcx),%rdx
> > > ffffffff814d39d0:       48 21 f2                and    %rsi,%rdx
> > > ffffffff814d39d3:       49 89 d6                mov    %rdx,%r14
> > >
> > > Can this overhead of a few additional instructions be removed for
> > > platforms that don't have LAM feature? I haven't measured how much
> > > overhead this effectively contributes to in real, but wonder if it is
> > > worth optimizing for non-LAM platforms.
> >
> > I'm not sure how the optimization should look like. I guess we can stick
> > static_cpu_has() there, but I'm not convinced that adding jumps there will
> > be beneficial.
>
> I suppose the critical bit is the memory load. That can stall and then
> you're sad. A jump_label is easy enough to add.

What about something like this?

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 803241dfc473..868d2730884b 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -30,8 +30,10 @@ static inline bool pagefault_disabled(void);
  */
 #define untagged_addr(mm, addr)	({					\
 	u64 __addr = (__force u64)(addr);				\
-	s64 sign = (s64)__addr >> 63;					\
-	__addr &= (mm)->context.untag_mask | sign;			\
+	if (static_cpu_has(X86_FEATURE_LAM)) {				\
+		s64 sign = (s64)__addr >> 63;				\
+		__addr &= (mm)->context.untag_mask | sign;		\
+	}								\
 	(__force __typeof__(addr))__addr;				\
 })

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
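
For anyone following the thread outside a kernel tree, a userspace model of what the patched macro computes may help. This is only a sketch, not kernel code: untag_addr_model(), LAM_U57_UNTAG_MASK and the lam_enabled flag are hypothetical stand-ins for untagged_addr(), (mm)->context.untag_mask and static_cpu_has(X86_FEATURE_LAM). The mask value assumes LAM_U57 keeps its metadata in bits 62:57.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for (mm)->context.untag_mask with LAM_U57 enabled:
 * every bit set except the assumed metadata bits 62:57.
 */
#define LAM_U57_UNTAG_MASK (~(UINT64_C(0x3f) << 57))

/*
 * Userspace model of the patched untagged_addr(). lam_enabled stands in for
 * static_cpu_has(X86_FEATURE_LAM); in the kernel that check is patched at
 * boot, so a non-LAM CPU skips the load of untag_mask entirely.
 */
static inline uint64_t untag_addr_model(uint64_t addr, uint64_t untag_mask,
                                        bool lam_enabled)
{
	if (lam_enabled) {
		/*
		 * All ones when bit 63 is set (kernel address), zero otherwise.
		 * The kernel gets this from an arithmetic shift, (s64)addr >> 63;
		 * the ternary avoids relying on signed-shift behaviour here.
		 */
		uint64_t sign = (addr >> 63) ? ~UINT64_C(0) : UINT64_C(0);

		/* Kernel addresses pass through; user addresses lose the tag. */
		addr &= untag_mask | sign;
	}
	return addr;
}
```

With LAM modelled as enabled, a user pointer carrying a tag in bits 62:57 comes back with the tag cleared, and an address with bit 63 set comes back unchanged. With LAM modelled as disabled, the input is returned as-is, which is the case Bharata asked about.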