On 7/8/24 20:17, Yang, Weijiang wrote:
> So I'm not sure whether XFEATURE_MASK_KERNEL_DYNAMIC and related changes
> are worth or not for this series.
>
> Could you share your thoughts?

First of all, I really do appreciate when folks make the effort to _try_
to draw their own conclusions before asking the maintainers to share
theirs.  Next time, OK? ;)

But here goes.  So we've basically got three cases.  Here's a fancy table:

> https://docs.google.com/spreadsheets/d/e/2PACX-1vROHIgrtHzUJmdlzT7D7tuVzgM8AMlK2XlorvFIJvk-I0NjD7A-T_qntjz7cUJlCScfWGtSfPK30Xtu/pubhtml

... and the same in ASCII

Case |IA32_XSS[12] | Space | RFBM[12] | Drop%
-----+-------------+-------+----------+------
  1  |      0      | None  |    0     | 0.0%
  2  |      1      | None  |    0     | 0.2%
  3  |      1      | 24B?  |    1     | 0.2%

Case 1 is the baseline of course.  Case 2 avoids allocating space for CET
and also leans on the kernel to set RFBM[12]==0 and tell the hardware not
to write CET-S state.  Case 3 wastes the CET-S space in each task and also
leans on the hardware init optimization to avoid writing out CET-S space
on each XSAVES.

#1 is: 0 lines of code.
#2 is: 5 files changed, 90 insertions(+), 27 deletions(-)
#3 is: very few lines of code, nearing zero

#2 and #3 have the same performance.

So we're down to choosing between

 * $BYTES space in 'struct fpu' (on hardware supporting CET-S)

or

 * ~100 loc

$BYTES is 24, right?  Did I get anything wrong?

So, here's my stake in the ground: I think the 100 lines of code is
probably worth it.  But I also hate complicating the FPU code, so I'm
also somewhat drawn to just eating the 24 bytes and moving on.

But I'm still in the "case 2" camp.

Anybody disagree?
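
To make the three cases a bit more concrete, here is a rough userspace
sketch (not kernel code, and not what the series does) of how the CET-S
bit in the requested-feature bitmap and the per-task space cost fall out
of each row of the table.  'struct xsave_case', rfbm() and
extra_bytes_per_task() are made-up names for illustration, and the
24-byte size is the same unconfirmed $BYTES guess as above.

```c
#include <assert.h>
#include <stdint.h>

#define CET_S_BIT    12u  /* bit used in the table: IA32_XSS[12], RFBM[12] */
#define CET_S_BYTES  24u  /* assumption: the "$BYTES is 24, right?" figure */

/* Hypothetical description of one row of the table. */
struct xsave_case {
	int xss_bit12;  /* IA32_XSS[12]: is CET-S enabled in hardware? */
	int space;      /* does each task's buffer reserve CET-S space? */
};

/* Requested-feature bitmap for XSAVES, reduced to just the CET-S bit. */
static uint64_t rfbm(const struct xsave_case *c)
{
	/* Reserving space for a feature that is not enabled makes no sense. */
	assert(c->xss_bit12 || !c->space);

	/* Only ask the hardware to write CET-S when there is room for it. */
	return (c->xss_bit12 && c->space) ? (UINT64_C(1) << CET_S_BIT) : 0;
}

/* Extra bytes that every task carries around for CET-S. */
static unsigned int extra_bytes_per_task(const struct xsave_case *c)
{
	return c->space ? CET_S_BYTES : 0;
}
```

Feeding in the rows {0,0}, {1,0} and {1,1} gives RFBM[12] of 0, 0, 1 and
a per-task cost of 0, 0 and 24 bytes, which is the Space and RFBM[12]
columns of the table.  The Drop% column is measured, not modeled here.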