On 1/21/2021 2:32 PM, David Laight wrote:
From: Randy Dunlap
Sent: 21 January 2021 22:19
On 1/21/21 2:16 PM, David Laight wrote:
From: Yu, Yu-cheng
On 1/21/2021 10:44 AM, Borislav Petkov wrote:
On Tue, Dec 29, 2020 at 01:30:35PM -0800, Yu-cheng Yu wrote:
[...]
@@ -343,6 +349,16 @@ static inline pte_t pte_mkold(pte_t pte)
static inline pte_t pte_wrprotect(pte_t pte)
{
+ /*
+ * Blindly clearing _PAGE_RW might accidentally create
+ * a shadow stack PTE (RW=0, Dirty=1). Move the hardware
+ * dirty value to the software bit.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+ pte.pte |= (pte.pte & _PAGE_DIRTY) >> _PAGE_BIT_DIRTY << _PAGE_BIT_COW;
Why the unreadable shifting when you can simply do:
if (pte.pte & _PAGE_DIRTY)
pte.pte |= _PAGE_COW;
?
It clears _PAGE_DIRTY and sets _PAGE_COW. That is,
if (pte.pte & _PAGE_DIRTY) {
pte.pte &= ~_PAGE_DIRTY;
pte.pte |= _PAGE_COW;
}
So, shifting makes resulting code more efficient.
Does the compiler manage to do one shift?
How can it clear anything?
It could shift it off either end since there are both << and >>.
It is still:
pte.pte |= xxxxxxx;
There is only an |= against the target.
Something horrid with ^= might set and clear.
It could be 4 instructions:
is_dirty = pte.pte & PAGE_DIRTY;
pte.pte &= ~PAGE_DIRTY; // or pte.pte ^= is_dirty
is_cow = is_dirty << (BIT_COW - BIT_DIRTY); // or equivalent >>
pte.pte |= is_cow;
provided you've a three operand form for one of the first two instructions.
Something like ARM might manage to merge the last two as well.
But the register dependency chain length may matter more than
the number of instructions.
The above is likely to be three long.
I see what you are saying. The patch is like...
if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
pte.pte |= (pte.pte & _PAGE_DIRTY) >> _PAGE_BIT_DIRTY << _PAGE_BIT_COW;
pte = pte_clear_flags(pte, _PAGE_DIRTY);
}
It is not necessary to do the shifting. I will make it, simply,
if (pte.pte & _PAGE_DIRTY) {
pte.pte &= ~PAGE_DIRTY;
pte.pte |= _PAGE_COW;
}
Thanks for your comments.
--
Yu-cheng