On 10/15/2018 01:37 PM, Khalid Aziz wrote:
> On 09/24/2018 08:45 AM, Stecklina, Julian wrote:
>> I didn't test the version with TLB flushes, because it's clear that the
>> overhead is so bad that no one wants to use this.
>
> I don't think we can ignore the vulnerability caused by not flushing
> stale TLB entries. On a mostly idle system, TLB entries hang around long
> enough to make it fairly easy to exploit this. I was able to use the
> additional test in the lkdtm module added by this patch series to
> successfully read pages unmapped from physmap just by waiting for the
> system to become idle. A rogue program can simply monitor system load
> and mount its attack using a ret2dir exploit when the system is mostly
> idle.
>
> This brings us back to the prohibitive cost of TLB flushes. If we are
> unmapping a page from physmap every time the page is allocated to
> userspace, we are forced to incur the cost of TLB flushes in some way.
> The work Tycho was doing to implement Dave's suggestion can help here.
> Once Tycho has something working, I can measure the overhead on my test
> machine. Tycho, I can help with your implementation if you need.

I looked at Tycho's last patch with batch update from
<https://lkml.org/lkml/2017/11/9/951>. I ported it on top of Julian's
patches and got it working well enough to gather performance numbers.
Here is what I see for system times on a machine with dual Xeon E5-2630
CPUs and 256GB of memory when running "make -j30 all" on 4.18.6 kernel
(percentages are relative to the base 4.19-rc8 kernel without xpfo):

Base 4.19-rc8                            913.84s
4.19-rc8 + xpfo, no TLB flush          1027.985s  (+12.5%)
4.19-rc8 + batch update, no TLB flush    970.39s  (+6.2%)
4.19-rc8 + xpfo, TLB flush             8458.449s  (+825.6%)
4.19-rc8 + batch update, TLB flush     4665.659s  (+410.6%)

Batch update is a significant improvement, but we are starting so far
behind the baseline that it is still a huge slowdown.
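
For reference, the percentages above can be recomputed from the raw
system times with a few lines of Python. This is only a minimal sketch:
the helper name overhead_pct and the dictionary layout are illustrative
and are not part of the patch series or the test setup; the numbers are
copied from the measurements above.

```python
# System times in seconds, copied from the measurements above.
BASE_SECONDS = 913.84  # base 4.19-rc8 without xpfo

# Labels are shortened; every run is 4.19-rc8 plus the named change.
RUNS = {
    "xpfo, no TLB flush": 1027.985,
    "batch update, no TLB flush": 970.39,
    "xpfo, TLB flush": 8458.449,
    "batch update, TLB flush": 4665.659,
}


def overhead_pct(seconds, base=BASE_SECONDS):
    """Return the slowdown of a run relative to base, in percent."""
    return (seconds / base - 1.0) * 100.0


if __name__ == "__main__":
    for label, seconds in RUNS.items():
        print(f"{label:28s} {seconds:10.3f}s ({overhead_pct(seconds):+.1f}%)")
```

By the same arithmetic, the batched run with TLB flushes takes roughly
0.55 times as long as the unbatched one (4665.659s vs. 8458.449s), which
still leaves it about five times slower than the base kernel.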
--
Khalid