Re: [musl] Re: [PATCH v8 00/38] arm64/gcs: Provide support for GCS in userspace

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Wed, 21 Feb 2024 18:53:44 +0000

On Wed, 2024-02-21 at 13:30 -0500, dalias@xxxxxxxx wrote:
> > 3 is the cleanest and safest I think, and it was thought it might
> > not
> > need kernel help, due to a scheme Florian had to direct signals to
> > specific threads. It's my preference at this point.
> 
> The operations where the shadow stack has to be processed need to be
> executable from async-signal context, so this imposes a requirement
> to
> block all signals around the lock. This makes all longjmps a heavy,
> multi-syscall operation rather than O(1) userspace operation. I do
> not
> think this is an acceptable implementation, especially when there are
> clearly superior alternatives without that cost or invasiveness.

That is a good point. Could the per-thread locks be nestable to address
this? We just need to know if a thread *might* be using shadow stacks.
So we really just need a per-thread count.

> 
> > 1 and 2 are POCed here, if you are interested:
> > https://github.com/rpedgeco/linux/commits/shstk_suppress_rfc/
> 
> I'm not clear why 2 (suppression of #CP) is desirable at all. If
> shadow stack is being disabled, it should just be disabled, with
> minimal fault handling to paper over any racing operations at the
> moment it's disabled. Leaving it on with extreme slowness to make it
> not actually do anything does not seem useful.

The benefit is that code that is using shadow stack instructions won't
crash if it relies on them working. For example RDSSP turns into a NOP
if shadow stack is disabled, and the intrinsic is written such that a
NULL pointer is returned if shadow stack is disabled. The shadow stack
is normally readable, and this happens in glibc sometimes. So if there
was code like:

   long foo = *(long *)_get_ssp();

...then it could suddenly read a NULL pointer if shadow stack got
disabled. (notice, it's not even a "shadow stack access" fault-wise. So
it was looked at as somewhat more robust. But neither 1 or 2 are
perfect for apps that are manually using shadow stack instructions.

> 
> Is there some way folks have in mind to use option 2 to lazily
> disable
> shadow stack once the first SS-incompatible code is executed, when
> execution is then known not to be in the middle of a SS-critical
> section, instead of doing it right away? I don't see how this could
> work, since the SS-incompatible code could be running from a signal
> handler that interrupted an SS-critical section.

The idea was to disable it without critical sections, and it could be
more robust, but not perfect. I was preferring option 1 between 1 and
2, which was closer to your original suggestion. But it has problems
like the example I gave above. I agree 1 is relatively simpler for the
kernel, between 1 and 2.

> 
> > > If folks on the kernel side are not going to be amenable to doing
> > > the
> > > things that are easy for the kernel to make it work without
> > > breaking
> > > compatibility with existing interfaces, but that are impossible
> > > or
> > > near-impossible for userspace to do, this seems like a dead-end.
> > > And
> > > I
> > > suspect an operation to "disable shadow stack, but without making
> > > threads still in SS-critical sections crash" is going to be
> > > necessary..
> > 
> > I think we have to work through all the alternative before we can
> > accuse the kernel of not being amenable. Is there something that
> > you
> > would like to see out of this conversation that is not happening?
> 
> No, I was just interpreting "uphill battle". I really do not want to
> engage in an uphill battle for the sake of making it practical to
> support something that was never my goal to begin with. If I'm
> misreading this, or if others are willing to put the effort into that
> "battle", I'd be happy to be mistaken about "not amenable".

I don't think x86 maintainers have put a foot down on anything around
this at least. They would normally have concerns about complexity and
maintainability. So if we have something that has lower value
(imperfect solution), and high complexity, it starts to look like less
promising path.