[CC += Denys, since he's had a lot of input to ptrace(2) in the past, and perhaps might also have a comment to this patch] On Mon, 18 May 2020 at 05:00, Keno Fischer <keno@xxxxxxxxxxxxxxxxxx> wrote: > > Correctly using the result of this operation is quite hard, > because the layout is not fixed and depends on the kernel > configuration. Furthermore, because of the initial state > optimization, parts of the layout may be missing. If ptrace > users are not careful, it is easy to get unexpected results. > This documents everything I know about how to use NT_X86_XSTATE > "correctly". This should probably have been documented earlier, > since every single ptrace application I looked at gets this wrong > in one way or another, but hopefully having documentation will at > least help future users cover the relevant corner cases. > > Signed-off-by: Keno Fischer <keno@xxxxxxxxxxxxxxxxxx> > > --- > I'm hoping this will help. I recently had occasion to read up on > how this actually works (finding I, too, had used it incorrectly), > for patch in https://lkml.org/lkml/2020/4/6/1161. I'm cc'ing the > folks who took part in that review here, since I think they would > be interested in making sure the status quo is adequately documented. > Please let me know if I got anything wrong, or if anything is confusing. > > man2/ptrace.2 | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 57 insertions(+) > > diff --git a/man2/ptrace.2 b/man2/ptrace.2 > index 575062971..57958119b 100644 > --- a/man2/ptrace.2 > +++ b/man2/ptrace.2 > @@ -2322,6 +2322,63 @@ result, to the real parent (to the real parent only when the > whole multithreaded process exits). > If the tracer and the real parent are the same process, > the report is sent only once. > +.SS The layout and operation of the NT_X86_XSTATE regset > +On x86(_64), the values of extended registers can be obtained as an xstate buffer, > +accessed through the NT_X86_XSTATE option to > +.B PTRACE_GETREGSET. > +The layout of this area is relatively complex (and described below). It is > +in general not safe to assume that a buffer obtained using > +.B PTRACE_GETREGSET > +may be set back to any task using > +.B PTRACE_SETREGSET > +while resulting in a task that has equivalent register state (see below for > +details). It is also not safe to assume that the buffer is a valid xsave area > +that may be restored using the > +.I xrstor > +instruction, nor is it safe to assume that any extended state component is > +located at a particular fixed offset. Instead the following algorithm should be used to > +compute the offset of any given state component within the xsave buffer: > +.IP 1. 3 > +Obtain the kernel xsave component bitmask from the software-reserved area of the > +xstate buffer. The software-reserved area beings at offset 464 into the xsave > +buffer and the first 64 bits of this area contain the kernel xsave component bitmask > +.IP 2. > +Compute the offset of each state component by adding the sizes of all prior state > +components that are enabled in the kernel xsave component bitmask, aligning to 64 byte boundaries along the way. This > +format matches that of a compacted xsave area with XCOMP_BV set to the > +kernel component bitmask. Further details on the layout of the compacted xsave > +area may be found in the Intel architecture manual, but note that the xsave > +buffer returned from ptrace will have its XCOMP_BV set to 0. > +.IP 3. > +For the given state component of interest, check the corresponding bit > +in the xsave header's XSTATE_BV bitfield. If this bit is zero, the corresponding > +component is in initial state and was not written to the buffer (i.e. the kernel > +does not touch the memory corresponding to this state component at all, > +the start offset next active state component will not be affected unless > +the bit is also missing from the kernel component bitmask obtained in step 1). > +The initial state for any state component is defined in the Intel architecture manual (for > +most state components it is the zero state). > +.PP > + > +In particular, the third of these considerations results in a buffer that does > +not round-trip through > +.B PTRACE_SETREGSET. > +If a given state component is missing from the XSTATE_BV bitfield, it will > +be ignored by > +.B PTRACE_SETREGSET > +even if the corresponding register in the target task is currently not in > +initial state. > + > +Thus, to obtain an xsave area that may be set back to the tracee, all unused > +state components must first be re-set to the correct initial state for the > +corresponding state component, and the XSTATE_BV bitfield must subsequently > +be adjusted to match the kernel xstate component bitmask (obtained as > +described above). > + > +The value of the kernel's state component bitmask is determined on boot and > +need not be equivalent to the maximal set of state components supported by the > +CPU (as enumerated through CPUID). > + > .SH RETURN VALUE > On success, the > .B PTRACE_PEEK* > -- > 2.25.1 > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/