Re: [PATCH 0/7] Upgrade the remote access to a xen domain.

Dave Anderson <anderson@xxxxxxxxxx> · Tue, 19 Nov 2013 14:28:16 -0500 (EST)

Hi Don,

I still need some help understanding this "resurrection" of the remote
access facility, i.e., the new meanings of the REMOTE_xxx flags/macros.

The currently-existing comment for the is_remote_daemon() function
in remote.c describes the (obsolete) usage of the facility:

 /*
  *  Parse, verify and establish a connection with the network daemon
  *  specified on the crash command line.
  *
  *  The format is: [remote-hostname]:port[,remote-namelist][,remote-dumpfile]
  *
  *  where everything but the port number is optional, and the remote-namelist
  *  and remote-dumpfile can be reversed.
  *
  *    1. The default remote host is the local host.
  *    2. The default dumpfile is /dev/mem.
  *    3. If no remote-namelist and remote-dumpfile are given, the daemon
  *       is queried for a kernel that matches the remote /proc/version.
  *       If no local kernel namelist is entered, the remote version will
  *       be copied locally when fd_init() is called.
  *    4. If a remote-dumpfile is given with no remote namelist, it is presumed
  *       that the kernel namelist will be entered locally.
  */

Back then, the remote "crashd" daemon could be contacted for:

 (1) live access to the remote machine's kernel using its /dev/mem, or
 (2) accessing a vmlinux/vmcore pair that existed on the remote machine.

Can you explain this new usage?  It sounds like there is a remote daemon
running on a remote Dom0 kernel, and that your new functionality can only
be used to:

 (1) access a running domU guest hosted by that remote system, or
 (2) access a paused domU guest hosted by that remote system. 

And that there is also a device that the the remote daemon uses
to access the memory of the guest, be it paused or live.

Am I in the ballpark?

Dave

----- Original Message -----
> On 11/19/13 11:08, Dave Anderson wrote:
> >
> > ----- Original Message -----
> >> From: Don Slutz <dslutz@xxxxxxxxxxx>
> >>
> >> Currently crash 4.0 and later using the code:
> >>
> >> http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html
> >>
> >> There is some very limited documention on crashes remote ptotocol in:
> >>
> >> http://lists.xen.org/archives/html/xen-devel/2013-11/msg02352.html
> >>
> >> and some fixes in the attachment
> >> 0001-xen-crashd-Connect-crash-with-domain.patch
> >> from the 1st link above.
> >>
> >> How ever there are issues with the current code:
> >>
> >> 1) The remote protocol has a minor issue (which I have not been able to
> >>     happen) based on the fact TCP/IP is stream based protocol.  This means
> >>     a RECV or a SEND may not do the fully requested size of data.  In fact
> >>     the current code assumes that the amount of data that a SEND is called
> >>     with will all be read with a single RECV.
> >>
> >> 2) The most common mismatch between crash and older kernels is
> >>     phys_base.  In the remote case, see if the remote server supports
> >>     vitrual memory access, and if so, see if phys_base can be retreived.
> >>
> >> 3) crash assumes that the remote system is active and can not return
> >>     currect IP and SP to do a better back trace.
> >>
> >> 4) enable a non-active mode of remote access.
> >>
> >> This patch set attempts to fix these:
> >>
> >> The fix for #1 I have called NIL mode (patch 1).
> >> The fix for #2 uses get_remote_phys_base (patch 3).
> >> The fix for #3 uses get_remote_regs (patch 5).
> >> The fix for #4 uses special file /dev/xenmem (patch 7).
> >>    It also REMOTE_NIL() to indicate "remote paused system".
> >>
> >> Don Slutz (7):
> >>    Add NIL mode to remote.
> >>    remote_proc_version: NULL terminate passed buffer on error.
> >>    Add get_remote_phys_base.
> >>    Add remote_vtop.
> >>    bt: get remote live registers if possible.
> >>    Add get_remote_cr3
> >>    Add support for non-live remote.
> >>
> >>   defs.h   |   8 +-
> >>   kernel.c |  25 +++-
> >>   memory.c |  25 +++-
> >>   remote.c | 462
> >>   +++++++++++++++++++++++++++++++++++++++++++++++++++++----------
> >>   x86_64.c |  24 ++++
> >>   5 files changed, 465 insertions(+), 79 deletions(-)
> >>
> >> --
> >> 1.8.4
> >   
> > Hi Don,
> >
> > A few comments...
> >
> > I don't really understand what entity the code is communicating with,
> > so for the most part, I'm primarily interested in making sure that your
> > patches in no way can affect the normal usage of the crash utility.
> >
> > That being said, is the term "NIL mode" some kind of Xen-ism?
> Nope, it is term I came up with to describe the protocol change.
> >   It wasn't
> > until I saw the display_sys_stats() function that I even understood that
> > REMOTE_NIL mean "(remote paused system)".  Anyway, if you made it something
> > like "REMOTE_PAUSED" or something like that, the code might make some sense
> > to the untrained eye.
> Will do.  Looking back at it, I ended up overloading the term. :(
> >
> > And it's not entirely clear to me what REMOTE_ACTIVE() means now?  It used
> > to imply communicating with a remote crash daeamon that accessed the live
> > system that the daemon was running on.  Has that meaning been changed to
> > mean
> > that crash is communicating with a daemon that's accessing a "live" Xen
> > guest
> > kernel?
> The meaning is not as clear.  Since the existing versions of crash already
> have REMOTE_ACTIVE() and I wanted the Xen utility to be usable to them, in
> both cases the "live" Xen guest kernel is being accessed while it is paused.
> I added REMOTE_PAUSED() to have a few changes to crash to correctly
> understand that the guest is paused. As part of this I add the fake file
> "/dev/xenmem".
> >
> > I'm confused about these two changes, where you re-patch your own patch:
> >
> > -        else
> > +        else if (REMOTE_ACTIVE() && ((bt->task == tt->this_task) ||
> > is_task_active(bt->task))) {
> > +               if (get_remote_regs(bt, &eip, &esp))
> > +                       machdep->get_stack_frame(bt, &eip, &esp);
> > +       } else
> >    
> > followed by:
> >
> > -        else if (REMOTE_ACTIVE() && ((bt->task == tt->this_task) ||
> > is_task_active(bt->task))) {
> > +        else if (REMOTE_NIL() && ((bt->task == tt->this_task) ||
> > is_task_active(bt->task))) {
> >                  if (get_remote_regs(bt, &eip, &esp))
> >                          machdep->get_stack_frame(bt, &eip, &esp);
> >          } else
> Clearly I need to adjust the patch order.
> > If I'm not mistaken, it would seem to be impossible for "(bt->task ==
> > tt->this_task)"
> > to ever be true in the REMOTE_NIL() case?  As I recall tt->this_task used
> > to be the
> > PID of the remote daemon itself, which would be accessing the memory of its
> > own kernel.
> > So how could the pid of the remote daemon ever be the target of a back
> > trace of a
> > paused guest?  And so I don't understand why you first made an if clause
> > for
> > REMOTE_ACTIVE() to begin with -- whatever that means now -- and then
> > changed it to
> > REMOTE_NIL()?
> This came about from some old changes from 2012 to a crash version 4.0 for
> other work.  I had not thought of /dev/xenmem at the time, and just copied
> the condition from above.  I see no reason not to drop "bt->task ==
> tt->this_task".  Will check that the right actions are still done.
> > In any case, since the ultimate change is only interested in REMOTE_NIL(),
> > can you make it all less of an eyesore by encapulating its if case
> > something like:
> >
> >          else if (SADUMP_DUMPFILE())
> >                  get_sadump_regs(bt, &eip, &esp);
> > +       else if (REMOTE_NIL()) {
> > +               if (!((bt->task == tt->this_task) ||
> > is_task_active(bt->task)) ||
> > +                   !get_remote_regs(bt, &eip, &esp))
> > +                       machdep->get_stack_frame(bt, &eip, &esp);
> >          } else
> >                  machdep->get_stack_frame(bt, &eip, &esp);
> >
> > Note above, for consistency's sake with the rest of the crash utility,
> > can you make get_remote_regs() return non-zero when it successfully gets
> > the registers, and zero if it fails?
> Sure.
> > And lastly, can you fix these?
> >
> > $ make warn
> > ...
> > remote.c: In function â€˜validate_phys_baseâ€™:
> > remote.c:2302:25: warning: format â€˜%llxâ€™ expects argument of type
> > â€˜long long unsigned intâ€™, but argument 3 has type â€˜physaddr_tâ€™
> > [-Wformat]
> > remote.c:2302:25: warning: format â€˜%llxâ€™ expects argument of type
> > â€˜long long unsigned intâ€™, but argument 4 has type â€˜physaddr_tâ€™
> > [-Wformat]
> > remote.c:2302:25: warning: format â€˜%llxâ€™ expects argument of type
> > â€˜long long unsigned intâ€™, but argument 5 has type â€˜physaddr_tâ€™
> > [-Wformat]
> > remote.c:2302:25: warning: format â€˜%llxâ€™ expects argument of type
> > â€˜long long unsigned intâ€™, but argument 6 has type â€˜physaddr_tâ€™
> > [-Wformat]
> > remote.c: In function â€˜remote_vtopâ€™:
> > remote.c:2361:2: warning: format â€˜%llxâ€™ expects argument of type
> > â€˜long long unsigned intâ€™, but argument 4 has type â€˜physaddr_tâ€™
> > [-Wformat]
> > remote.c:2367:4: warning: format â€˜%llxâ€™ expects argument of type
> > â€˜long long unsigned intâ€™, but argument 4 has type â€˜physaddr_tâ€™
> > [-Wformat]
> > remote.c:2352:18: warning: variable â€˜p3â€™ set but not used
> > [-Wunused-but-set-variable]
> > remote.c:2352:13: warning: variable â€˜p2â€™ set but not used
> > [-Wunused-but-set-variable]
> > remote.c:2352:8: warning: variable â€˜p1â€™ set but not used
> > [-Wunused-but-set-variable]
> > remote.c: In function â€˜get_remote_regsâ€™:
> > remote.c:2392:33: warning: unused variable â€˜p6â€™ [-Wunused-variable]
> > remote.c:2392:28: warning: variable â€˜p5â€™ set but not used
> > [-Wunused-but-set-variable]
> > remote.c:2392:18: warning: variable â€˜p3â€™ set but not used
> > [-Wunused-but-set-variable]
> > remote.c:2392:8: warning: variable â€˜p1â€™ set but not used
> > [-Wunused-but-set-variable]
> > remote.c: In function â€˜get_remote_cr3â€™:
> > remote.c:2449:13: warning: variable â€˜p2â€™ set but not used
> > [-Wunused-but-set-variable]
> > remote.c:2449:8: warning: variable â€˜p1â€™ set but not used
> > [-Wunused-but-set-variable]
> > ...
> Yes.
> 
> > And I'm not interested in appending these to the 4 files that you changed:
> >
> > +/*
> > + * Specify Emacs local variables so the formating
> > + * of the code stays the same.
> > + *
> > + * Local variables:
> > + * mode: C
> > + * c-set-style: "BSD"
> > + * c-basic-offset: 8
> > + * indent-tabs-mode: t
> > + * End:
> > + */
> >
> > Thanks,
> >    Dave
> Ok, that is 2 votes to drop this, so I will.
>      -Don Slutz
> 

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility