Re: [PATCH 0/7] Upgrade the remote access to a xen domain.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/19/13 11:08, Dave Anderson wrote:

----- Original Message -----
From: Don Slutz <dslutz@xxxxxxxxxxx>

Currently crash 4.0 and later using the code:

http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html

There is some very limited documention on crashes remote ptotocol in:

http://lists.xen.org/archives/html/xen-devel/2013-11/msg02352.html

and some fixes in the attachment
0001-xen-crashd-Connect-crash-with-domain.patch
from the 1st link above.

How ever there are issues with the current code:

1) The remote protocol has a minor issue (which I have not been able to
    happen) based on the fact TCP/IP is stream based protocol.  This means
    a RECV or a SEND may not do the fully requested size of data.  In fact
    the current code assumes that the amount of data that a SEND is called
    with will all be read with a single RECV.

2) The most common mismatch between crash and older kernels is
    phys_base.  In the remote case, see if the remote server supports
    vitrual memory access, and if so, see if phys_base can be retreived.

3) crash assumes that the remote system is active and can not return
    currect IP and SP to do a better back trace.

4) enable a non-active mode of remote access.

This patch set attempts to fix these:

The fix for #1 I have called NIL mode (patch 1).
The fix for #2 uses get_remote_phys_base (patch 3).
The fix for #3 uses get_remote_regs (patch 5).
The fix for #4 uses special file /dev/xenmem (patch 7).
   It also REMOTE_NIL() to indicate "remote paused system".

Don Slutz (7):
   Add NIL mode to remote.
   remote_proc_version: NULL terminate passed buffer on error.
   Add get_remote_phys_base.
   Add remote_vtop.
   bt: get remote live registers if possible.
   Add get_remote_cr3
   Add support for non-live remote.

  defs.h   |   8 +-
  kernel.c |  25 +++-
  memory.c |  25 +++-
  remote.c | 462
  +++++++++++++++++++++++++++++++++++++++++++++++++++++----------
  x86_64.c |  24 ++++
  5 files changed, 465 insertions(+), 79 deletions(-)

--
1.8.4
Hi Don,

A few comments...

I don't really understand what entity the code is communicating with,
so for the most part, I'm primarily interested in making sure that your
patches in no way can affect the normal usage of the crash utility.

That being said, is the term "NIL mode" some kind of Xen-ism?
Nope, it is term I came up with to describe the protocol change.
  It wasn't
until I saw the display_sys_stats() function that I even understood that
REMOTE_NIL mean "(remote paused system)".  Anyway, if you made it something
like "REMOTE_PAUSED" or something like that, the code might make some sense
to the untrained eye.
Will do.  Looking back at it, I ended up overloading the term. :(

And it's not entirely clear to me what REMOTE_ACTIVE() means now?  It used
to imply communicating with a remote crash daeamon that accessed the live
system that the daemon was running on.  Has that meaning been changed to mean
that crash is communicating with a daemon that's accessing a "live" Xen guest
kernel?
The meaning is not as clear.  Since the existing versions of crash already have REMOTE_ACTIVE() and I wanted the Xen utility to be usable to them, in both cases the "live" Xen guest kernel is being accessed while it is paused.  I added REMOTE_PAUSED() to have a few changes to crash to correctly understand that the guest is paused. As part of this I add the fake file "/dev/xenmem".

I'm confused about these two changes, where you re-patch your own patch:

-        else
+        else if (REMOTE_ACTIVE() && ((bt->task == tt->this_task) || is_task_active(bt->task))) {
+               if (get_remote_regs(bt, &eip, &esp))
+                       machdep->get_stack_frame(bt, &eip, &esp);
+       } else
followed by:

-        else if (REMOTE_ACTIVE() && ((bt->task == tt->this_task) || is_task_active(bt->task))) {
+        else if (REMOTE_NIL() && ((bt->task == tt->this_task) || is_task_active(bt->task))) {
                 if (get_remote_regs(bt, &eip, &esp))
                         machdep->get_stack_frame(bt, &eip, &esp);
         } else
Clearly I need to adjust the patch order.
If I'm not mistaken, it would seem to be impossible for "(bt->task == tt->this_task)"
to ever be true in the REMOTE_NIL() case?  As I recall tt->this_task used to be the
PID of the remote daemon itself, which would be accessing the memory of its own kernel.
So how could the pid of the remote daemon ever be the target of a back trace of a
paused guest?  And so I don't understand why you first made an if clause for
REMOTE_ACTIVE() to begin with -- whatever that means now -- and then changed it to
REMOTE_NIL()?
This came about from some old changes from 2012 to a crash version 4.0 for other work.  I had not thought of /dev/xenmem at the time, and just copied the condition from above.  I see no reason not to drop "bt->task == tt->this_task".  Will check that the right actions are still done.
In any case, since the ultimate change is only interested in REMOTE_NIL(),
can you make it all less of an eyesore by encapulating its if case
something like:

         else if (SADUMP_DUMPFILE())
                 get_sadump_regs(bt, &eip, &esp);
+       else if (REMOTE_NIL()) {
+               if (!((bt->task == tt->this_task) || is_task_active(bt->task)) ||
+                   !get_remote_regs(bt, &eip, &esp))	
+                       machdep->get_stack_frame(bt, &eip, &esp);
         } else
                 machdep->get_stack_frame(bt, &eip, &esp);

Note above, for consistency's sake with the rest of the crash utility,
can you make get_remote_regs() return non-zero when it successfully gets
the registers, and zero if it fails?
Sure.
And lastly, can you fix these?

$ make warn
...
remote.c: In function ‘validate_phys_base’:
remote.c:2302:25: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘physaddr_t’ [-Wformat]
remote.c:2302:25: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 4 has type ‘physaddr_t’ [-Wformat]
remote.c:2302:25: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘physaddr_t’ [-Wformat]
remote.c:2302:25: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 6 has type ‘physaddr_t’ [-Wformat]
remote.c: In function ‘remote_vtop’:
remote.c:2361:2: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 4 has type ‘physaddr_t’ [-Wformat]
remote.c:2367:4: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 4 has type ‘physaddr_t’ [-Wformat]
remote.c:2352:18: warning: variable ‘p3’ set but not used [-Wunused-but-set-variable]
remote.c:2352:13: warning: variable ‘p2’ set but not used [-Wunused-but-set-variable]
remote.c:2352:8: warning: variable ‘p1’ set but not used [-Wunused-but-set-variable]
remote.c: In function ‘get_remote_regs’:
remote.c:2392:33: warning: unused variable ‘p6’ [-Wunused-variable]
remote.c:2392:28: warning: variable ‘p5’ set but not used [-Wunused-but-set-variable]
remote.c:2392:18: warning: variable ‘p3’ set but not used [-Wunused-but-set-variable]
remote.c:2392:8: warning: variable ‘p1’ set but not used [-Wunused-but-set-variable]
remote.c: In function ‘get_remote_cr3’:
remote.c:2449:13: warning: variable ‘p2’ set but not used [-Wunused-but-set-variable]
remote.c:2449:8: warning: variable ‘p1’ set but not used [-Wunused-but-set-variable]
...
Yes.

And I'm not interested in appending these to the 4 files that you changed:

+/*
+ * Specify Emacs local variables so the formating
+ * of the code stays the same.
+ *
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 8
+ * indent-tabs-mode: t
+ * End:
+ */

Thanks,
   Dave
Ok, that is 2 votes to drop this, so I will.
    -Don Slutz

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility





[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux