Re: RFC: string search for crash

Dave Anderson <anderson@xxxxxxxxxx> · Wed, 2 Feb 2011 15:28:01 -0500 (EST)

----- Original Message -----
> On Wed, 2011-02-02 at 14:49 +0000, Dave Anderson wrote:
> >
> > ----- Original Message -----
> > > Here is a patch to add string search to crash-5.1.1.
> > >
> > > It requires the my previous patch for the parse_line routine.
> > >
> > > It searches for the specified strings, and for half or more of the
> > > string appearing at the start and end of search blocks (usually pages)
> > > in case the string spans a page boundary.
> > >
> 
> > First, this is a really nifty feature...
> 
> Thanks.
> 
> > Second, I appreciate that you created a new string_search() function
> > instead of attempting to merge it into the existing search()
> > function.
> 
> Way too ugly otherwise.
> 
> >
> > Third, given that you're searching for specific string, why not show
> > the actual byte-aligned address as the starting point, and then just
> > display the string contents until the next non-ASCII character, or some
> > other delimiter like a 48 byte limit or whatever?
> 
> That's actually the easier thing to do and just a preference decision.
> I like looking at aligned addresses for using with subsequent rd
> commands. And it's also sometimes useful to see some pre-string context.
> But rounding to the previous long aligned address is really random with
> respect to pre-string context. There's probably a better way to provide
> that.

Actually, rounding back down to the previous long-aligned address might
be reasonable.

> 
> > I understand that the page-crossing issue is a PITA, but with respect
> > to a user searching memory for strings, the page-crossing issue is
> > seemingly irrelevant. In other words, why this:
> >
> > > ffff880125f72000: ger_event.......................................
> >
> > instead of displaying something like this:
> >
> >   ffff880125f71ffd: danger_event
> >
> > So I guess I'm just wondering why the "dan" at the end of the page
> > cannot be displayed? It just doesn't seem user-friendly to force
> > the user to understand the half-of-the-string-at-a-page-boundary
> > business. Since you do recognize that string exists and crosses
> > a page boundary, shouldn't you be able to display the first part?
> 
> I started working on my search stuff directly on dumpfiles, which meant
> I was searching through a collection of physical pages. When searching
> physical pages, there is no reason to believe that the end of page N has
> anything logically to do with the beginning of page N+1. So looking
> across to the next page for the continuation of a string didn't seem
> strictly correct or useful. (This was never even a consideration when
> searching for a long or int; they don't cross page boundaries.)

Right, certainly it doesn't make sense with physical pages, and to a
somewhat lesser extent, with unity-mapped pages.  But with user virtual,
the mapped kernel virtual region (x86_64 and ia64), and vmalloc addresses,
strings would cross page boundaries.  So I wonder whether it's worth the
effort to do something like I suggest since you always know what kind of
memory is being searched?  I know, I know, it's easy for *me* to say...  

But that leads to another issue.  As I mentioned before, the -p
implementation I've got uses its own function, so to marry that
with the string search capability kind of mucks with your patch.
I'm wondering whether your string search function could be called
from both the virtual and physical search functions near the bottom
where the page is searched?  Just something to think about. 

> 
> If I'm searching by virtual address, then N+1 does follow N, but I don't
> know that at the time I'm working on page N. The current search
> strategy is "choose a page, load the page, search the page". So if I
> find a promising first part of a string at the end of a page, the search
> loop would need a memory to carry my position in the string(s) across to
> the next page, if I can determine that the next page follows
> contiguously in the current search type. Not impossible, but not
> trivial, I think. Possibly easier to pre-read the following page to the
> length of the longest search string onto the end of an extended page
> buffer and then search for the full string to the end of that extended
> page buffer before loading the next page (and part of the one after
> that). But you still have to deal with the possiblity that the "next"
> page isn't contiguous to the one you're working on.

Right, perhaps the "last-page-contents-and-its-address" could always be saved?
And you could always attempt to read the next page if you see the beginning
of a search string.

> 
> When you called out the example above, a first-of-page hit on
> "ger_event" when searching for the string "danger", my first thought was
> that you had found a bug. The "dan" at the end of the previous page
> should also have met the criteria for half of the string "danger", and
> showed up as an end-of-page hit. But (whew!), here's what's in memory
> there:
> 
> crash-5.1.1str> rd ffff880125f71ff0 8
> ffff880125f71ff0: 000000000ed7ab73 676972745f64656c s.......led_trig
> ffff880125f72000: 6e6576655f726567 0000000000000074 ger_event.......
> ffff880125f72010: 0000000000000000 0000000000000000 ................
> ffff880125f72020: 0000000000000000 0000000000000000 ................
> 
> Which serves as a reminder that the "part of a string" match is just
> there to suggest possibilities that might turn out to be false
> positives.
> 
> > Lastly, I'm still planning to add the remaining "search" command
> > updates for the -KVM flags and the "missing" x86_64 segment bug,
> > and so I may need you to re-work this patch to fit into an updated
> > version of search(). I've been delayed getting crash-5.1.2 out
> > the door because of all of the recent patch postings, and it may
> > be worth waiting to get this piece in until 5.1.3 -- is that OK
> > with you?
> 
> No rush.

OK good.

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility