configuring list debugging and "poisoning" list pointers

"Robert P. J. Day" <rpjday@xxxxxxxxxxxxxx> · Sun, 21 Nov 2010 05:45:12 -0500 (EST)

  continuing my journey into the depths of kernel data structures, i'm
curious about the design and usage of list "poisoning".

  first, here are the magic values used to poison list node pointers,
defined in poison.h:

#define LIST_POISON1  ((void *) 0x00100100 + POISON_POINTER_DELTA)
#define LIST_POISON2  ((void *) 0x00200200 + POISON_POINTER_DELTA)

it doesn't really matter what those values are; they're just magic,
immediately identifiable values.  they're used as the new values for
the next and prev pointers of a list node when, among other things,
that node is removed from a list.  this is from list.h:

#ifndef CONFIG_DEBUG_LIST
static inline void list_del(struct list_head *entry)
{
        __list_del(entry->prev, entry->next);
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}
#else
extern void list_del(struct list_head *entry);
#endif

but that's the first place where i get a bit confused.

  the obvious rationale behind list poisoning is to set the prev and
next pointers for invalid nodes to magic values so that they can't
possibly be mistaken for *valid* list nodes.  thus, in the above, when
a node is "deleted" from a list, its pointers are immediately rendered
invalid.
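
  to make that concrete, here's a minimal userspace sketch of the idea
(not the kernel code itself).  it assumes POISON_POINTER_DELTA is 0 and
open-codes __list_del(); list_init() and list_is_poisoned() are
hypothetical helpers of mine that don't exist in list.h:

```c
#include <assert.h>
#include <stddef.h>

/* userspace stand-ins for the poison.h values; assumes POISON_POINTER_DELTA == 0 */
#define LIST_POISON1 ((void *)0x00100100)
#define LIST_POISON2 ((void *)0x00200200)

struct list_head {
        struct list_head *next, *prev;
};

/* hypothetical helper: make "head" an empty list pointing at itself */
static void list_init(struct list_head *head)
{
        head->next = head;
        head->prev = head;
}

/* insert "node" right after "head" */
static void list_add(struct list_head *node, struct list_head *head)
{
        assert(node != NULL && head != NULL);
        node->next = head->next;
        node->prev = head;
        head->next->prev = node;
        head->next = node;
}

/* unlink "entry" from its neighbours, then poison its own pointers */
static void list_del(struct list_head *entry)
{
        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}

/* hypothetical helper: does "entry" carry both poison values? */
static int list_is_poisoned(const struct list_head *entry)
{
        return entry->next == LIST_POISON1 && entry->prev == LIST_POISON2;
}
```

after list_init(&head), list_add(&a, &head) and list_del(&a), the head
points back at itself and list_is_poisoned(&a) returns 1, so the
removed node can no longer pass for a member of any list.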

  but hang on.  list poisoning would seem to be a debugging technique
and, as you can see above, if you enable CONFIG_DEBUG_LIST as a kernel
config option, what you get instead is an extern declaration of
list_del(), which is defined in lib/list_debug.c.  unlike the normal
implementation of list_del(), that version does a good deal more
checking:

void list_del(struct list_head *entry)
{
        WARN(entry->next == LIST_POISON1,
                "list_del corruption, next is LIST_POISON1 (%p)\n",
                LIST_POISON1);
        WARN(entry->next != LIST_POISON1 && entry->prev == LIST_POISON2,
                "list_del corruption, prev is LIST_POISON2 (%p)\n",
                LIST_POISON2);
        WARN(entry->prev->next != entry,
                "list_del corruption. prev->next should be %p, "
                "but was %p\n", entry, entry->prev->next);
        WARN(entry->next->prev != entry,
                "list_del corruption. next->prev should be %p, "
                "but was %p\n", entry, entry->next->prev);
        __list_del(entry->prev, entry->next);
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}
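
  for anyone who wants to play with those checks outside the kernel,
here's a rough userspace sketch.  list_del_checked() is a hypothetical
name of mine, and it deliberately differs from the code above in two
ways: it reports through a return value and stderr rather than WARN(),
and it leaves a suspicious entry untouched instead of unlinking it
anyway (in userspace i can't dereference a poisoned pointer safely).
it again assumes POISON_POINTER_DELTA is 0:

```c
#include <assert.h>
#include <stdio.h>

/* userspace stand-ins for the poison.h values; assumes POISON_POINTER_DELTA == 0 */
#define LIST_POISON1 ((void *)0x00100100)
#define LIST_POISON2 ((void *)0x00200200)

struct list_head {
        struct list_head *next, *prev;
};

/*
 * hypothetical checked delete: returns 0 if "entry" was unlinked and
 * poisoned, -1 if it looked corrupt (in which case it is not modified).
 */
static int list_del_checked(struct list_head *entry)
{
        assert(entry != NULL);

        /* already deleted?  check this first, before following the pointers */
        if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2) {
                fprintf(stderr, "list_del corruption, %p is already poisoned\n",
                        (void *)entry);
                return -1;
        }

        /* both neighbours must point back at this entry */
        if (entry->prev->next != entry || entry->next->prev != entry) {
                fprintf(stderr, "list_del corruption, neighbours of %p "
                        "don't point back at it\n", (void *)entry);
                return -1;
        }

        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
        return 0;
}
```

calling list_del_checked() twice on the same node returns 0 the first
time and -1 the second, which is the double-delete case that the
LIST_POISON1 check is there to catch.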

that looks reasonable, but here's the confusing part -- even *without*
configuring the kernel for list debugging, the *normal* definition of
list_del() is:

static inline void list_del(struct list_head *entry)
{
        __list_del(entry->prev, entry->next);
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}

in other words, even if you *don't* configure for list debugging,
list_del() will still poison the pointers.  why?

  as a first guess, is there anything else in list.h that will take
advantage of those poison values?

$ grep LIST_POISON list.h
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
$

doesn't look like it -- while there's a bit more of *setting* poison
values, there's no *testing*.  so the question -- if i've explicitly
*not* selected list debugging with CONFIG_DEBUG_LIST when building my
kernel, why does a regular list_del() poison those pointers?

  let's do one more test -- let's search for LIST_POISON tests
throughout the entire drivers/ directory:

$ grep -r -A2 LIST_POISON drivers
drivers/usb/host/xhci-hub.c:		if (cmd->cmd_list.next != LIST_POISON1)
drivers/usb/host/xhci-hub.c-			list_del(&cmd->cmd_list);
drivers/usb/host/xhci-hub.c-		spin_unlock_irqrestore(&xhci->lock, flags);
--
drivers/usb/host/xhci.c:		if (reset_device_cmd->cmd_list.next != LIST_POISON1)
drivers/usb/host/xhci.c-			list_del(&reset_device_cmd->cmd_list);
drivers/usb/host/xhci.c-		spin_unlock_irqrestore(&xhci->lock, flags);
$

ok, that's kind of weird.  two whole hits, both of them just
*checking* for a poison value, but not making a big deal out of it,
and not even surrounded by a CONFIG_DEBUG_LIST test.  why?
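
  if i'm reading those two hits correctly, the poison value is being
used as an "already removed from the list" flag so that list_del() is
never called twice on the same node.  here's a userspace sketch of that
pattern as i understand it; struct command and command_unlink() are
hypothetical stand-ins of mine, not the xhci code, there's no locking,
and POISON_POINTER_DELTA is assumed to be 0:

```c
#include <assert.h>
#include <stddef.h>

/* userspace stand-ins for the poison.h values; assumes POISON_POINTER_DELTA == 0 */
#define LIST_POISON1 ((void *)0x00100100)
#define LIST_POISON2 ((void *)0x00200200)

struct list_head {
        struct list_head *next, *prev;
};

/* hypothetical stand-in for a driver structure with an embedded list node */
struct command {
        int id;
        struct list_head cmd_list;
};

/* same shape as the non-debug list_del(), with __list_del() open-coded */
static void list_del(struct list_head *entry)
{
        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}

/*
 * hypothetical helper: unlink "cmd" only if it hasn't been unlinked
 * before.  returns 1 if it was removed now, 0 if it was already gone.
 */
static int command_unlink(struct command *cmd)
{
        assert(cmd != NULL);
        if (cmd->cmd_list.next == LIST_POISON1)
                return 0;
        list_del(&cmd->cmd_list);
        return 1;
}
```

note that this only works because the non-debug list_del() poisons the
pointers too; the test has no way to tell "deleted" from "never added"
unless the node started out on a list.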

  this inspires a few questions.  if i don't select list debugging as
a kernel config option, should there be any setting or testing of list
poison values *at all*?  should non-list code (like that USB code) be
checking for those magic values?  and if it finds those values, does
that not constitute a signal that something has gone very wrong?

  can anyone clarify this?  i might ask this on the LKML shortly.

rday

-- 

========================================================================
Robert P. J. Day                               Waterloo, Ontario, CANADA
                        http://crashcourse.ca

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================
