On Mon, Jan 19, 2004 at 20:12:01 -0800, prasanna wakhare wrote:
> Hi all,
> My queation is,
> why there are lists everywhere and not AVL or like
> that data structures which give O(lgn) performance in
> linux kernel.

O(log(n)) performance IN WHAT TASK?!

First, I really don't know what you mean by "performance", but I will
assume that you are talking about time complexity.

Algorithm time complexity is the number of "primitive operations"
(meaning whatever is appropriate for the given problem class) carried
out to compute a result for a particular input. The complexity of
algorithms for accessing data structures depends on the size of --
that is, the number of objects in -- the given data structure, and
thus is given as a function of that size. There are several access
algorithms for each data structure -- INSERT, DELETE, FIND and perhaps
others (eg. FINDMIN for heaps).

The complexity also depends on other factors, so we speak of
complexity in the worst, average and best (the last one not being too
useful) case.

The exact complexity function is usually complicated, so we usually
use the asymptotic upper bound when comparing algorithms.

Definition:

    f(n) ~ O(g(n))  iff  E(c in R) E(n0 in N) V(n > n0): f(n) < c*g(n)

(E is "exists", V is "for all", R is the set of real numbers and N the
set of natural numbers.)

This definition is useful, but it is easy to see its limitations -- it
says nothing at all about values for n below some n0 (which might be
very high), nor about the value of c, which might also be high.

Note: when we have an upper bound on the problem size, ALL operations
become O(1), since n now has an upper bound and so becomes a constant,
which gets hidden in c.

Now, when we need a data structure for something, we might require one
of two things:

- Lowest average complexity over the average sample of operations we
  will do, while rejecting solutions that are "too" bad in the worst
  case (this is what most programs, including Linux, want).
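To make the definition above concrete, here is a worked instance (the
particular f, c and n0 are my own illustrative choices, not from the
original question):

```latex
% Worked example: f(n) = 3n + 10 is in O(n).
% Choose c = 4 and n_0 = 10. Then for all n > n_0 we have 10 < n, so:
f(n) = 3n + 10 < 3n + n = 4n = c \cdot g(n)
\quad\text{with}\quad g(n) = n,\ c = 4,\ n_0 = 10.
```

Note how the constant 10 and the factor 3 both disappear into c -- the
same mechanism by which a bounded problem size makes every operation
"O(1)".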
- For each operation we need to do, a sufficient worst-case complexity
  (only systems that are required to respond in bounded time want
  this).

In addition, we also consider spatial complexity (the amount of extra
memory required) and the amount of work needed to code it.

Now we look at the operations we will do, estimate how often we will
do them, and estimate what size we expect the data set to grow to.
Then we consider various structures and estimate how well they will
perform in our case. It turns out that:

1) Lists are the only reasonable implementation of LIFOs and FIFOs.
2) If you only need to INSERT, DELETE and ITERATE, doubly-linked lists
   are the best solution (O(1), O(1), O(n) respectively, with a low
   constant).
   - eg. waitqueues only INSERT and then iterate over the whole list
     and discard it.
3) If the structure is small on average, lists beat anything else due
   to the small constant. The smaller spatial complexity is also
   interesting.
4) If the structure is not used often, spatial complexity becomes more
   important.
5) If the structure is not used often, the coding complexity becomes
   most important.

Now, the doubly-linked lists, as used in the kernel, are very simple,
elegant and bulletproof. Thus the structures that are large and
accessed often are optimized -- memory blocks use AVL trees,
filesystem objects use hashes (note that in each bucket, a linked list
is stored). The rest falls into one of the above categories -- it
either does not need (fast) FIND, does not store a large amount of
data, or is simply not on hot enough paths to be worth the extra
effort of writing and *debugging* something fancier.

> except at some part in kernel like the PID are
> retrived from hash table and AVL i think at 1 or 2
> places otherwise all there are liste and not trees

For average cases, the following holds:

                   FIND      INSERT    DELETE*   ITERATE
   Hash            O(1)**    O(1)      O(1)      O(n)
   AVL tree        O(ln(n))  O(ln(n))  O(ln(n))  O(n)
   Unsorted list   O(n)      O(1)      O(1)      O(n)
   Sorted list     O(n)      O(n)      O(1)      O(n)

 * DELETE does NOT include FIND.
** Worst-case is O(n), however.

In addition, implementing an AVL tree is quite error-prone. Hashes are
simpler (only the choice of the hash function is complicated). Lists
are really trivial.

So trees turn out not to be the most useful structure in the universe.
They are useful mainly when the worst case is important or when a good
hashing function can't be found.

-------------------------------------------------------------------------------
                                                Jan 'Bulb' Hudec <bulb@ucw.cz>

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/