On Mon, Jan 19, 2004 at 20:12:01 -0800, prasanna wakhare wrote:
> Hi all,
> My queation is,
> why there are lists everywhere and not AVL or like
> that data structures which give O(lgn) performance in
> linux kernel.

O(log(n)) performance IN WHAT TASK?!

First, I really don't know what you mean by "performance", but I will
assume that you are talking about time complexity.

Algorithm time complexity is the number of "primitive operations"
(meaning whatever is appropriate for the given problem class) carried
out to compute a result for a particular input. The complexity of
algorithms for accessing data structures depends on the size of --
that is, the number of objects in -- the given data structure, and
thus is given as a function of that size. There are several access
algorithms for each data structure -- INSERT, DELETE, FIND and perhaps
others (eg. FINDMIN for heaps).

The complexity also depends on other factors, so we speak of
complexity in the worst, average and best (the last one not being too
useful) case.

The exact complexity function is usually complicated, so we usually
use the asymptotic upper bound when comparing algorithms.

Definition:

    f(n) ~ O(g(n))  iff  E(c in R) E(n0 in N) V(n > n0): f(n) < c*g(n)

(E is "exists", V is "for all", R is the set of real numbers and N the
set of natural numbers.)

This definition is useful, but it is easy to see its limitations -- it
says nothing at all about values for n below some n0 (which might be
very high), nor about the value of c, which might also be high.

Note: when we have an upper bound on the problem size, ALL operations
become O(1), since n now has an upper bound and so becomes a constant,
which gets hidden in c.

Now, when we need a data structure for something, we might require one
of two things:

- Lowest average complexity over the average sample of operations we
  will do, while rejecting solutions that are "too" bad in the worst
  case (this is what most programs, including Linux, want).
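To make the definition above concrete, here is a worked instance (the
particular f, c and n0 are my own illustrative choices, not from the
original question):

```latex
% Worked example: f(n) = 3n + 10 is in O(n).
% Choose c = 4 and n_0 = 10. Then for all n > n_0 we have 10 < n, so:
f(n) = 3n + 10 < 3n + n = 4n = c \cdot g(n)
\quad\text{with}\quad g(n) = n,\ c = 4,\ n_0 = 10.
```

Note how the constant 10 and the factor 3 both disappear into c -- the
same mechanism by which a bounded problem size makes every operation
"O(1)".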
- For each operation we need to do, a sufficient worst-case complexity
  (only systems that are required to respond in bounded time want
  this).

In addition, we also consider spatial complexity (the amount of extra
memory required) and the amount of work needed to code it.

Now we look at the operations we will do, estimate how often we will
do them, and estimate what size we expect the data set to grow to.
Then we consider various structures and estimate how well they will
perform in our case. It turns out that:

1) Lists are the only reasonable implementation of LIFOs and FIFOs.
2) If you only need to INSERT, DELETE and ITERATE, doubly-linked lists
   are the best solution (O(1), O(1), O(n) respectively, with a low
   constant).
   - eg. waitqueues only INSERT and then iterate over the whole list
     and discard it.
3) If the structure is small on average, lists beat anything else due
   to the small constant. The smaller spatial complexity is also
   interesting.
4) If the structure is not used often, spatial complexity becomes more
   important.
5) If the structure is not used often, the coding complexity becomes
   most important.

Now, the doubly-linked lists, as used in the kernel, are very simple,
elegant and bulletproof. Thus the structures that are large and
accessed often are optimized -- memory blocks use AVL trees,
filesystem objects use hashes (note that in each bucket, a linked list
is stored). The rest falls into one of the above categories -- it
either does not need (fast) FIND, does not store a large amount of
data, or is simply not on hot enough paths to be worth the extra
effort of writing and *debugging* something fancier.

> except at some part in kernel like the PID are
> retrived from hash table and AVL i think at 1 or 2
> places otherwise all there are liste and not trees

For average cases, the following holds:

                   FIND      INSERT    DELETE*   ITERATE
   Hash            O(1)**    O(1)      O(1)      O(n)
   AVL tree        O(ln(n))  O(ln(n))  O(ln(n))  O(n)
   Unsorted list   O(n)      O(1)      O(1)      O(n)
   Sorted list     O(n)      O(n)      O(1)      O(n)

 * DELETE does NOT include FIND.
** Worst-case is O(n), however.

In addition, implementing an AVL tree is quite error-prone. Hashes are
simpler (only the choice of the hash function is complicated). Lists
are really trivial.

So trees turn out not to be the most useful structure in the universe.
They are useful mainly when the worst case is important or when a good
hashing function can't be found.

-------------------------------------------------------------------------------
                                                Jan 'Bulb' Hudec <bulb@ucw.cz>

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/