Re: crash-7.3.2 very long list iteration progressively increasing memory usage

David Mair <dmair@xxxxxxxx> · Fri, 29 Jun 2018 11:29:41 -0600

On 06/26/2018 08:34 AM, Jeremy Harris wrote:
> On 06/26/2018 03:29 PM, David Wysochanski wrote:
>> On Tue, 2018-06-26 at 09:21 -0400, Dave Anderson wrote:
>>> Yes, by default all list entries encountered are put in the built-in
>>> hash queue, specifically for the purpose of determining whether there
>>> are duplicate entries.  So if it's still running, it hasn't found any.
>>>
>>> To avoid the use of the hashing feature, try entering "set hash off"
>>> before kicking off the command.  But of course if it finds any, it
>>> will loop forever.
>>>
>>
>> Ah ok yeah I forgot about the built-in list loop detection!
> 
> For a storage-less method of list loop-detection: run two walkers
> down the list, advancing two versus one elements.  If you ever
> match the same element location after starting, you have a loop.

I experimented with this idea in userspace C++ yesterday to explore the
effect of different step sizes and ratios.

There is a guaranteed overhead that is of optimal cost where the long
step and short step have only a very small difference between them and
when the ratio of long step to short step is as close as possible to 1
and such that the small step has only a small chance of being larger
than the number of nodes before entering the loop. IOW, 2:1, 3:2, 5:4
are improving cases of the ratio of one walker's steps per iteration to
the other's.

Assuming a clean list looks like:

H.....................T

Where H is the list head, T is the list tail and each . represents a
node with a next pointer the node on it's right and, if doubly-linked a
prev pointer to the node on it's left.

A loop source has to be later in the list than the loop target because a
case of a "loop source" earlier in the list than it's target doesn't
repeat, it's just a "leaked" set of nodes in that direction. For
example, a loop in the list above:

          _____
         v     |
H..............s......T

Where s is a node that has a next that is the node under the v (the loop
target).

So, with a pair of walkers the long step one runs away from the short
step one until the short step one enters the loop. Call the nodes
between H and the loop target the left margin. It's guaranteed that
there are this many node steps accumulated by the two walkers with
nothing achievable until the left margin is completed by the short step
walker:

((left margin / short step) * long step) + ((left margin / short step) *
short step)

Use L for left margin / short step (the number of short step executions
it takes for it to reach the loop target) and the cost to run through
the left margin is:

(L * long step) + (L * short step)

Which is can be re-organized as:

L * (long step + short step)

So, for 2:1 (long step:short step) for a few cases of L:

L     Left Margin Cost
2     6
5     15
10    30
100   300
1000  3000

For a case like 3:2 (long step:short step for the same cases of L (they
are half what they were for 2:1 because the short step is 2 not 1):

L     Left Margin Cost
1     5
3     15
5     25
50    250
500   2500

For a case like 3:1 (long step:short step for the same cases of L (they
are the same as the 2:1 case):

L     Left Margin Cost
2     8
5     20
10    40
100   400
1000  4000

The difference is because of how much waste the long step walker does
within the loop before it has a short step walker to find. You can try
other values but it pans out in the implied directions. The lowest
overhead to reach the point where a loop is even detectable is where the
long step and short step have only a very small difference between them
and when the ratio of long step to short step is as close as possible to
1 and such that the small step has only a small chance of being larger
than the left margin (which you can't know until you find the loop), so
small but similar values for each are the best compromise. Taking
million node lists with a left margin of five hundred thousand the
overheads for a few cases of the ratio of long step to short step become:

        Ratio               Overhead
(long step : short step)
          2:1                   1.50M
          4:1                   2.50M
         10:1                  55.00M
          3:2                   1.25M
          5:4                   1.125M
         99:98                  0.505M

After that the two walkers are in the loop and the overhead relates to
the size of the loop than the values of the step sizes for either walker.

-- 
David.

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility