On 06/29/2018 11:29 AM, David Mair wrote:
> On 06/26/2018 08:34 AM, Jeremy Harris wrote:
>> On 06/26/2018 03:29 PM, David Wysochanski wrote:
>>> On Tue, 2018-06-26 at 09:21 -0400, Dave Anderson wrote:
>>>> Yes, by default all list entries encountered are put in the built-in
>>>> hash queue, specifically for the purpose of determining whether there
>>>> are duplicate entries.  So if it's still running, it hasn't found any.
>>>>
>>>> To avoid the use of the hashing feature, try entering "set hash off"
>>>> before kicking off the command.  But of course if it finds any, it
>>>> will loop forever.
>>>>
>>>
>>> Ah ok yeah I forgot about the built-in list loop detection!
>>
>> For a storage-less method of list loop-detection: run two walkers
>> down the list, advancing two versus one elements.  If you ever
>> match the same element location after starting, you have a loop.
>
> I experimented with this idea in userspace C++ yesterday to explore the
> effect of different step sizes and ratios.
>
> ...
>
> The difference comes from how much waste the long step walker does
> inside the loop before it has a short step walker to find.  You can try
> other values but it pans out in the implied directions.  The lowest
> overhead to reach the point where a loop is even detectable comes when
> the difference between the long step and the short step is very small,
> the ratio of long step to short step is as close as possible to 1, and
> the short step has only a small chance of being larger than the left
> margin (which you can't know until you find the loop).  So small but
> similar values for each are the best compromise.  Taking a million-node
> list with a left margin of five hundred thousand, the overheads for a
> few ratios of long step to short step become:
>
> Ratio                      Overhead
> (long step : short step)
> 2:1                        1.50M
> 4:1                        2.50M
> 10:1                       55.00M
> 3:2                        1.25M
> 5:4                        1.125M
> 99:98                      0.505M

Corrections for two errors: both walkers have to cover the left margin,
so the total steps for the two walkers can't be less than 1M for the
described list, and at 10:1 the overhead isn't as big as above:

10:1                       5.500M
99:98                      1.005M

>
> ...
>

...or to put it another way: the number of wasted steps the long step
walker spends iterating the loop while the short step walker has yet to
reach the loop is:

left margin * ((long step - short step) / short step)

Meaning, for the same ratios in the same case, the long step walker's
wasted effort in the loop is:

Ratio                      Overhead (additional steps by long walker)
(long step : short step)
2:1                        500k
4:1                        1.50M
10:1                       4.50M
3:2                        250k
5:4                        125k
99:98                      5102

Then, when both walkers are within the loop, the long step walker is
guaranteed to walk the loop at least once: it's already stepping around
the loop before the short step walker can reach it.  Meaning the
overhead (additional steps) cases above are also "saved effort" when the
loop is longer than them, because they are progress into the loop
traversal the long step walker must complete before it can find the
short step walker.  So, for very long loops, very high ratios of long
step to short step with very small values of short step are fastest at
finding the match once both walkers are within the loop.
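For reference, a minimal userspace C++ sketch of the scheme being
discussed (illustrative only: Node, hasLoop and the parameter names are
assumed here, not taken from crash or from the experiment above).  The
match is checked only by the long step walker, after every single node
it advances, so it can never step over the parked short step walker:

#include <cstddef>

struct Node {
    Node *next;
};

// Requires longStep > shortStep >= 1, or the walkers never separate.
bool hasLoop(Node *head, std::size_t shortStep, std::size_t longStep)
{
    if (head == nullptr)
        return false;
    Node *shortWalker = head;
    Node *longWalker = head;
    for (;;) {
        // Advance the long step walker one node at a time, checking
        // against the parked short step walker after every node.
        for (std::size_t i = 0; i < longStep; ++i) {
            longWalker = longWalker->next;
            if (longWalker == nullptr)
                return false;               // list terminates: no loop
            if (longWalker == shortWalker)
                return true;                // caught up again: loop
        }
        // The short step walker trails the long one, so it cannot hit
        // a null pointer the long step walker has not already seen.
        for (std::size_t i = 0; i < shortStep; ++i)
            shortWalker = shortWalker->next;
    }
}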
For very short loops it probably doesn't make much difference, but if
the match is only being checked for in one of the walkers then it's
probably best if that's the long step walker, and that the short step is
as close as possible to one (this minimizes how far into the loop, or
how many iterations of the loop, the short step walker can be when it is
found).  Since the size of the loop isn't guaranteed to be long or short
for any given case, the optimal approach is probably to favor the
smallest size for the short step and one greater than that for the long
step.
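As a toy check of that recommendation (again only a sketch, reusing the
hypothetical hasLoop() above): the smallest short step is 1, and one
greater is 2, i.e. the classic tortoise-and-hare ratio.

#include <vector>

int main()
{
    // Ten nodes; the tail points back to node 5, giving a left margin
    // of 5 and a loop of length 5 (a toy stand-in for the million-node
    // case above).
    std::vector<Node> nodes(10);
    for (std::size_t i = 0; i + 1 < nodes.size(); ++i)
        nodes[i].next = &nodes[i + 1];
    nodes[9].next = &nodes[5];

    bool looped = hasLoop(&nodes[0], 1, 2);
    nodes[9].next = nullptr;                // break the loop
    bool terminated = !hasLoop(&nodes[0], 1, 2);
    return (looped && terminated) ? 0 : 1;  // 0 means both cases pass
}

--
David.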