Re: [PATCH] udevd: de-duplicate strings in rules

"Kay Sievers" <kay.sievers@xxxxxxxx> · Thu, 13 Nov 2008 20:03:42 +0100

On Thu, Nov 13, 2008 at 11:12, Alan Jenkins <alan-jenkins@xxxxxxxxxxxxxx> wrote:
> Kay Sievers wrote:
>> On Wed, 2008-11-12 at 20:48 +0000, Alan Jenkins wrote:
>>> Kay Sievers wrote:
>>>> On Wed, Nov 12, 2008 at 19:05, Alan Jenkins <alan-jenkins@xxxxxxxxxxxxxx> wrote:
>>>>> Kay Sievers wrote:
>>>>>> On Wed, Nov 12, 2008 at 17:50, Alan Jenkins <alan-jenkins@xxxxxxxxxxxxxx> wrote:
>>>>>>> Kay Sievers wrote:
>>>>>>>> On Tue, Nov 11, 2008 at 22:23, Kay Sievers <kay.sievers@xxxxxxxx> wrote:
>>>>>>>>> On Tue, Nov 11, 2008 at 21:20, Alan Jenkins <alan-jenkins@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> On my Ubuntu installation this removes 15k of duplicate strings,
>>>>>>>>>> using a temporary index of about 25k.
>>>>>>>>>>
>>>>>>>>> Great. That looks nice.
>>>>>>>>>
>>>>>>>>> Thats's the diff of the rule dump before and after the patch:
>>>>>>>>>  ...
>>>>>>>>>  -[] shrunk to 64896 bytes tokens (5408 * 12 bytes), 57298 bytes buffer
>>>>>>>>>  -[] dumping 5408 (64896 bytes) tokens, 5818 (57298 bytes) strings
>>>>>>>>>  +[] shrunk to 64896 bytes tokens (5408 * 12 bytes), 18204 bytes buffer
>>>>>>>>>  +[] used 40512 bytes of string index nodes (844 * 48 bytes)
>>>>>>>>>  +[] dumping 5408 (64896 bytes) tokens, 1369 (18204 bytes) strings
>>>>>>>>>
>>>>>>>> I split the nodes and the childs in two independent arrays, so we got
>>>>>>>> rid of the limit of 10 childs per node. I've got ~200 fully uses slots
>>>>>>>> with the huge rules set here. Unlimited childs in the index removes
>>>>>>>> another 3 kB of duplicates, and the temporary index seems also a bit
>>>>>>>> smaller:
>>>>>>>>   shrunk to 64896 bytes tokens (5408 * 12 bytes), 15324 bytes buffer
>>>>>>>>   used 29456 bytes for index (1076 * 16 bytes nodes, 1020 * 12 bytes
>>>>>>>> child links)
>>>>>>>>
>>>>>>>> Would be great, if you can check if it still works for you as expected. :)
>>>>>>>>
>>>>>>> Did you have a particular reason to keep the trie_root array?  Now
>>>>>>> there's no fixed limit on children, you could just use trie[1] as the
>>>>>>> root node.  Remove the special case for depth == 0.  And initialize it's
>>>>>>> value and length to 0, then you can remove the special case for len == 0.
>>>>>>>
>>>>>> No special reason, I thought about that too, but it was already 5am,
>>>>>> and I was unable to think it through. :)
>>>>>>
>>>>>> Sounds nice to do that, did you try already, have a patch?
>>>>>>
>>>>> No, sorry :).
>>>>>
>>>> Ah, now by looking at it, maybe the then needed linear search for the
>>>> key in the root is not as good as the plain root array index?
>>>>
>>> Mmm.  Ok, without the root array add_string() takes twice as long, which
>>> increases the total rules-loading time by 10%.  (user time measured by
>>> cachegrind).  Let's leave it.
>>
>> Hmm, now I liked the idea. :)
>>
>> How about this? It has only a single array again, and no root, and no
>> child limits. Seems to work fine, but, it looks somehow too simple
>> now. :)
>
> Simple is good.  It's no faster, but I shouldn't care about 10% load
> time - because the total is only 0.01 seconds.

Yeah, it's 10% of almost nothing here too. So I think it's fine.

> What matters is that you
> can understand it, and rewriting it yourself won't hurt :).

Heh.

> As I say, the len==0 special case is now redundant.  If you remove it,
> it'll get handled by the root node which represents the empty string.
>
>> @@ -460,37 +456,31 @@ static int add_string(struct udev_rules *rules, const char *str)
>>       if (len == 0)
>>               return 0;

Removed it.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html