On Wed, Nov 13, 2024 at 12:01:07PM +0100, Pablo Neira Ayuso wrote:
> Hi Phil,
>
> On Tue, Nov 12, 2024 at 09:52:35PM +0100, Phil Sutter wrote:
> > Hi Pablo,
> >
> > On Thu, Oct 31, 2024 at 11:04:11PM +0100, Pablo Neira Ayuso wrote:
> > > Side note: While profiling, I can still see lots of json objects;
> > > this results in memory consumption that is 5 times that of the
> > > native representation. Error reporting is also lagging behind; it
> > > should be possible to add a json_t pointer to struct location to
> > > relate expressions and json objects.
> >
> > I can't quite reproduce this. When restoring a ruleset with ~12.7k
> > elements in individual standard syntax commands, valgrind prints:
> >
> > | HEAP SUMMARY:
> > |     in use at exit: 59,802 bytes in 582 blocks
> > |   total heap usage: 954,970 allocs,
> > |                     954,388 frees,
> > |                     18,300,874 bytes allocated
> >
> > Repeating the same in JSON syntax, I get:
> >
> > | HEAP SUMMARY:
> > |     in use at exit: 61,592 bytes in 647 blocks
> > |   total heap usage: 1,200,164 allocs,
> > |                     1,199,517 frees,
> > |                     38,612,257 bytes allocated
> >
> > So this is 38MB vs. 18MB? At least far from the mentioned 5 times.
> > Would you mind sharing how you got to that number?
> >
> > Please kindly find my reproducers attached for reference.
>
> I am using valgrind --tool=massif to measure memory consumption in
> userspace.
>
> I used these two files:
>
> - set-init.json-nft, to create the table and set.
> - set-65535.nft-json, to create a small set with 64K elements.
>
> Then I run:
>
>   valgrind --tool=massif nft -f set-65535.nft-json
>
> There is a tool to render the output:
>
>   ms_print massif.out.XYZ

Thanks! I see it now.

Interestingly, I had tried feeding the ruleset in on stdin, and that
makes standard syntax use more memory as well. With the rulesets being
read from a file, standard syntax indeed requires just 7MB while JSON
uses 35MB.

> At "peak time" in heap memory consumption, I can see 60% is consumed
> in json objects.

The problem with jansson in that regard is that it parses the whole
input recursively. In theory it would be possible to parse just the
outer object and to defer parsing of array elements until they are
accessed.

Interestingly, I managed to reduce memory consumption by 30% by
inserting a json_decref() call here:

| @@ -3496,6 +3498,7 @@ static struct cmd *json_parse_cmd_add_element(struct json_ctx *ctx,
|  	h.set.name = xstrdup(h.set.name);
|  
|  	expr = json_parse_set_expr(ctx, "elem", tmp);
| +	json_decref(tmp);
|  	if (!expr) {
|  		json_error(ctx, "Invalid set.");
|  		handle_free(&h);

This does not fix a memleak, though: 'tmp' is assigned by a call to
json_unpack(... "s:o" ...) and thus does not have its reference count
incremented. So AIUI, the extra json_decref() frees parts of the JSON
object tree prematurely, and later accesses become use-after-free:
--echo mode, for instance, aborts with a "corrupted double-linked
list" error.

> I am looking at the commands and expressions to reduce memory
> consumption there. The result of that work will also help json
> support.

Cheers, Phil
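
P.S.: For reference, here is a minimal standalone sketch (plain
jansson, not nft code) of the reference-counting semantics described
above. The lowercase "o" conversion in json_unpack() returns a
borrowed reference, so following it with json_decref() steals the
reference held by the tree itself; the uppercase "O" conversion takes
an owned reference, which a later json_decref() balances:

| /* build with: gcc demo.c -ljansson */
| #include <jansson.h>
|
| int main(void)
| {
| 	json_error_t err;
| 	json_t *root = json_loads("{\"elem\": [1, 2, 3]}", 0, &err);
| 	json_t *tmp;
|
| 	if (!root)
| 		return 1;
|
| 	/* "s:o" borrows: the only reference to the array is the one
| 	 * held by root, so a json_decref(tmp) here would free the
| 	 * array behind root's back and turn any later access, and the
| 	 * final json_decref(root), into use-after-free. That matches
| 	 * the "corrupted double-linked list" abort in --echo mode. */
| 	json_unpack(root, "{s:o}", "elem", &tmp);
|
| 	/* "s:O" increments the refcount, so this pair is balanced: */
| 	json_unpack(root, "{s:O}", "elem", &tmp);
| 	json_decref(tmp);
|
| 	json_decref(root);
| 	return 0;
| }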
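
P.P.S.: On the idea of parsing array elements only when they are
accessed: jansson offers no such lazy mode, it always builds the full
tree in one pass. Just to illustrate the potential savings, and under
the (loud) assumption of a restructured input carrying one element per
line, which is not something nft accepts today, peak memory stays at a
single parsed element when each one is loaded and freed individually.
process_element() below is a hypothetical stand-in for building a
struct expr:

| #include <jansson.h>
| #include <stdio.h>
| #include <stdlib.h>
|
| /* hypothetical stand-in for turning one element into a struct expr */
| static void process_element(json_t *elem)
| {
| 	char *s = json_dumps(elem, JSON_COMPACT | JSON_ENCODE_ANY);
|
| 	printf("element: %s\n", s ? s : "(unprintable)");
| 	free(s);
| }
|
| int main(void)
| {
| 	char line[4096];
|
| 	while (fgets(line, sizeof(line), stdin)) {
| 		json_error_t err;
| 		json_t *elem = json_loads(line, JSON_DECODE_ANY, &err);
|
| 		if (!elem) {
| 			fprintf(stderr, "parse error: %s\n", err.text);
| 			return 1;
| 		}
| 		process_element(elem);
| 		json_decref(elem);	/* gone before the next line is parsed */
| 	}
| 	return 0;
| }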