[..]
> > > > > +	}
> > > +
> > > +	acomp_ctx->buffers = kmalloc_node(nr_reqs * sizeof(u8 *), GFP_KERNEL, cpu_to_node(cpu));
> >
> > Can we use kcalloc_node() here?
>
> I was wondering if the performance penalty of kcalloc_node() is acceptable
> because the cpu onlining happens infrequently? If so, it appears zero-initializing
> the allocated memory will help with the cleanup code suggestion in your
> subsequent comment.

I don't think zeroing in this path would be a problem.

> > > +	if (!acomp_ctx->buffers)
> > > +		goto buf_fail;
> > > +
> > > +	for (i = 0; i < nr_reqs; ++i) {
> > > +		acomp_ctx->buffers[i] = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
> > > +		if (!acomp_ctx->buffers[i]) {
> > > +			for (j = 0; j < i; ++j)
> > > +				kfree(acomp_ctx->buffers[j]);
> > > +			kfree(acomp_ctx->buffers);
> > > +			ret = -ENOMEM;
> > > +			goto buf_fail;
> > > +		}
> > > +	}
> > > +
> > > +	acomp_ctx->reqs = kmalloc_node(nr_reqs * sizeof(struct acomp_req *), GFP_KERNEL, cpu_to_node(cpu));
> >
> > Ditto.
>
> Sure.
>
> > > +	if (!acomp_ctx->reqs)
> > >  		goto req_fail;
> > > +
> > > +	for (i = 0; i < nr_reqs; ++i) {
> > > +		acomp_ctx->reqs[i] = acomp_request_alloc(acomp_ctx->acomp);
> > > +		if (!acomp_ctx->reqs[i]) {
> > > +			pr_err("could not alloc crypto acomp_request reqs[%d] %s\n",
> > > +			       i, pool->tfm_name);
> > > +			for (j = 0; j < i; ++j)
> > > +				acomp_request_free(acomp_ctx->reqs[j]);
> > > +			kfree(acomp_ctx->reqs);
> > > +			ret = -ENOMEM;
> > > +			goto req_fail;
> > > +		}
> > >  	}
> > > -	acomp_ctx->req = req;
> > >
> > > +	/*
> > > +	 * The crypto_wait is used only in fully synchronous, i.e., with scomp
> > > +	 * or non-poll mode of acomp, hence there is only one "wait" per
> > > +	 * acomp_ctx, with callback set to reqs[0], under the assumption that
> > > +	 * there is at least 1 request per acomp_ctx.
> > > +	 */
> > >  	crypto_init_wait(&acomp_ctx->wait);
> > >  	/*
> > >  	 * if the backend of acomp is async zip, crypto_req_done() will wakeup
> > >  	 * crypto_wait_req(); if the backend of acomp is scomp, the callback
> > >  	 * won't be called, crypto_wait_req() will return without blocking.
> > >  	 */
> > > -	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> > > +	acomp_request_set_callback(acomp_ctx->reqs[0], CRYPTO_TFM_REQ_MAY_BACKLOG,
> > >  				   crypto_req_done, &acomp_ctx->wait);
> > >
> > > +	acomp_ctx->nr_reqs = nr_reqs;
> > >  	return 0;
> > >
> > >  req_fail:
> > > +	for (i = 0; i < nr_reqs; ++i)
> > > +		kfree(acomp_ctx->buffers[i]);
> > > +	kfree(acomp_ctx->buffers);
> >
> > The cleanup code is all over the place. Sometimes it's done in the
> > loops allocating the memory and sometimes here. It's a bit hard to
> > follow. Please have all the cleanups here. You can just initialize the
> > arrays to 0s, and then if an array is non-NULL you can free any
> > non-NULL elements (kfree() handles NULLs gracefully).
>
> Sure, if the performance of kzalloc_node() is an acceptable trade-off for
> the cleanup code simplification.
>
> > There may even be potential for code reuse with zswap_cpu_comp_dead().
>
> I assume the reuse would be through copy-and-pasting the same lines of code,
> as opposed to a common procedure being called by zswap_cpu_comp_prepare()
> and zswap_cpu_comp_dead()?

Well, I meant we could possibly introduce a helper that is used by both
zswap_cpu_comp_prepare() and zswap_cpu_comp_dead() (for example, see
__mem_cgroup_free(), which is called from both the freeing path and the
allocation path to do cleanup).
I didn't look too closely into it though; maybe it's best to keep them
separate, depending on how the code ends up looking.
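
Something like the below, perhaps. This is a rough, untested sketch: the
helper name is made up for illustration, and it assumes the reqs/buffers
arrays are allocated with kcalloc_node() so that any slot that was never
allocated is NULL:

/*
 * Untested sketch of a common teardown helper. Assumes acomp_ctx->reqs
 * and acomp_ctx->buffers come from kcalloc_node(), so unused slots are
 * NULL and can be skipped (or handed to kfree(), which ignores NULL).
 */
static void acomp_ctx_dealloc(struct crypto_acomp_ctx *acomp_ctx, int nr_reqs)
{
	int i;

	if (acomp_ctx->reqs) {
		for (i = 0; i < nr_reqs; ++i)
			if (acomp_ctx->reqs[i])
				acomp_request_free(acomp_ctx->reqs[i]);
		kfree(acomp_ctx->reqs);
		acomp_ctx->reqs = NULL;
	}

	if (acomp_ctx->buffers) {
		/* kfree(NULL) is a no-op, so no per-element check is needed */
		for (i = 0; i < nr_reqs; ++i)
			kfree(acomp_ctx->buffers[i]);
		kfree(acomp_ctx->buffers);
		acomp_ctx->buffers = NULL;
	}
}

zswap_cpu_comp_prepare() could then call this once on any allocation
failure, and zswap_cpu_comp_dead() could call the same helper, similar to
how __mem_cgroup_free() serves both paths.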