On Wed, Feb 1, 2017 at 7:45 PM, Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
> On Tue, Jan 31, 2017 at 02:16:31PM +0100, Dmitry Vyukov wrote:
>> Hello,
>>
>> I am getting the following reports with low frequency while running
>> the syzkaller fuzzer. Unfortunately they are not reproducible and happen
>> in a background thread, so it is difficult to extract any context on
>> my side. I see only a few such crashes per week, so most likely it is
>> some hard-to-trigger data race. The following reports are from the mmotm
>> tree, commits 00e20cfc2bf04a0cbe1f5405f61c8426f43eee84 and
>> fff7e71eac7788904753136f09bcad7471f7799e. Any ideas as to how this can
>> happen?
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
>> IP: [<ffffffff813fc09e>] sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
>>     arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
>> PGD 1d2395067 PUD 1d2860067
>> Oops: 0002 [#1] SMP KASAN
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Modules linked in:
>> CPU: 0 PID: 516 Comm: kworker/0:1 Not tainted 4.9.0 #4
>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>> BIOS Google 01/01/2011
>> Workqueue: crypto mcryptd_queue_worker
>> task: ffff8801d9f346c0 task.stack: ffff8801d9f08000
>> RIP: 0010:[<ffffffff813fc09e>] [<ffffffff813fc09e>]
>>     sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
>>     arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
>> RSP: 0018:ffff8801d9f0eef8  EFLAGS: 00010202
>> RAX: 0000000000000000 RBX: ffff8801d7db1190 RCX: 0000000000000006
>> RDX: 0000000000000001 RSI: ffff8801d9f34ee8 RDI: ffff8801d7db1040
>> RBP: ffff8801d9f0f258 R08: 0000000100000000 R09: 0000000000000001
>> R10: 0000000000000002 R11: 0000000000000003 R12: ffff8801d9f0f230
>> R13: ffff8801c8bbc4e0 R14: ffff8801c8bbc530 R15: ffff8801d9f0ef70
>> FS:  0000000000000000(0000) GS:ffff8801dc000000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000060 CR3: 00000001cc15a000 CR4: 00000000001406f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Stack:
>>  ffff8801d7db1040 ffffffff813fa207 dffffc0000000000 ffffe8ffffc0f238
>>  0000000000000002 1ffff1003b3e1dea ffffe8ffffc0f218 ffff8801d9f0f190
>>  0000000000000282 ffffe8ffffc0f140 ffffe8ffffc0f220 0000000041b58ab3
>> Call Trace:
>>  [<ffffffff813fb407>] sha512_mb_update+0x2f7/0x4e0
>>      arch/x86/crypto/sha512-mb/sha512_mb.c:588
>>  [<ffffffff8219d4ad>] crypto_ahash_update include/crypto/hash.h:512 [inline]
>>  [<ffffffff8219d4ad>] ahash_mcryptd_update crypto/mcryptd.c:627 [inline]
>>  [<ffffffff8219d4ad>] mcryptd_hash_update+0xcd/0x1c0 crypto/mcryptd.c:373
>>  [<ffffffff8219c99f>] mcryptd_queue_worker+0xff/0x6a0 crypto/mcryptd.c:181
>>  [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
>>  [<ffffffff81493bc3>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
>>  [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
>>  [<ffffffff8436fbaa>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
>> Code: 49 0f 42 d3 48 f7 c2 f0 ff ff ff 0f 85 9a 00 00 00 48 83 e2 0f
>> 48 6b da 08 48 8d 9c 1f 48 01 00 00 48 8b 03 48 c7 03 00 00 00 00 <c7>
>> 40 60 02 00 00 00 48 8b 9f 40 01 00 00 48 c1 e3 08 48 09 d3
>> RIP  [<ffffffff813fc09e>] sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
>>      arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
>> RSP <ffff8801d9f0eef8>
>> CR2: 0000000000000060
>> ---[ end trace 139fd4cda5dfe2c4 ]---
>>
>
> Dmitry,
>
> One theory that Megha and I have is that perhaps the flusher
> and the regular computation updates are stepping on each other.
> Can you try this patch and see if it helps?

No, for this one I can't. Sorry. It happens with very low frequency, and
only in the one fuzzer that tests the mmotm tree. If/when this is
committed, I can keep an eye on these reports and notify you if I still
see them.
If you have a hypothesis as to how it happens, perhaps you could write a
test that provokes the crash, and maybe add some sleeps to the kernel
code or alter timeouts to increase the probability.

> --->8---
>
> From: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Subject: [PATCH] crypto/sha512-mb: Protect sha512 mb ctx mgr access
> To: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>, Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>, David Miller <davem@xxxxxxxxxxxxx>,
>     linux-crypto@xxxxxxxxxxxxxxx, LKML <linux-kernel@xxxxxxxxxxxxxxx>,
>     megha.dey@xxxxxxxxxxxxxxx, fenghua.yu@xxxxxxxxx
>
> The flusher and the regular multi-buffer computation via mcryptd may race
> with one another. Add a lock, and turn off interrupts, to protect access
> to the multi-buffer computation state cstate->mgr before a round of
> computation. This should prevent the flusher code from jumping in.
>
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> ---
>  arch/x86/crypto/sha512-mb/sha512_mb.c | 64 +++++++++++++++++++++++------------
>  1 file changed, 42 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/crypto/sha512-mb/sha512_mb.c b/arch/x86/crypto/sha512-mb/sha512_mb.c
> index d210174..f3c1c21 100644
> --- a/arch/x86/crypto/sha512-mb/sha512_mb.c
> +++ b/arch/x86/crypto/sha512-mb/sha512_mb.c
> @@ -221,7 +221,7 @@ static struct sha512_hash_ctx *sha512_ctx_mgr_resubmit
>  }
>
>  static struct sha512_hash_ctx
> - *sha512_ctx_mgr_get_comp_ctx(struct sha512_ctx_mgr *mgr)
> + *sha512_ctx_mgr_get_comp_ctx(struct mcryptd_alg_cstate *cstate)
>  {
>  	/*
>  	 * If get_comp_job returns NULL, there are no jobs complete.
> @@ -233,11 +233,17 @@ static struct sha512_hash_ctx
>  	 * Otherwise, all jobs currently being managed by the hash_ctx_mgr
>  	 * still need processing.
>  	 */
> +	struct sha512_ctx_mgr *mgr;
>  	struct sha512_hash_ctx *ctx;
> +	unsigned long flags;
>
> +	mgr = cstate->mgr;
> +	spin_lock_irqsave(&cstate->work_lock, flags);
>  	ctx = (struct sha512_hash_ctx *)
>  		sha512_job_mgr_get_comp_job(&mgr->mgr);
> -	return sha512_ctx_mgr_resubmit(mgr, ctx);
> +	ctx = sha512_ctx_mgr_resubmit(mgr, ctx);
> +	spin_unlock_irqrestore(&cstate->work_lock, flags);
> +	return ctx;
>  }
>
>  static void sha512_ctx_mgr_init(struct sha512_ctx_mgr *mgr)
> @@ -246,12 +252,17 @@ static void sha512_ctx_mgr_init(struct sha512_ctx_mgr *mgr)
>  }
>
>  static struct sha512_hash_ctx
> - *sha512_ctx_mgr_submit(struct sha512_ctx_mgr *mgr,
> + *sha512_ctx_mgr_submit(struct mcryptd_alg_cstate *cstate,
>  			 struct sha512_hash_ctx *ctx,
>  			 const void *buffer,
>  			 uint32_t len,
>  			 int flags)
>  {
> +	struct sha512_ctx_mgr *mgr;
> +	unsigned long irqflags;
> +
> +	mgr = cstate->mgr;
> +	spin_lock_irqsave(&cstate->work_lock, irqflags);
>  	if (flags & (~HASH_ENTIRE)) {
>  		/*
>  		 * User should not pass anything other than FIRST, UPDATE, or
> @@ -351,20 +362,26 @@ static struct sha512_hash_ctx
>  		}
>  	}
>
> -	return sha512_ctx_mgr_resubmit(mgr, ctx);
> +	ctx = sha512_ctx_mgr_resubmit(mgr, ctx);
> +	spin_unlock_irqrestore(&cstate->work_lock, irqflags);
> +	return ctx;
>  }
>
> -static struct sha512_hash_ctx *sha512_ctx_mgr_flush(struct sha512_ctx_mgr *mgr)
> +static struct sha512_hash_ctx *sha512_ctx_mgr_flush(struct mcryptd_alg_cstate *cstate)
>  {
> +	struct sha512_ctx_mgr *mgr;
>  	struct sha512_hash_ctx *ctx;
> +	unsigned long flags;
>
> +	mgr = cstate->mgr;
> +	spin_lock_irqsave(&cstate->work_lock, flags);
>  	while (1) {
>  		ctx = (struct sha512_hash_ctx *)
>  				sha512_job_mgr_flush(&mgr->mgr);
>
>  		/* If flush returned 0, there are no more jobs in flight. */
>  		if (!ctx)
> -			return NULL;
> +			break;
>
>  		/*
>  		 * If flush returned a job, resubmit the job to finish
> @@ -378,8 +395,10 @@ static struct sha512_hash_ctx *sha512_ctx_mgr_flush(struct sha512_ctx_mgr *mgr)
>  		 * the sha512_ctx_mgr still need processing. Loop.
>  		 */
>  		if (ctx)
> -			return ctx;
> +			break;
>  	}
> +	spin_unlock_irqrestore(&cstate->work_lock, flags);
> +	return ctx;
>  }
>
>  static int sha512_mb_init(struct ahash_request *areq)
> @@ -439,11 +458,11 @@ static int sha_finish_walk(struct mcryptd_hash_request_ctx **ret_rctx,
>  		sha_ctx = (struct sha512_hash_ctx *)
>  						ahash_request_ctx(&rctx->areq);
>  		kernel_fpu_begin();
> -		sha_ctx = sha512_ctx_mgr_submit(cstate->mgr, sha_ctx,
> +		sha_ctx = sha512_ctx_mgr_submit(cstate, sha_ctx,
>  						rctx->walk.data, nbytes, flag);
>  		if (!sha_ctx) {
>  			if (flush)
> -				sha_ctx = sha512_ctx_mgr_flush(cstate->mgr);
> +				sha_ctx = sha512_ctx_mgr_flush(cstate);
>  		}
>  		kernel_fpu_end();
>  		if (sha_ctx)
> @@ -471,11 +490,12 @@ static int sha_complete_job(struct mcryptd_hash_request_ctx *rctx,
>  	struct sha512_hash_ctx *sha_ctx;
>  	struct mcryptd_hash_request_ctx *req_ctx;
>  	int ret;
> +	unsigned long flags;
>
>  	/* remove from work list */
> -	spin_lock(&cstate->work_lock);
> +	spin_lock_irqsave(&cstate->work_lock, flags);
>  	list_del(&rctx->waiter);
> -	spin_unlock(&cstate->work_lock);
> +	spin_unlock_irqrestore(&cstate->work_lock, flags);
>
>  	if (irqs_disabled())
>  		rctx->complete(&req->base, err);
> @@ -486,14 +506,14 @@ static int sha_complete_job(struct mcryptd_hash_request_ctx *rctx,
>  	}
>
>  	/* check to see if there are other jobs that are done */
> -	sha_ctx = sha512_ctx_mgr_get_comp_ctx(cstate->mgr);
> +	sha_ctx = sha512_ctx_mgr_get_comp_ctx(cstate);
>  	while (sha_ctx) {
>  		req_ctx = cast_hash_to_mcryptd_ctx(sha_ctx);
>  		ret = sha_finish_walk(&req_ctx, cstate, false);
>  		if (req_ctx) {
> -			spin_lock(&cstate->work_lock);
> +			spin_lock_irqsave(&cstate->work_lock, flags);
>  			list_del(&req_ctx->waiter);
> -			spin_unlock(&cstate->work_lock);
> +			spin_unlock_irqrestore(&cstate->work_lock, flags);
>
>  			req = cast_mcryptd_ctx_to_req(req_ctx);
>  			if (irqs_disabled())
> @@ -504,7 +524,7 @@ static int sha_complete_job(struct mcryptd_hash_request_ctx *rctx,
>  				local_bh_enable();
>  			}
>  		}
> -		sha_ctx = sha512_ctx_mgr_get_comp_ctx(cstate->mgr);
> +		sha_ctx = sha512_ctx_mgr_get_comp_ctx(cstate);
>  	}
>
>  	return 0;
> @@ -515,6 +535,7 @@ static void sha512_mb_add_list(struct mcryptd_hash_request_ctx *rctx,
>  {
>  	unsigned long next_flush;
>  	unsigned long delay = usecs_to_jiffies(FLUSH_INTERVAL);
> +	unsigned long flags;
>
>  	/* initialize tag */
>  	rctx->tag.arrival = jiffies;	/* tag the arrival time */
> @@ -522,9 +543,9 @@ static void sha512_mb_add_list(struct mcryptd_hash_request_ctx *rctx,
>  	next_flush = rctx->tag.arrival + delay;
>  	rctx->tag.expire = next_flush;
>
> -	spin_lock(&cstate->work_lock);
> +	spin_lock_irqsave(&cstate->work_lock, flags);
>  	list_add_tail(&rctx->waiter, &cstate->work_list);
> -	spin_unlock(&cstate->work_lock);
> +	spin_unlock_irqrestore(&cstate->work_lock, flags);
>
>  	mcryptd_arm_flusher(cstate, delay);
>  }
> @@ -565,7 +586,7 @@ static int sha512_mb_update(struct ahash_request *areq)
>  	sha_ctx = (struct sha512_hash_ctx *) ahash_request_ctx(areq);
>  	sha512_mb_add_list(rctx, cstate);
>  	kernel_fpu_begin();
> -	sha_ctx = sha512_ctx_mgr_submit(cstate->mgr, sha_ctx, rctx->walk.data,
> +	sha_ctx = sha512_ctx_mgr_submit(cstate, sha_ctx, rctx->walk.data,
>  					nbytes, HASH_UPDATE);
>  	kernel_fpu_end();
>
> @@ -628,7 +649,7 @@ static int sha512_mb_finup(struct ahash_request *areq)
>  	sha512_mb_add_list(rctx, cstate);
>
>  	kernel_fpu_begin();
> -	sha_ctx = sha512_ctx_mgr_submit(cstate->mgr, sha_ctx, rctx->walk.data,
> +	sha_ctx = sha512_ctx_mgr_submit(cstate, sha_ctx, rctx->walk.data,
>  					nbytes, flag);
>  	kernel_fpu_end();
>
> @@ -677,8 +698,7 @@ static int sha512_mb_final(struct ahash_request *areq)
>  	/* flag HASH_FINAL and 0 data size */
>  	sha512_mb_add_list(rctx, cstate);
>  	kernel_fpu_begin();
> -	sha_ctx = sha512_ctx_mgr_submit(cstate->mgr, sha_ctx, &data, 0,
> -							HASH_LAST);
> +	sha_ctx = sha512_ctx_mgr_submit(cstate, sha_ctx, &data, 0, HASH_LAST);
>  	kernel_fpu_end();
>
>  	/* check if anything is returned */
> @@ -940,7 +960,7 @@ static unsigned long sha512_mb_flusher(struct mcryptd_alg_cstate *cstate)
>  			break;
>  		kernel_fpu_begin();
>  		sha_ctx = (struct sha512_hash_ctx *)
> -					sha512_ctx_mgr_flush(cstate->mgr);
> +					sha512_ctx_mgr_flush(cstate);
>  		kernel_fpu_end();
>  		if (!sha_ctx) {
>  			pr_err("sha512_mb error: nothing got flushed for"
> --
> 2.5.5
>