On Thu, 2017-02-02 at 11:58 +0100, Dmitry Vyukov wrote:
> On Wed, Feb 1, 2017 at 7:45 PM, Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, Jan 31, 2017 at 02:16:31PM +0100, Dmitry Vyukov wrote:
> > >
> > > Hello,
> > >
> > > I am getting the following reports with low frequency while running
> > > syzkaller fuzzer. Unfortunately they are not reproducible and happen
> > > in a background thread, so it is difficult to extract any context on
> > > my side. I see only few such crashes per week, so most likely it is
> > > some hard to trigger data race. The following reports are from mmotm
> > > tree, commits 00e20cfc2bf04a0cbe1f5405f61c8426f43eee84 and
> > > fff7e71eac7788904753136f09bcad7471f7799e. Any ideas as to how this can
> > > happen?
> > >
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
> > > IP: [<ffffffff813fc09e>] sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> > > arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
> > > PGD 1d2395067 [ 220.874864] PUD 1d2860067
> > > Oops: 0002 [#1] SMP KASAN
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 0 PID: 516 Comm: kworker/0:1 Not tainted 4.9.0 #4
> > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > BIOS Google 01/01/2011
> > > Workqueue: crypto mcryptd_queue_worker
> > > task: ffff8801d9f346c0 task.stack: ffff8801d9f08000
> > > RIP: 0010:[<ffffffff813fc09e>]  [<ffffffff813fc09e>]
> > > sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> > > arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
> > > RSP: 0018:ffff8801d9f0eef8  EFLAGS: 00010202
> > > RAX: 0000000000000000 RBX: ffff8801d7db1190 RCX: 0000000000000006
> > > RDX: 0000000000000001 RSI: ffff8801d9f34ee8 RDI: ffff8801d7db1040
> > > RBP: ffff8801d9f0f258 R08: 0000000100000000 R09: 0000000000000001
> > > R10: 0000000000000002 R11: 0000000000000003 R12: ffff8801d9f0f230
> > > R13: ffff8801c8bbc4e0 R14: ffff8801c8bbc530 R15: ffff8801d9f0ef70
> > > FS:
0000000000000000(0000) GS:ffff8801dc000000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000000060 CR3: 00000001cc15a000 CR4: 00000000001406f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Stack:
> > >  ffff8801d7db1040 ffffffff813fa207 dffffc0000000000 ffffe8ffffc0f238
> > >  0000000000000002 1ffff1003b3e1dea ffffe8ffffc0f218 ffff8801d9f0f190
> > >  0000000000000282 ffffe8ffffc0f140 ffffe8ffffc0f220 0000000041b58ab3
> > > Call Trace:
> > >  [<ffffffff813fb407>] sha512_mb_update+0x2f7/0x4e0
> > > arch/x86/crypto/sha512-mb/sha512_mb.c:588
> > >  [<ffffffff8219d4ad>] crypto_ahash_update include/crypto/hash.h:512 [inline]
> > >  [<ffffffff8219d4ad>] ahash_mcryptd_update crypto/mcryptd.c:627 [inline]
> > >  [<ffffffff8219d4ad>] mcryptd_hash_update+0xcd/0x1c0 crypto/mcryptd.c:373
> > >  [<ffffffff8219c99f>] mcryptd_queue_worker+0xff/0x6a0 crypto/mcryptd.c:181
> > >  [<ffffffff81492960>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
> > >  [<ffffffff81493bc3>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
> > >  [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
> > >  [<ffffffff8436fbaa>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
> > > Code: 49 0f 42 d3 48 f7 c2 f0 ff ff ff 0f 85 9a 00 00 00 48 83 e2 0f
> > > 48 6b da 08 48 8d 9c 1f 48 01 00 00 48 8b 03 48 c7 03 00 00 00 00 <c7>
> > > 40 60 02 00 00 00 48 8b 9f 40 01 00 00 48 c1 e3 08 48 09 d3
> > > RIP  [<ffffffff813fc09e>] sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> > > arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
> > >  RSP <ffff8801d9f0eef8>
> > > CR2: 0000000000000060
> > > ---[ end trace 139fd4cda5dfe2c4 ]---
> > >
> >
> > Dmitry,
> >
> > One theory that Megha and I have is that perhaps the flusher
> > and regular computation updates are stepping on each other.
> > Can you try this patch and see if it helps?
>
> No, for this one I can't. Sorry.
> It happens with very low frequency and only in the one fuzzer that tests
> the mmotm tree. If/when this is committed, I can keep an eye on these
> reports and notify you if I still see them.
> If you have a hypothesis as to how it happens, perhaps you could write
> a test that provokes the crash, and maybe add some sleeps to the kernel
> code or alter timeouts to increase the probability.
>

If this patch is merged, it will most likely go into Herbert's crypto-dev
tree rather than Andrew's mm tree for testing. We will do our best on our
side to replicate the crash scenario in testing. Would it be possible to
have one of your fuzzers run the crypto-dev tree once the patch is merged
there?

Thanks.

Tim

> >
> >
> > --->8---
> >
> > From: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > Subject: [PATCH] crypto/sha512-mb: Protect sha512 mb ctx mgr access
> > To: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>, Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>, David Miller <davem@xxxxxxxxxxxxx>,
> >     linux-crypto@xxxxxxxxxxxxxxx, LKML <linux-kernel@xxxxxxxxxxxxxxx>,
> >     megha.dey@xxxxxxxxxxxxxxx, fenghua.yu@xxxxxxxxx
> >
> > The flusher and regular multi-buffer computation via mcryptd may race
> > with one another. Add a lock and turn off interrupts to protect access
> > to the multi-buffer computation state cstate->mgr before a round of
> > computation. This should prevent the flusher code from jumping in.
> >
> > Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> > ---
> >  arch/x86/crypto/sha512-mb/sha512_mb.c | 64 +++++++++++++++++++++++------------
> >  1 file changed, 42 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/x86/crypto/sha512-mb/sha512_mb.c b/arch/x86/crypto/sha512-mb/sha512_mb.c
> > index d210174..f3c1c21 100644
> > --- a/arch/x86/crypto/sha512-mb/sha512_mb.c
> > +++ b/arch/x86/crypto/sha512-mb/sha512_mb.c
> > @@ -221,7 +221,7 @@ static struct sha512_hash_ctx *sha512_ctx_mgr_resubmit
> >  }
> >
> >  static struct sha512_hash_ctx
> > -		*sha512_ctx_mgr_get_comp_ctx(struct sha512_ctx_mgr *mgr)
> > +		*sha512_ctx_mgr_get_comp_ctx(struct mcryptd_alg_cstate *cstate)
> >  {
> >  	/*
> >  	 * If get_comp_job returns NULL, there are no jobs complete.
> > @@ -233,11 +233,17 @@ static struct sha512_hash_ctx
> >  	 * Otherwise, all jobs currently being managed by the hash_ctx_mgr
> >  	 * still need processing.
> >  	 */
> > +	struct sha512_ctx_mgr *mgr;
> >  	struct sha512_hash_ctx *ctx;
> > +	unsigned long flags;
> >
> > +	mgr = cstate->mgr;
> > +	spin_lock_irqsave(&cstate->work_lock, flags);
> >  	ctx = (struct sha512_hash_ctx *)
> >  			sha512_job_mgr_get_comp_job(&mgr->mgr);
> > -	return sha512_ctx_mgr_resubmit(mgr, ctx);
> > +	ctx = sha512_ctx_mgr_resubmit(mgr, ctx);
> > +	spin_unlock_irqrestore(&cstate->work_lock, flags);
> > +	return ctx;
> >  }
> >
> >  static void sha512_ctx_mgr_init(struct sha512_ctx_mgr *mgr)
> > @@ -246,12 +252,17 @@ static void sha512_ctx_mgr_init(struct sha512_ctx_mgr *mgr)
> >  }
> >
> >  static struct sha512_hash_ctx
> > -		*sha512_ctx_mgr_submit(struct sha512_ctx_mgr *mgr,
> > +		*sha512_ctx_mgr_submit(struct mcryptd_alg_cstate *cstate,
> >  				   struct sha512_hash_ctx *ctx,
> >  				   const void *buffer,
> >  				   uint32_t len,
> >  				   int flags)
> >  {
> > +	struct sha512_ctx_mgr *mgr;
> > +	unsigned long irqflags;
> > +
> > +	mgr = cstate->mgr;
> > +	spin_lock_irqsave(&cstate->work_lock, irqflags);
> >  	if (flags & (~HASH_ENTIRE)) {
> >  		/*
> >  		 * User should not pass
anything other than FIRST, UPDATE, or
> > @@ -351,20 +362,26 @@ static struct sha512_hash_ctx
> >  		}
> >  	}
> >
> > -	return sha512_ctx_mgr_resubmit(mgr, ctx);
> > +	ctx = sha512_ctx_mgr_resubmit(mgr, ctx);
> > +	spin_unlock_irqrestore(&cstate->work_lock, irqflags);
> > +	return ctx;
> >  }
> >
> > -static struct sha512_hash_ctx *sha512_ctx_mgr_flush(struct sha512_ctx_mgr *mgr)
> > +static struct sha512_hash_ctx *sha512_ctx_mgr_flush(struct mcryptd_alg_cstate *cstate)
> >  {
> > +	struct sha512_ctx_mgr *mgr;
> >  	struct sha512_hash_ctx *ctx;
> > +	unsigned long flags;
> >
> > +	mgr = cstate->mgr;
> > +	spin_lock_irqsave(&cstate->work_lock, flags);
> >  	while (1) {
> >  		ctx = (struct sha512_hash_ctx *)
> >  				sha512_job_mgr_flush(&mgr->mgr);
> >
> >  		/* If flush returned 0, there are no more jobs in flight. */
> >  		if (!ctx)
> > -			return NULL;
> > +			break;
> >
> >  		/*
> >  		 * If flush returned a job, resubmit the job to finish
> > @@ -378,8 +395,10 @@ static struct sha512_hash_ctx *sha512_ctx_mgr_flush(struct sha512_ctx_mgr *mgr)
> >  		 * the sha512_ctx_mgr still need processing. Loop.
> >  		 */
> >  		if (ctx)
> > -			return ctx;
> > +			break;
> >  	}
> > +	spin_unlock_irqrestore(&cstate->work_lock, flags);
> > +	return ctx;
> >  }
> >
> >  static int sha512_mb_init(struct ahash_request *areq)
> > @@ -439,11 +458,11 @@ static int sha_finish_walk(struct mcryptd_hash_request_ctx **ret_rctx,
> >  		sha_ctx = (struct sha512_hash_ctx *)
> >  				ahash_request_ctx(&rctx->areq);
> >  		kernel_fpu_begin();
> > -		sha_ctx = sha512_ctx_mgr_submit(cstate->mgr, sha_ctx,
> > +		sha_ctx = sha512_ctx_mgr_submit(cstate, sha_ctx,
> >  				rctx->walk.data, nbytes, flag);
> >  		if (!sha_ctx) {
> >  			if (flush)
> > -				sha_ctx = sha512_ctx_mgr_flush(cstate->mgr);
> > +				sha_ctx = sha512_ctx_mgr_flush(cstate);
> >  		}
> >  		kernel_fpu_end();
> >  		if (sha_ctx)
> > @@ -471,11 +490,12 @@ static int sha_complete_job(struct mcryptd_hash_request_ctx *rctx,
> >  	struct sha512_hash_ctx *sha_ctx;
> >  	struct mcryptd_hash_request_ctx *req_ctx;
> >  	int ret;
> > +	unsigned long flags;
> >
> >  	/* remove from work list */
> > -	spin_lock(&cstate->work_lock);
> > +	spin_lock_irqsave(&cstate->work_lock, flags);
> >  	list_del(&rctx->waiter);
> > -	spin_unlock(&cstate->work_lock);
> > +	spin_unlock_irqrestore(&cstate->work_lock, flags);
> >
> >  	if (irqs_disabled())
> >  		rctx->complete(&req->base, err);
> > @@ -486,14 +506,14 @@ static int sha_complete_job(struct mcryptd_hash_request_ctx *rctx,
> >  	}
> >
> >  	/* check to see if there are other jobs that are done */
> > -	sha_ctx = sha512_ctx_mgr_get_comp_ctx(cstate->mgr);
> > +	sha_ctx = sha512_ctx_mgr_get_comp_ctx(cstate);
> >  	while (sha_ctx) {
> >  		req_ctx = cast_hash_to_mcryptd_ctx(sha_ctx);
> >  		ret = sha_finish_walk(&req_ctx, cstate, false);
> >  		if (req_ctx) {
> > -			spin_lock(&cstate->work_lock);
> > +			spin_lock_irqsave(&cstate->work_lock, flags);
> >  			list_del(&req_ctx->waiter);
> > -			spin_unlock(&cstate->work_lock);
> > +			spin_unlock_irqrestore(&cstate->work_lock, flags);
> >
> >  			req = cast_mcryptd_ctx_to_req(req_ctx);
> >  			if (irqs_disabled())
> > @@ -504,7 +524,7 @@ static int
sha_complete_job(struct mcryptd_hash_request_ctx *rctx,
> >  			local_bh_enable();
> >  		}
> >  	}
> > -		sha_ctx = sha512_ctx_mgr_get_comp_ctx(cstate->mgr);
> > +		sha_ctx = sha512_ctx_mgr_get_comp_ctx(cstate);
> >  	}
> >
> >  	return 0;
> > @@ -515,6 +535,7 @@ static void sha512_mb_add_list(struct mcryptd_hash_request_ctx *rctx,
> >  {
> >  	unsigned long next_flush;
> >  	unsigned long delay = usecs_to_jiffies(FLUSH_INTERVAL);
> > +	unsigned long flags;
> >
> >  	/* initialize tag */
> >  	rctx->tag.arrival = jiffies;	/* tag the arrival time */
> > @@ -522,9 +543,9 @@ static void sha512_mb_add_list(struct mcryptd_hash_request_ctx *rctx,
> >  	next_flush = rctx->tag.arrival + delay;
> >  	rctx->tag.expire = next_flush;
> >
> > -	spin_lock(&cstate->work_lock);
> > +	spin_lock_irqsave(&cstate->work_lock, flags);
> >  	list_add_tail(&rctx->waiter, &cstate->work_list);
> > -	spin_unlock(&cstate->work_lock);
> > +	spin_unlock_irqrestore(&cstate->work_lock, flags);
> >
> >  	mcryptd_arm_flusher(cstate, delay);
> >  }
> >
> > @@ -565,7 +586,7 @@ static int sha512_mb_update(struct ahash_request *areq)
> >  	sha_ctx = (struct sha512_hash_ctx *) ahash_request_ctx(areq);
> >  	sha512_mb_add_list(rctx, cstate);
> >  	kernel_fpu_begin();
> > -	sha_ctx = sha512_ctx_mgr_submit(cstate->mgr, sha_ctx, rctx->walk.data,
> > +	sha_ctx = sha512_ctx_mgr_submit(cstate, sha_ctx, rctx->walk.data,
> >  					nbytes, HASH_UPDATE);
> >  	kernel_fpu_end();
> >
> > @@ -628,7 +649,7 @@ static int sha512_mb_finup(struct ahash_request *areq)
> >  	sha512_mb_add_list(rctx, cstate);
> >
> >  	kernel_fpu_begin();
> > -	sha_ctx = sha512_ctx_mgr_submit(cstate->mgr, sha_ctx, rctx->walk.data,
> > +	sha_ctx = sha512_ctx_mgr_submit(cstate, sha_ctx, rctx->walk.data,
> >  					nbytes, flag);
> >  	kernel_fpu_end();
> >
> > @@ -677,8 +698,7 @@ static int sha512_mb_final(struct ahash_request *areq)
> >  	/* flag HASH_FINAL and 0 data size */
> >  	sha512_mb_add_list(rctx, cstate);
> >  	kernel_fpu_begin();
> > -	sha_ctx = sha512_ctx_mgr_submit(cstate->mgr, sha_ctx, &data, 0,
> > -					HASH_LAST);
> > +	sha_ctx = sha512_ctx_mgr_submit(cstate, sha_ctx, &data, 0, HASH_LAST);
> >  	kernel_fpu_end();
> >
> >  	/* check if anything is returned */
> > @@ -940,7 +960,7 @@ static unsigned long sha512_mb_flusher(struct mcryptd_alg_cstate *cstate)
> >  			break;
> >  		kernel_fpu_begin();
> >  		sha_ctx = (struct sha512_hash_ctx *)
> > -					sha512_ctx_mgr_flush(cstate->mgr);
> > +					sha512_ctx_mgr_flush(cstate);
> >  		kernel_fpu_end();
> >  		if (!sha_ctx) {
> >  			pr_err("sha512_mb error: nothing got flushed for"
> > --
> > 2.5.5
> >