Hi Herbert,

On Sun, Apr 14, 2019 at 08:03:38PM +0800, Herbert Xu wrote:
> On Sun, Apr 14, 2019 at 04:26:04AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    e0a092eb Merge branch 'smc-next'
> > git tree:       net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=139878f3200000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=94c460c3e4cd188a
> > dashboard link: https://syzkaller.appspot.com/bug?extid=6f72c20560060c98b566
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+6f72c20560060c98b566@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > BUG: sleeping function called from invalid context at crypto/skcipher.c:477
> > in_atomic(): 1, irqs_disabled(): 0, pid: 12, name: kworker/0:1
> > 2 locks held by kworker/0:1/12:
> >  #0: 000000000864d9ff ((wq_completion)crypto){+.+.}, at: __write_once_size
> > include/linux/compiler.h:220 [inline]
> >  #0: 000000000864d9ff ((wq_completion)crypto){+.+.}, at: arch_atomic64_set
> > arch/x86/include/asm/atomic64_64.h:34 [inline]
> >  #0: 000000000864d9ff ((wq_completion)crypto){+.+.}, at: atomic64_set
> > include/asm-generic/atomic-instrumented.h:855 [inline]
> >  #0: 000000000864d9ff ((wq_completion)crypto){+.+.}, at: atomic_long_set
> > include/asm-generic/atomic-long.h:40 [inline]
> >  #0: 000000000864d9ff ((wq_completion)crypto){+.+.}, at: set_work_data
> > kernel/workqueue.c:619 [inline]
> >  #0: 000000000864d9ff ((wq_completion)crypto){+.+.}, at:
> > set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline]
> >  #0: 000000000864d9ff ((wq_completion)crypto){+.+.}, at:
> > process_one_work+0x87e/0x1790 kernel/workqueue.c:2240
> >  #1: 000000008b3d6218 ((work_completion)(&cpu_queue->work)){+.+.}, at:
> > process_one_work+0x8b4/0x1790 kernel/workqueue.c:2244
> > Preemption disabled at:
> > [<ffffffff830d4130>] local_bh_disable include/linux/bottom_half.h:19
> > [inline]
> > [<ffffffff830d4130>] cryptd_skcipher_complete+0x90/0x170 crypto/cryptd.c:471
> > CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4+ #135
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Workqueue: crypto cryptd_queue_worker
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
> >  ___might_sleep.cold+0x1bd/0x1f6 kernel/sched/core.c:6190
> >  __might_sleep+0x95/0x190 kernel/sched/core.c:6143
> >  skcipher_walk_virt+0x11e/0x150 crypto/skcipher.c:477
> >  xor_tweak+0x146/0x350 crypto/xts.c:105
> >  xor_tweak_post crypto/xts.c:133 [inline]
> >  crypt_done+0x87/0xa0 crypto/xts.c:141
> >  cryptd_skcipher_complete+0xbf/0x170 crypto/cryptd.c:472
> >  cryptd_skcipher_decrypt+0x2f7/0x420 crypto/cryptd.c:532
> >  cryptd_queue_worker+0x126/0x1f0 crypto/cryptd.c:193
> >  process_one_work+0x98e/0x1790 kernel/workqueue.c:2269
> >  worker_thread+0x98/0xe40 kernel/workqueue.c:2415
> >  kthread+0x357/0x430 kernel/kthread.c:253
> >  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
>
> Thanks for the report.  Please try this patch:
>
> ---8<---
> When we perform a walk in the completion function, we need to ensure
> that it is atomic.
>
> Reported-by: syzbot+6f72c20560060c98b566@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: 78105c7e769b ("crypto: xts - Drop use of auxiliary buffer")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
>
> diff --git a/crypto/xts.c b/crypto/xts.c
> index 847f54f76789..c915a45711f5 100644
> --- a/crypto/xts.c
> +++ b/crypto/xts.c
> @@ -88,7 +88,8 @@ static int setkey(struct crypto_skcipher *parent, const u8 *key,
>   * mutliple calls to the 'ecb(..)' instance, which usually would be slower than
>   * just doing the gf128mul_x_ble() calls again.
>   */
> -static int xor_tweak(struct skcipher_request *req, bool second_pass)
> +static int xor_tweak(struct skcipher_request *req, bool second_pass,
> +		     bool atomic)
>  {
>  	struct rctx *rctx = skcipher_request_ctx(req);
>  	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> @@ -102,7 +103,7 @@ static int xor_tweak(struct skcipher_request *req, bool second_pass)
>  		/* set to our TFM to enforce correct alignment: */
>  		skcipher_request_set_tfm(req, tfm);
>  	}
> -	err = skcipher_walk_virt(&w, req, false);
> +	err = skcipher_walk_virt(&w, req, atomic);
>
>  	while (w.nbytes) {
>  		unsigned int avail = w.nbytes;
> @@ -125,12 +126,12 @@ static int xor_tweak(struct skcipher_request *req, bool second_pass)
>
>  static int xor_tweak_pre(struct skcipher_request *req)
>  {
> -	return xor_tweak(req, false);
> +	return xor_tweak(req, false, false);
>  }
>
> -static int xor_tweak_post(struct skcipher_request *req)
> +static int xor_tweak_post(struct skcipher_request *req, bool atomic)
>  {
> -	return xor_tweak(req, true);
> +	return xor_tweak(req, true, atomic);
>  }
>
>  static void crypt_done(struct crypto_async_request *areq, int err)
> @@ -138,7 +139,7 @@ static void crypt_done(struct crypto_async_request *areq, int err)
>  	struct skcipher_request *req = areq->data;
>
>  	if (!err)
> -		err = xor_tweak_post(req);
> +		err = xor_tweak_post(req, true);
>
>  	skcipher_request_complete(req, err);
>  }
> @@ -166,7 +167,7 @@ static int encrypt(struct skcipher_request *req)
>  	init_crypt(req);
>  	return xor_tweak_pre(req) ?:
>  		crypto_skcipher_encrypt(subreq) ?:
> -		xor_tweak_post(req);
> +		xor_tweak_post(req, false);
>  }
>
>  static int decrypt(struct skcipher_request *req)
> @@ -177,7 +178,7 @@ static int decrypt(struct skcipher_request *req)
>  	init_crypt(req);
>  	return xor_tweak_pre(req) ?:
>  		crypto_skcipher_decrypt(subreq) ?:
> -		xor_tweak_post(req);
> +		xor_tweak_post(req, false);
>  }
>
>  static int init_tfm(struct crypto_skcipher *tfm)
> --

This isn't correct because skcipher_walk_virt() can
still sleep when atomic=true, since it may need to copy the IV.  That's why I
added might_sleep() to it in commit bb648291fc04, and that's what's triggering
here.

This is the correct fix:

diff --git a/crypto/xts.c b/crypto/xts.c
index aed11e63ca315..359ef15e1673e 100644
--- a/crypto/xts.c
+++ b/crypto/xts.c
@@ -101,6 +101,7 @@ static int xor_tweak(struct skcipher_request *req, bool second_pass)
 		req = &rctx->subreq;
 		/* set to our TFM to enforce correct alignment: */
 		skcipher_request_set_tfm(req, tfm);
+		req->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
 	}
 	err = skcipher_walk_virt(&w, req, false);

Likewise for LRW.

With the crypto self-tests and CONFIG_DEBUG_ATOMIC_SLEEP enabled, this bug can
be reproduced by trying to allocate the skcipher algorithm
"xts(cryptd(ecb(aes-generic)))", e.g. by running in Python:

	import socket
	fd = socket.socket(socket.AF_ALG, 5, 0)
	fd.bind(("skcipher", "xts(cryptd(ecb(aes-generic)))"))

Likewise, use "lrw(cryptd(ecb(aes-generic)))" for LRW.

- Eric
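
For convenience, the two Python reproducers above can be folded into one
script.  This is only a sketch: try_alloc() and ALGS are illustrative names
that are not part of the original report, and socket.SOCK_SEQPACKET is assumed
to be the symbolic form of the literal 5 used above (true on Linux).  The
script only reports whether the bind succeeded; on an affected kernel the
"sleeping function called from invalid context" message would be expected in
the kernel log, not in the script's output.

```python
import socket

# Algorithm names taken verbatim from the message above.
ALGS = [
    "xts(cryptd(ecb(aes-generic)))",
    "lrw(cryptd(ecb(aes-generic)))",
]


def try_alloc(name):
    """Ask the kernel to instantiate skcipher `name` through AF_ALG.

    Returns True if the bind succeeded, and False if the kernel refused the
    request or AF_ALG is not available on this system.
    """
    family = getattr(socket, "AF_ALG", None)  # only defined on Linux
    if family is None:
        return False
    try:
        # SOCK_SEQPACKET is the socket type AF_ALG expects.
        with socket.socket(family, socket.SOCK_SEQPACKET, 0) as fd:
            fd.bind(("skcipher", name))
        return True
    except OSError:
        # e.g. algorithm not found, or AF_ALG not built into the kernel
        return False


if __name__ == "__main__":
    for name in ALGS:
        status = "allocated" if try_alloc(name) else "not allocated"
        print(name, "->", status)
```

Catching OSError keeps the script usable on kernels where AF_ALG or one of the
templates is not built in, so that a "not allocated" result there is not
mistaken for a fixed kernel.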