Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3

David Wysochanski <dwysocha@xxxxxxxxxx> · Thu, 17 Oct 2019 15:23:10 -0400

On Thu, Oct 17, 2019 at 2:29 PM Pavel Shilovskiy <pshilov@xxxxxxxxxxxxx> wrote:
>
> Hi Ronnie, David,
>
> Thanks for looking into this. This actually reminds me of the commit 696e420bb2a66:
>
> --------------------------------------
> commit 696e420bb2a6624478105651d5368d45b502b324
> Author: Lars Persson <lars.persson@xxxxxxxx>
> Date:   Mon Jun 25 14:05:25 2018 +0200
>
>     cifs: Fix use after free of a mid_q_entry
>
>     With protocol version 2.0 mounts we have seen crashes with corrupt mid
>     entries. Either the server->pending_mid_q list becomes corrupt with a
>     cyclic reference in one element or a mid object fetched by the
>     demultiplexer thread becomes overwritten during use.
>
>     Code review identified a race between the demultiplexer thread and the
>     request issuing thread. The demultiplexer thread seems to be written
>     with the assumption that it is the sole user of the mid object until
>     it calls the mid callback which either wakes the issuer task or
>     deletes the mid.
>
>     This assumption is not true because the issuer task can be woken up
>     earlier by a signal. If the demultiplexer thread has proceeded as far
>     as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer
>     thread will happily end up calling cifs_delete_mid while the
>     demultiplexer thread still is using the mid object.
>
>     Inserting a delay in the cifs demultiplexer thread widens the race
>     window and makes reproduction of the race very easy:
>
>                     if (server->large_buf)
>                             buf = server->bigbuf;
>
>     +               usleep_range(500, 4000);
>
>                     server->lstrp = jiffies;
>
>     To resolve this I think the proper solution involves putting a
>     reference count on the mid object. This patch makes sure that the
>     demultiplexer thread holds a reference until it has finished
>     processing the transaction.
>
>     Cc: stable@xxxxxxxxxxxxxxx
>     Signed-off-by: Lars Persson <larper@xxxxxxxx>
>     Acked-by: Paulo Alcantara <palcantara@xxxxxxx>
>     Reviewed-by: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
>     Reviewed-by: Pavel Shilovsky <pshilov@xxxxxxxxxxxxx>
>     Signed-off-by: Steve French <stfrench@xxxxxxxxxxxxx>
>
> --------------------------------------
>
> The similar solution of taking an extra reference should apply to the case of reconnect as well. The reference should be taken during the process of moving mid entries to the private list. Once a callback completes, such a reference should be put back thus freeing the mid.
>

Ah ok very good.  The above seems consistent with the traces I'm
seeing of the race.
I am going to test this patch as it sounds like what you're describing
and similar to what Ronnie suggested earlier:

--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,6 +564,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+               kref_get(&mid_entry->refcount);
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
@@ -576,6 +577,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
                list_del_init(&mid_entry->qhead);
                mid_entry->callback(mid_entry);
+               cifs_mid_q_entry_release(mid_entry);
        }

        if (cifs_rdma_enabled(server)) {






> -----Original Message-----
> From: David Wysochanski <dwysocha@xxxxxxxxxx>
> Sent: Thursday, October 17, 2019 8:30 AM
> To: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
> Cc: Pavel Shilovskiy <pshilov@xxxxxxxxxxxxx>; linux-cifs <linux-cifs@xxxxxxxxxxxxxxx>; Frank Sorenson <sorenson@xxxxxxxxxx>
> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3
>
> On Thu, Oct 17, 2019 at 10:08 AM Ronnie Sahlberg <lsahlber@xxxxxxxxxx> wrote:
> >
> > List, Pavel,
> >
> > So I think there are two bugs we need to fix in this small block to make it safe against a race against other threads calling cifs_delete_mid() and similar.
> >
> > We need to to protect the list mutate functions and wrap them inside
> > the GlobalMid_Lock mutex but we can not hold this lock across the callback call.
> >
> > But we still need to protect the mid_entry dereference and the
> > ->callback call against the mid structure being freed by
> > DeleteMidQEntry().  We can do that by taking out an extra reference to the mid while holding the GlobalMid_Lock and then dropping the reference again after the callback completes.
> >
> >
> > I think something like this might work :
> >
> >
> >
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index
> > bdea4b3e8005..3a1a9b63bd9b 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -572,11 +572,19 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >         mutex_unlock(&server->srv_mutex);
> >
> >         cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > +       spin_lock(&GlobalMid_Lock);
> >         list_for_each_safe(tmp, tmp2, &retry_list) {
> >                 mid_entry = list_entry(tmp, struct mid_q_entry,
> > qhead);
>
> I think you need a reference before this and something like if (mid->mid_flags ...)  /* check for someone else already deleting it */ ; else
> >                 list_del_init(&mid_entry->qhead);
>
> I am still tracing and I do not see the root of the problem yet.
> Unsurprisingly, it looks like a use after free though.
>
>
> > +               kref_get(&mid_entry->refcount);
> > +               spin_unlock(&GlobalMid_Lock);
> > +
> >                 mid_entry->callback(mid_entry);
> > +               cifs_mid_q_entry_release(mid_entry);
> > +
> > +               spin_lock(&GlobalMid_Lock);
> >         }
> > +       spin_unlock(&GlobalMid_Lock);
> >
> >         if (cifs_rdma_enabled(server)) {
> >                 mutex_lock(&server->srv_mutex);
> >
> >
> > Pavel, can you have a look at this and comment?  It is very delicate code so it needs careful review.
> >
> >
> > regards
> > ronnie sahlberg
> >
> >
> >
> > ----- Original Message -----
> > From: "David Wysochanski" <dwysocha@xxxxxxxxxx>
> > To: "Ronnie Sahlberg" <lsahlber@xxxxxxxxxx>
> > Cc: "linux-cifs" <linux-cifs@xxxxxxxxxxxxxxx>, "Frank Sorenson"
> > <sorenson@xxxxxxxxxx>
> > Sent: Thursday, 17 October, 2019 9:42:08 PM
> > Subject: Re: list_del corruption while iterating retry_list in
> > cifs_reconnect still seen on 5.4-rc3
> >
> > On Thu, Oct 17, 2019 at 5:05 AM Ronnie Sahlberg <lsahlber@xxxxxxxxxx> wrote:
> > >
> > >
> > >
> > > > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > > 577         list_del_init(&mid_entry->qhead);
> > > > > 578         mid_entry->callback(mid_entry);
> > > > > 579     }
> > >
> > > This part (and a similar loop during shutting down the demultiplex
> > > thread) is the only place where we add/remove to the ->qhead list without holding the GlobalMid_Lock.
> > >
> > > I wonder if it is racing against a different thread also modifying
> > > qhead for the same mid, like cifs_delete_mid() for example.
> > >
> >
> > Yes I agree, I was thinking along these same lines of reasoning as I
> > read the code.  I put the latest on the investigation into the bug:
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz
> > illa.redhat.com%2Fshow_bug.cgi%3Fid%3D1654538%23c15&amp;data=02%7C01%7
> > Cpshilov%40microsoft.com%7C3f7f3981d44f4cac5afc08d75316efe8%7C72f988bf
> > 86f141af91ab2d7cd011db47%7C1%7C0%7C637069230291690406&amp;sdata=0P%2Bj
> > YJWfBSHXGUNPZEeZR4W9tOb2%2BGCC1WtDmiyzpEI%3D&amp;reserved=0
> >
> > Just before the crash, when we hit the iteration of retry_list
> > something has gone wrong - the very first mid_entry on retry_list is a
> > garbage address, when normally it should be the last address from the
> > previous call to list_move in the loop above it.
> >
> >
> >  ----- Original Message -----
> > > > From: "Ronnie Sahlberg" <lsahlber@xxxxxxxxxx>
> > > > To: "David Wysochanski" <dwysocha@xxxxxxxxxx>
> > > > Cc: "linux-cifs" <linux-cifs@xxxxxxxxxxxxxxx>, "Frank Sorenson"
> > > > <sorenson@xxxxxxxxxx>
> > > > Sent: Thursday, 17 October, 2019 10:17:18 AM
> > > > Subject: Re: list_del corruption while iterating retry_list in
> > > > cifs_reconnect still seen on 5.4-rc3
> > > >
> > > > I can not reproduce this :-(
> > > >
> > > > I have run it for a few hours, restarting samba in a loop with up
> > > > to 30 threads.
> > > >
> >
> > I am not sure if it helps but I have 8 CPUs on my VM.
> > I also have server signing and client signing mandatory.
> > I can send you the smb.conf offline.
> >
> >
> > > >
> > > > Can you check
> > > > 1, If this only reproduce for you for the root of the share or it
> > > > also reproduces for a subdirectory?
> > > > 2, Does it reproduce also if you use "nohandlecache" mount option?
> > > >    This disables the use of cached open of the root handle, i.e.
> > > >    open_shroot()
> > > > 3, When this happens, can you check the content of the mid entry
> > > > and what these fields are:
> > > >    mid->mid_flags, mid->handle (this is a function pointer, what does it
> > > >    point to)
> > > >    mid->command.   Maybe print the whole structure.
> > > >
> >
> > Ok I'll see what I can find out.  So far I am not sure I have
> > identified what else is touching the mid in between the two loops.
> >
> > > > regards
> > > > ronnie sahlberg
> > > >
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "David Wysochanski" <dwysocha@xxxxxxxxxx>
> > > > > To: "linux-cifs" <linux-cifs@xxxxxxxxxxxxxxx>
> > > > > Cc: "Frank Sorenson" <sorenson@xxxxxxxxxx>
> > > > > Sent: Thursday, 17 October, 2019 5:27:02 AM
> > > > > Subject: list_del corruption while iterating retry_list in
> > > > > cifs_reconnect still seen on 5.4-rc3
> > > > >
> > > > > I think this has been there for a long time, since we first saw
> > > > > this on a 4.18.0 based kernel but I just noticed the bug recently.
> > > > > I just retested on 5.4-rc3 and it's still there.  Easy to repro
> > > > > with a fairly simple but invasive server restart test - takes
> > > > > only maybe a couple minutes on my VM.
> > > > >
> > > > >
> > > > > From Frank Sorenson:
> > > > >
> > > > > mount off a samba server:
> > > > >
> > > > >     # mount //vm1/share /mnt/vm1
> > > > > -overs=2.1,hard,sec=ntlmssp,credentials=/root/.smb_creds
> > > > >
> > > > >
> > > > > on the client, start 10 'find' loops:
> > > > >
> > > > >     # export test_path=/mnt/vm1
> > > > >     # do_find() { while true ; do find $test_path >/dev/null
> > > > > 2>&1 ; done }
> > > > >
> > > > >     # for i in {1..10} ; do do_find & done
> > > > >
> > > > >
> > > > > optional:  also start something to monitor for when the hang occurs:
> > > > >
> > > > >     # while true ; do count=$(grep smb2_reconnect /proc/*/stack
> > > > > -A3 | grep -c open_shroot) ; [[ $count -gt 0 ]] && { echo "$(date):
> > > > > reproduced bug" ; break ; } ; echo "$(date): stayin' alive" ;
> > > > > sleep 2 ; done
> > > > >
> > > > >
> > > > >
> > > > > On the samba server:  restart smb.service (loop it in case it
> > > > > requires more than one restart):
> > > > >
> > > > >     # while true ; do echo "$(date): restarting" ; systemctl
> > > > > restart smb.service ; sleep 5 ; done | tee
> > > > > /var/tmp/smb_restart_log.out
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > [  430.454897] list_del corruption. prev->next should be
> > > > > ffff98d3a8f316c0, but was 2e885cb266355469 [  430.464668]
> > > > > ------------[ cut here ]------------ [  430.466569] kernel BUG
> > > > > at lib/list_debug.c:51!
> > > > > [  430.468476] invalid opcode: 0000 [#1] SMP PTI [  430.470286]
> > > > > CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted
> > > > > 5.4.0-rc3+ #19 [  430.473472] Hardware name: Red Hat KVM, BIOS
> > > > > 0.5.1 01/01/2011 [  430.475872] RIP:
> > > > > 0010:__list_del_entry_valid.cold+0x31/0x55
> > > > > [  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78
> > > > > 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f
> > > > > 15 8e e8 32
> > > > > a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3
> > > > > c5 ff 0f 0b [  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS:
> > > > > 00010246 [  430.487665] RAX: 0000000000000054 RBX:
> > > > > ffff98d3aabb8800 RCX:
> > > > > 0000000000000000
> > > > > [  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI:
> > > > > ffff98d3b7a17908
> > > > > [  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09:
> > > > > 0000000000000285
> > > > > [  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12:
> > > > > ffff98d3aabb89c0
> > > > > [  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15:
> > > > > ffff98d3b24c4480
> > > > > [  430.501981] FS:  0000000000000000(0000)
> > > > > GS:ffff98d3b7a00000(0000)
> > > > > knlGS:0000000000000000
> > > > > [  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4:
> > > > > 00000000000406f0
> > > > > [  430.510426] Call Trace:
> > > > > [  430.511500]  cifs_reconnect+0x25e/0x610 [cifs] [  430.513350]
> > > > > cifs_readv_from_socket+0x220/0x250 [cifs] [  430.515464]
> > > > > cifs_read_from_socket+0x4a/0x70 [cifs] [  430.517452]  ?
> > > > > try_to_wake_up+0x212/0x650 [  430.519122]  ?
> > > > > cifs_small_buf_get+0x16/0x30 [cifs] [  430.521086]  ?
> > > > > allocate_buffers+0x66/0x120 [cifs] [  430.523019]
> > > > > cifs_demultiplex_thread+0xdc/0xc30 [cifs] [  430.525116]
> > > > > kthread+0xfb/0x130 [  430.526421]  ?
> > > > > cifs_handle_standard+0x190/0x190 [cifs] [  430.528514]  ?
> > > > > kthread_park+0x90/0x90 [  430.530019]  ret_from_fork+0x35/0x40 [
> > > > > 430.531487] Modules linked in: cifs libdes libarc4 ip6t_rpfilter
> > > > > ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat
> > > > > ip6table_mangle ip6table_raw ip6table_security iptable_nat
> > > > > nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack
> > > > > nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter
> > > > > ebtables ip6table_filter ip6_tables crct10dif_pclmul
> > > > > crc32_pclmul joydev virtio_balloon ghash_clmulni_intel i2c_piix4
> > > > > nfsd nfs_acl lockd auth_rpcgss grace sunrpc xfs libcrc32c
> > > > > virtio_net net_failover crc32c_intel virtio_console serio_raw
> > > > > virtio_blk ata_generic failover pata_acpi qemu_fw_cfg [
> > > > > 430.552782] ---[ end trace c91d4468f8689482 ]--- [  430.554948]
> > > > > RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
> > > > > [  430.557251] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78
> > > > > 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f
> > > > > 15 8e e8 32
> > > > > a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3
> > > > > c5 ff 0f 0b [  430.565019] RSP: 0018:ffffb4db0042fd38 EFLAGS:
> > > > > 00010246 [  430.567181] RAX: 0000000000000054 RBX:
> > > > > ffff98d3aabb8800 RCX:
> > > > > 0000000000000000
> > > > > [  430.570073] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI:
> > > > > ffff98d3b7a17908
> > > > > [  430.572955] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09:
> > > > > 0000000000000285
> > > > > [  430.575854] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12:
> > > > > ffff98d3aabb89c0
> > > > > [  430.578745] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15:
> > > > > ffff98d3b24c4480
> > > > > [  430.581624] FS:  0000000000000000(0000)
> > > > > GS:ffff98d3b7a00000(0000)
> > > > > knlGS:0000000000000000
> > > > > [  430.584881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  430.587230] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4:
> > > > > 00000000000406f0
> > > > >
> > > > >
> > > > > crash> dis -lr cifs_reconnect+0x25e | tail --lines=20
> > > > > 0xffffffffc062dc26 <cifs_reconnect+0x226>:      movb
> > > > > $0x0,0xbb36b(%rip)        # 0xffffffffc06e8f98 <GlobalMid_Lock>
> > > > > /mnt/build/kernel/fs/cifs/connect.c: 572
> > > > > 0xffffffffc062dc2d <cifs_reconnect+0x22d>:      mov    %r12,%rdi
> > > > > 0xffffffffc062dc30 <cifs_reconnect+0x230>:      callq
> > > > > 0xffffffff8d9d5a20 <mutex_unlock>
> > > > > /mnt/build/kernel/fs/cifs/connect.c: 574
> > > > > 0xffffffffc062dc35 <cifs_reconnect+0x235>:      testb
> > > > > $0x1,0xbb300(%rip)        # 0xffffffffc06e8f3c <cifsFYI>
> > > > > 0xffffffffc062dc3c <cifs_reconnect+0x23c>:      je
> > > > > 0xffffffffc062dc43 <cifs_reconnect+0x243>
> > > > > /mnt/build/kernel/./arch/x86/include/asm/jump_label.h: 25
> > > > > 0xffffffffc062dc3e <cifs_reconnect+0x23e>:      data32 data32 data32
> > > > > xchg %ax,%ax
> > > > > /mnt/build/kernel/fs/cifs/connect.c: 575
> > > > > 0xffffffffc062dc43 <cifs_reconnect+0x243>:      mov    0x8(%rsp),%rbp
> > > > > 0xffffffffc062dc48 <cifs_reconnect+0x248>:      mov    0x0(%rbp),%r14
> > > > > 0xffffffffc062dc4c <cifs_reconnect+0x24c>:      cmp    %r13,%rbp
> > > > > 0xffffffffc062dc4f <cifs_reconnect+0x24f>:      jne
> > > > > 0xffffffffc062dc56 <cifs_reconnect+0x256>
> > > > > 0xffffffffc062dc51 <cifs_reconnect+0x251>:      jmp
> > > > > 0xffffffffc062dc90 <cifs_reconnect+0x290>
> > > > > 0xffffffffc062dc53 <cifs_reconnect+0x253>:      mov    %rax,%r14
> > > > > /mnt/build/kernel/./include/linux/list.h: 190
> > > > > 0xffffffffc062dc56 <cifs_reconnect+0x256>:      mov    %rbp,%rdi
> > > > > 0xffffffffc062dc59 <cifs_reconnect+0x259>:      callq
> > > > > 0xffffffff8d4e6b00 <__list_del_entry_valid>
> > > > > 0xffffffffc062dc5e <cifs_reconnect+0x25e>:      test   %al,%al
> > > > >
> > > > >
> > > > > fs/cifs/connect.c
> > > > > 566         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > > 567         if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
> > > > > 568             mid_entry->mid_state = MID_RETRY_NEEDED;
> > > > > 569         list_move(&mid_entry->qhead, &retry_list);
> > > > > 570     }
> > > > > 571     spin_unlock(&GlobalMid_Lock);
> > > > > 572     mutex_unlock(&server->srv_mutex);
> > > > > 573
> > > > > 574     cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> > > > > 575-->    list_for_each_safe(tmp, tmp2, &retry_list) {
> > > > > 576         mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> > > > > 577         list_del_init(&mid_entry->qhead);
> > > > > 578         mid_entry->callback(mid_entry);
> > > > > 579     }
> > > > > 580
> > > > > 581     if (cifs_rdma_enabled(server)) {
> > > > > 582         mutex_lock(&server->srv_mutex);
> > > > > 583         smbd_destroy(server);
> > > > > 584         mutex_unlock(&server->srv_mutex);
> > > > >
> > > >
> >
> >
> >
> > --
> > Dave Wysochanski
> > Principal Software Maintenance Engineer
> > T: 919-754-4024
>
>
>
> --
> Dave Wysochanski
> Principal Software Maintenance Engineer
> T: 919-754-4024