Re: LVM on top of RBD apparent pagecache corruption with snapshots


 



On Thu, Jul 26, 2018 at 9:21 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Jul 26, 2018 at 1:55 AM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Wed, Jul 25, 2018 at 7:07 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> > On Wed, Jul 25, 2018 at 6:07 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> >> On Wed, Jul 25, 2018 at 5:51 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>> >>>
>> >>>
>> >>> On Wed, Jul 25, 2018 at 5:41 PM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>
>> >>> wrote:
>> >>>>
>> >>>> I am not sure this is related to RBD, but in case it is, this would be
>> >>>> an important bug to fix.
>> >>>>
>> >>>> Running LVM on top of RBD, XFS filesystem on top of that, consumed in RHEL
>> >>>> 7.4.
>> >>>>
>> >>>> When running a large read operation and doing LVM snapshots during
>> >>>> that operation, the block being read winds up all zeroes in pagecache.
>> >>>>
>> >>>> Dropping the caches syncs up the block with what's on "disk" and
>> >>>> everything is fine.
>> >>>>
>> >>>> Working on steps to reproduce simply - ceph is Luminous 12.2.7, RHEL
>> >>>> client is Jewel 10.2.10-17.el7cp
>> >>>
>> >>>
>> >>> Is this krbd or QEMU+librbd? If the former, what kernel version are you
>> >>> running?
>> >>
>> >> It's krbd on RHEL.
>> >>
>> >> RHEL kernel:
>> >>
>> >> Linux dmg-cbcache01 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51
>> >> EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > Not sure if this is exactly replicating the issue, but I was able to
>> > do this on two different systems:
>> >
>> > RHEL 7.4 kernel as above.
>> >
>> > Create an LVM PV on a mapped kRBD device
>> >
>> > example: pvcreate /dev/rbd/spin1/lvm1
>> >
>> > Create a VG and LV, make an XFS FS
>> >
>> > vgcreate datavg /dev/rbd/spin1/lvm1
>> > lvcreate -n data1 -L 5G datavg
>> > mkfs.xfs /dev/datavg/data1
>> > <mount on something>
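[Inline note: consolidated, the setup from image to mounted filesystem
is roughly the following; the rbd create/map step, the 10G image size
and the /mnt/data1 mount point are illustrative, only the pool, image,
VG and LV names match the example above:]

rbd create spin1/lvm1 --size 10G   # illustrative size
rbd map spin1/lvm1                 # udev provides /dev/rbd/spin1/lvm1
pvcreate /dev/rbd/spin1/lvm1
vgcreate datavg /dev/rbd/spin1/lvm1
lvcreate -n data1 -L 5G datavg
mkfs.xfs /dev/datavg/data1
mkdir -p /mnt/data1
mount /dev/datavg/data1 /mnt/data1
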
>> >
>> > Get some large file and copy it to some other file, same storage or
>> > different.  All is well.
>> >
>> > Now snapshot the LV
>> >
>> > lvcreate -l8%ORIGIN -s -n snap_data1 /dev/datavg/data1 --addtag backup
>> >
>> > Now try to copy that file again.  I get:
>> >
>> > NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:3470]
>> >
>> > And in dmesg (this is on Proxmox but I did the same on ESXi)
>> >
>> > [1397609.308673] sched: RT throttling activated
>> > [1397658.759259] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
>> > [kworker/0:1:2648]
>> > [1397658.759354] Modules linked in: dm_snapshot dm_bufio rbd libceph
>> > rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
>> > sunrpc ppdev joydev pcspkr sg parport_pc virtio_balloon parport shpchp
>> > i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod crc_t10dif
>> > crct10dif_generic cdrom crct10dif_common ata_generic pata_acpi
>> > virtio_scsi virtio_console virtio_net bochs_drm drm_kms_helper
>> > syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata
>> > serio_raw virtio_pci i2c_core virtio_ring virtio floppy dm_mirror
>> > dm_region_hash dm_log dm_mod
>> > [1397658.759400] CPU: 0 PID: 2648 Comm: kworker/0:1 Kdump: loaded Not
>> > tainted 3.10.0-862.el7.x86_64 #1
>> > [1397658.759402] Hardware name: QEMU Standard PC (i440FX + PIIX,
>> > 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org
>> > 04/01/2014
>> > [1397658.759415] Workqueue: kcopyd do_work [dm_mod]
>> > [1397658.759418] task: ffff932df65d3f40 ti: ffff932fb138c000 task.ti:
>> > ffff932fb138c000
>> > [1397658.759420] RIP: 0010:[<ffffffffc08535a0>]  [<ffffffffc08535a0>]
>> > copy_callback+0x50/0x130 [dm_snapshot]
>> > [1397658.759426] RSP: 0018:ffff932fb138fd08  EFLAGS: 00000283
>> > [1397658.759428] RAX: 000000000003e5e8 RBX: ffffebecc4943ec0 RCX:
>> > ffff932ff4704068
>> > [1397658.759430] RDX: ffff932dc8050d00 RSI: ffff932fd6a0f9b8 RDI:
>> > 0000000000000000
>> > [1397658.759431] RBP: ffff932fb138fd28 R08: ffff932dc7d2c0b0 R09:
>> > ffff932dc8050d20
>> > [1397658.759433] R10: 00000000c7d2b301 R11: ffffebecc01f4a00 R12:
>> > 0000000000000000
>> > [1397658.759435] R13: 0000000180090003 R14: 0000000000000000 R15:
>> > 00000000ffffff80
>> > [1397658.759438] FS:  0000000000000000(0000) GS:ffff932fffc00000(0000)
>> > knlGS:0000000000000000
>> > [1397658.759440] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> > [1397658.759442] CR2: 00007f17bcd5e860 CR3: 0000000042c0e000 CR4:
>> > 00000000000006f0
>> > [1397658.759447] Call Trace:
>> > [1397658.759452]  [<ffffffffc0853550>] ? origin_resume+0x70/0x70 [dm_snapshot]
>> > [1397658.759459]  [<ffffffffc0340eab>] run_complete_job+0x6b/0xc0 [dm_mod]
>> > [1397658.759466]  [<ffffffffc0340f60>] process_jobs+0x60/0x100 [dm_mod]
>> > [1397658.759471]  [<ffffffffc0340e40>] ? kcopyd_put_pages+0x50/0x50 [dm_mod]
>> > [1397658.759477]  [<ffffffffc0341042>] do_work+0x42/0x90 [dm_mod]
>> > [1397658.759483]  [<ffffffffa08b2dff>] process_one_work+0x17f/0x440
>> > [1397658.759485]  [<ffffffffa08b3bcc>] worker_thread+0x22c/0x3c0
>> > [1397658.759489]  [<ffffffffa08b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
>> > [1397658.759494]  [<ffffffffa08bae31>] kthread+0xd1/0xe0
>> > [1397658.759497]  [<ffffffffa08bad60>] ? insert_kthread_work+0x40/0x40
>> > [1397658.759503]  [<ffffffffa0f1f637>] ret_from_fork_nospec_begin+0x21/0x21
>> > [1397658.759506]  [<ffffffffa08bad60>] ? insert_kthread_work+0x40/0x40
>> >
>> >
>>
>>
>> Tried the same on an Ubuntu 4.14.39 kernel - no issues.
>
> What about the original "zeroes in the page cache" issue?  Do you
> see it with 4.14.39?  Trying to establish whether the soft lockup is
> related.
>
> Thanks,
>
>                 Ilya

The zeroes in the pagecache result from a fairly complex Cache database
workload, which is hard for me to replicate on my own (although the
database people hit the corruption every time I run LVM snapshots).  So
I wanted to run something simpler, like a large file copy.  The db
workload does not result in lockups, but that may be because it does
not retrieve as many records.

The issue definitely shows up on reads.
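
Along those lines, the simpler check I have in mind is basically a loop
that re-reads a large file while snapshots are taken and compares the
buffered checksum against a direct read; something like this sketch
(paths, sizes and snapshot names are illustrative, only the VG/LV names
match the setup above):

# reference checksum, bypassing the page cache
REF=$(dd if=/mnt/data1/bigfile bs=4M iflag=direct status=none | md5sum | cut -d' ' -f1)
for i in $(seq 1 10); do
    lvcreate -l8%ORIGIN -s -n snap_data1_$i /dev/datavg/data1
    CUR=$(md5sum /mnt/data1/bigfile | cut -d' ' -f1)   # buffered read
    [ "$CUR" = "$REF" ] || echo "pagecache mismatch after snapshot $i"
    lvremove -f /dev/datavg/snap_data1_$i
done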

--
Alex Gorbachev
Storcium
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


