Re: LVM on top of RBD apparent pagecache corruption with snapshots

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 26, 2018 at 9:49 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Jul 26, 2018 at 1:07 AM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Wed, Jul 25, 2018 at 6:07 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> > On Wed, Jul 25, 2018 at 5:51 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>> >>
>> >>
>> >> On Wed, Jul 25, 2018 at 5:41 PM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> I am not sure this related to RBD, but in case it is, this would be an
>> >>> important bug to fix.
>> >>>
>> >>> Running LVM on top of RBD, XFS filesystem on top of that, consumed in RHEL
>> >>> 7.4.
>> >>>
>> >>> When running a large read operation and doing LVM snapshots during
>> >>> that operation, the block being read winds up all zeroes in pagecache.
>> >>>
>> >>> Dropping the caches syncs up the block with what's on "disk" and
>> >>> everything is fine.
>> >>>
>> >>> Working on steps to reproduce simply - ceph is Luminous 12.2.7, RHEL
>> >>> client is Jewel 10.2.10-17.el7cp
>> >>
>> >>
>> >> Is this krbd or QEMU+librbd? If the former, what kernel version are you
>> >> running?
>> >
>> > It's krbd on RHEL.
>> >
>> > RHEL kernel:
>> >
>> > Linux dmg-cbcache01 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51
>> > EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Not sure if this is exactly replicating the issue, but I was able to
>> do this on two different systems:
>>
>> RHEL 7.4 kernel as above.
>
> 3.10.0-862.el7 is a RHEL 7.5 kernel.
> I think you have a 7.4 system with a 7.5 kernel.
>
>>
>> Create a PVM PV on a mapped kRBD device
>>
>> example: pvcreate /dev/rbd/spin1/lvm1
>>
>> Create a VG and LV, make an XFS FS
>>
>> vgcreate datavg /dev/rbd/spin1/lvm1
>> lvcreate -n data1 -L 5G datavg
>> mkfs.xfs /dev/datavg/data1
>> <mount on something>
>>
>> Get some large file and copy it to some other file, same storage or
>> different.  All is well.
>
> Same storage as in the same XFS filesystem?  Is copying it from
> /mnt/foo to /mnt/bar enough (assuming /dev/datavg/data1 is on /mnt)?
>
>>
>> Now snapshot the LV
>>
>> lvcreate -l8%ORIGIN -s -n snap_data1 /dev/datavg/data1 --addtag backup
>
> 8% of 5G is 400M.  How large is the file?
>
>>
>> Now try to copy that file again.  I get:
>>
>> NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:3470]
>>
>> And in dmesg (this is on Proxmox but I did the same on ESXi)
>>
>> [1397609.308673] sched: RT throttling activated
>> [1397658.759259] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
>> [kworker/0:1:2648]
>> [1397658.759354] Modules linked in: dm_snapshot dm_bufio rbd libceph
>> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
>> sunrpc ppdev joydev pcspkr sg parport_pc virtio_balloon parport shpchp
>> i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod crc_t10dif
>> crct10dif_generic cdrom crct10dif_common ata_generic pata_acpi
>> virtio_scsi virtio_console virtio_net bochs_drm drm_kms_helper
>> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata
>> serio_raw virtio_pci i2c_core virtio_ring virtio floppy dm_mirror
>> dm_region_hash dm_log dm_mod
>> [1397658.759400] CPU: 0 PID: 2648 Comm: kworker/0:1 Kdump: loaded Not
>> tainted 3.10.0-862.el7.x86_64 #1
>> [1397658.759402] Hardware name: QEMU Standard PC (i440FX + PIIX,
>> 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org
>> 04/01/2014
>> [1397658.759415] Workqueue: kcopyd do_work [dm_mod]
>> [1397658.759418] task: ffff932df65d3f40 ti: ffff932fb138c000 task.ti:
>> ffff932fb138c000
>> [1397658.759420] RIP: 0010:[<ffffffffc08535a0>]  [<ffffffffc08535a0>]
>> copy_callback+0x50/0x130 [dm_snapshot]
>> [1397658.759426] RSP: 0018:ffff932fb138fd08  EFLAGS: 00000283
>> [1397658.759428] RAX: 000000000003e5e8 RBX: ffffebecc4943ec0 RCX:
>> ffff932ff4704068
>> [1397658.759430] RDX: ffff932dc8050d00 RSI: ffff932fd6a0f9b8 RDI:
>> 0000000000000000
>> [1397658.759431] RBP: ffff932fb138fd28 R08: ffff932dc7d2c0b0 R09:
>> ffff932dc8050d20
>> [1397658.759433] R10: 00000000c7d2b301 R11: ffffebecc01f4a00 R12:
>> 0000000000000000
>> [1397658.759435] R13: 0000000180090003 R14: 0000000000000000 R15:
>> 00000000ffffff80
>> [1397658.759438] FS:  0000000000000000(0000) GS:ffff932fffc00000(0000)
>> knlGS:0000000000000000
>> [1397658.759440] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [1397658.759442] CR2: 00007f17bcd5e860 CR3: 0000000042c0e000 CR4:
>> 00000000000006f0
>> [1397658.759447] Call Trace:
>> [1397658.759452]  [<ffffffffc0853550>] ? origin_resume+0x70/0x70 [dm_snapshot]
>> [1397658.759459]  [<ffffffffc0340eab>] run_complete_job+0x6b/0xc0 [dm_mod]
>> [1397658.759466]  [<ffffffffc0340f60>] process_jobs+0x60/0x100 [dm_mod]
>> [1397658.759471]  [<ffffffffc0340e40>] ? kcopyd_put_pages+0x50/0x50 [dm_mod]
>> [1397658.759477]  [<ffffffffc0341042>] do_work+0x42/0x90 [dm_mod]
>> [1397658.759483]  [<ffffffffa08b2dff>] process_one_work+0x17f/0x440
>> [1397658.759485]  [<ffffffffa08b3bcc>] worker_thread+0x22c/0x3c0
>> [1397658.759489]  [<ffffffffa08b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
>> [1397658.759494]  [<ffffffffa08bae31>] kthread+0xd1/0xe0
>> [1397658.759497]  [<ffffffffa08bad60>] ? insert_kthread_work+0x40/0x40
>> [1397658.759503]  [<ffffffffa0f1f637>] ret_from_fork_nospec_begin+0x21/0x21
>> [1397658.759506]  [<ffffffffa08bad60>] ? insert_kthread_work+0x40/0x40
>
> Did you experiment with the snapshot chunk size (lvcreate --chunksize)?
> I wonder if the default shapshot chunk size is the same on RHEL 7.4 and
> Ubuntu you tried 4.14.39 on.
>
> Thanks,

I also asked the database people what type of IO is being done when
they got the zeroes in page cache vs. good data on "disk:, here is the
response:

Normal user process will open a database file using
O_RDONLY, and will read data from it with pread in 8192 byte sized requests

Thank you,
Alex
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux