On Fri, Jul 27, 2018 at 9:33 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Jul 26, 2018 at 5:15 PM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Thu, Jul 26, 2018 at 9:49 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> > On Thu, Jul 26, 2018 at 1:07 AM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> >>
>> >> On Wed, Jul 25, 2018 at 6:07 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> >> > On Wed, Jul 25, 2018 at 5:51 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>> >> >>
>> >> >> On Wed, Jul 25, 2018 at 5:41 PM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> >> >>>
>> >> >>> I am not sure this is related to RBD, but in case it is, this would be an
>> >> >>> important bug to fix.
>> >> >>>
>> >> >>> Running LVM on top of RBD, with an XFS filesystem on top of that, consumed
>> >> >>> in RHEL 7.4.
>> >> >>>
>> >> >>> When running a large read operation and doing LVM snapshots during
>> >> >>> that operation, the block being read winds up as all zeroes in the pagecache.
>> >> >>>
>> >> >>> Dropping the caches syncs up the block with what's on "disk" and
>> >> >>> everything is fine.
>> >> >>>
>> >> >>> Working on simple steps to reproduce - Ceph is Luminous 12.2.7, the RHEL
>> >> >>> client is Jewel 10.2.10-17.el7cp.
>> >> >>
>> >> >> Is this krbd or QEMU+librbd? If the former, what kernel version are you
>> >> >> running?
>> >> >
>> >> > It's krbd on RHEL.
>> >> >
>> >> > RHEL kernel:
>> >> >
>> >> > Linux dmg-cbcache01 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51
>> >> > EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
>> >>
>> >> Not sure if this is exactly replicating the issue, but I was able to
>> >> do this on two different systems:
>> >>
>> >> RHEL 7.4 kernel as above.
>> >
>> > 3.10.0-862.el7 is a RHEL 7.5 kernel.
>> > I think you have a 7.4 system with a 7.5 kernel.
>>
>> I was wrong, it's 7.5:
>>
>> Red Hat Enterprise Linux Server release 7.5 (Maipo)
>>
>> >> Create an LVM PV on a mapped krbd device
>> >>
>> >> example: pvcreate /dev/rbd/spin1/lvm1
>> >>
>> >> Create a VG and LV, make an XFS FS
>> >>
>> >> vgcreate datavg /dev/rbd/spin1/lvm1
>> >> lvcreate -n data1 -L 5G datavg
>> >> mkfs.xfs /dev/datavg/data1
>> >> <mount on something>
>> >>
>> >> Get some large file and copy it to some other file, on the same storage or
>> >> different storage. All is well.
>> >
>> > Same storage as in the same XFS filesystem? Is copying it from
>> > /mnt/foo to /mnt/bar enough (assuming /dev/datavg/data1 is on /mnt)?
>>
>> I tried both same and different. Yes, a copy on the same fs appears to
>> be enough to trigger this.
>>
>> >> Now snapshot the LV
>> >>
>> >> lvcreate -l8%ORIGIN -s -n snap_data1 /dev/datavg/data1 --addtag backup
>> >
>> > 8% of 5G is 400M. How large is the file?
>>
>> This is just an example. I tried it with a 94 GB filesystem and a 5 GB
>> file, and also with a 700 GB VG, a 70 GB filesystem and a 10 GB file.
>>
>> >> Now try to copy that file again. I get:
>> >>
>> >> NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:3470]
>> >>
>> >> And in dmesg (this is on Proxmox, but I did the same on ESXi):
>> >>
>> >> [1397609.308673] sched: RT throttling activated
>> >> [1397658.759259] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
>> >> [kworker/0:1:2648]
>> >> [1397658.759354] Modules linked in: dm_snapshot dm_bufio rbd libceph
>> >> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
>> >> sunrpc ppdev joydev pcspkr sg parport_pc virtio_balloon parport shpchp
>> >> i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod crc_t10dif
>> >> crct10dif_generic cdrom crct10dif_common ata_generic pata_acpi
>> >> virtio_scsi virtio_console virtio_net bochs_drm drm_kms_helper
>> >> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata
>> >> serio_raw virtio_pci i2c_core virtio_ring virtio floppy dm_mirror
>> >> dm_region_hash dm_log dm_mod
>> >> [1397658.759400] CPU: 0 PID: 2648 Comm: kworker/0:1 Kdump: loaded Not
>> >> tainted 3.10.0-862.el7.x86_64 #1
>> >> [1397658.759402] Hardware name: QEMU Standard PC (i440FX + PIIX,
>> >> 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org
>> >> 04/01/2014
>> >> [1397658.759415] Workqueue: kcopyd do_work [dm_mod]
>> >> [1397658.759418] task: ffff932df65d3f40 ti: ffff932fb138c000 task.ti: ffff932fb138c000
>> >> [1397658.759420] RIP: 0010:[<ffffffffc08535a0>] [<ffffffffc08535a0>]
>> >> copy_callback+0x50/0x130 [dm_snapshot]
>> >> [1397658.759426] RSP: 0018:ffff932fb138fd08 EFLAGS: 00000283
>> >> [1397658.759428] RAX: 000000000003e5e8 RBX: ffffebecc4943ec0 RCX: ffff932ff4704068
>> >> [1397658.759430] RDX: ffff932dc8050d00 RSI: ffff932fd6a0f9b8 RDI: 0000000000000000
>> >> [1397658.759431] RBP: ffff932fb138fd28 R08: ffff932dc7d2c0b0 R09: ffff932dc8050d20
>> >> [1397658.759433] R10: 00000000c7d2b301 R11: ffffebecc01f4a00 R12: 0000000000000000
>> >> [1397658.759435] R13: 0000000180090003 R14: 0000000000000000 R15: 00000000ffffff80
>> >> [1397658.759438] FS: 0000000000000000(0000) GS:ffff932fffc00000(0000) knlGS:0000000000000000
>> >> [1397658.759440] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> >> [1397658.759442] CR2: 00007f17bcd5e860 CR3: 0000000042c0e000 CR4: 00000000000006f0
>> >> [1397658.759447] Call Trace:
>> >> [1397658.759452] [<ffffffffc0853550>] ? origin_resume+0x70/0x70 [dm_snapshot]
>> >> [1397658.759459] [<ffffffffc0340eab>] run_complete_job+0x6b/0xc0 [dm_mod]
>> >> [1397658.759466] [<ffffffffc0340f60>] process_jobs+0x60/0x100 [dm_mod]
>> >> [1397658.759471] [<ffffffffc0340e40>] ? kcopyd_put_pages+0x50/0x50 [dm_mod]
>> >> [1397658.759477] [<ffffffffc0341042>] do_work+0x42/0x90 [dm_mod]
>> >> [1397658.759483] [<ffffffffa08b2dff>] process_one_work+0x17f/0x440
>> >> [1397658.759485] [<ffffffffa08b3bcc>] worker_thread+0x22c/0x3c0
>> >> [1397658.759489] [<ffffffffa08b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
>> >> [1397658.759494] [<ffffffffa08bae31>] kthread+0xd1/0xe0
>> >> [1397658.759497] [<ffffffffa08bad60>] ? insert_kthread_work+0x40/0x40
>> >> [1397658.759503] [<ffffffffa0f1f637>] ret_from_fork_nospec_begin+0x21/0x21
>> >> [1397658.759506] [<ffffffffa08bad60>] ? insert_kthread_work+0x40/0x40
>> >
>> > Did you experiment with the snapshot chunk size (lvcreate --chunksize)?
>> > I wonder if the default snapshot chunk size is the same on RHEL 7.4 and
>> > the Ubuntu 4.14.39 you tried.
>>
>> I did not, but this shows 0:
>>
>> DR-[root@dmg-cbcache01 issdr]$ lvs -o+chunksize datavg/lvdb2
>>   LV    VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
>>   lvdb2 datavg -wi-ao---- 70.00g                                                        0
>>   lvdb2 datavg -wi-ao---- 70.00g                                                        0
>>
>> root@lab2-b2:~# lvs -o+chunksize cerh1_cache_clone/data
>>   LV   VG                Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
>>   data cerh1_cache_clone -wi------- 90.00g                                                         0
>
> Chunk size is only relevant for snapshots, cache devices, etc.
> Run that lvs command on a snapshot, like datavg/snap_data1.

OK - here it is on RHEL and Ubuntu; it seems to be the same:

  LV           VG        Attr       LSize   Pool Origin  Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  snap_lvspool journalvg swi-a-s--- 128.00m      lvspool 0.01                                    4.00k

  LV        VG                Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  snap_data cerh1_cache_clone swi---s--- 7.55g      data                                         4.00k

> Thanks,
>
> Ilya
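For anyone else trying to reproduce this, here are the steps from earlier in the thread collected into one script. It is only a sketch: the RBD device path, mount point, file paths, sizes and the example --chunksize value are placeholders taken from or suggested by this discussion, not recommendations.

#!/bin/bash
# Reproduction sketch based on the steps described above.
# Device path, mount point, file locations and sizes are examples;
# adjust them to your environment before running.
set -ex

# LVM PV/VG/LV on a mapped krbd device, XFS on top
pvcreate /dev/rbd/spin1/lvm1
vgcreate datavg /dev/rbd/spin1/lvm1
lvcreate -n data1 -L 5G datavg
mkfs.xfs /dev/datavg/data1
mkdir -p /mnt/data1
mount /dev/datavg/data1 /mnt/data1

# First copy of a large file - this works fine
cp /path/to/largefile /mnt/data1/copy1

# Snapshot the origin LV while it is mounted; an explicit --chunksize
# (e.g. --chunksize 64k) could be added here to test whether the
# snapshot chunk size makes a difference
lvcreate -l8%ORIGIN -s -n snap_data1 /dev/datavg/data1 --addtag backup

# Check the chunk size the snapshot actually got
lvs -o+chunksize datavg/snap_data1

# Second copy - this is where the soft lockups and zeroed pagecache
# blocks were observed
cp /mnt/data1/copy1 /mnt/data1/copy2

# Workaround from the first message: drop the caches to resync the
# pagecache with what is actually on disk
echo 3 > /proc/sys/vm/drop_caches

The two cp invocations only stand in for the "large read operation" described in the first message; presumably any similarly large read around the time of the snapshot would do.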