[ Richard, can you please fix your quoting and line wrapping to work
like everyone else's mail clients? ]

On Sun, Sep 03, 2017 at 12:43:57AM +0000, Richard Wareing wrote:
> On 9/2/17, 4:55 AM, "Brian Foster" <bfoster@xxxxxxxxxx> wrote:
> > I am obviously not at all familiar with your storage stack and
> > the requirements of your environment and whatnot. It's
> > certainly possible that there's some technical reason you
> > can't use dm, but I find it very hard to believe that reason
> > is "there might be bugs" if you're instead willing to hack up
> > and deploy a barely tested feature such as XFS RT. Using dm
> > for basic linear mapping (i.e., partitioning) seems pretty
> > much ubiquitous in the Linux world these days.
>
> Bugs aren't the only reason of course, but we've been
> working on this for a number of months, we also have thousands of
> production hours (* >10 FSes per system == >1M hours on the
> real-time code) on this setup, I'm also doing more testing
> with dm-flaky + dm-log w/ xfs-tests along with this. In any
> event, large deviations (or starting over from scratch) on our
> setup isn't something we'd like to do. At this point I
> trust the RT allocator a good amount, and its sheer simplicity is
> something of an asset for us.

I'm just going to address the "rt dev is stable and well tested"
claim here.

I have my doubts you're actually testing what you think you are
testing with xfstests. Just configuring a rtdev doesn't mean
xfstests runs all its tests on the rtdev. All it means is it runs
the very few tests that require a rtdev in addition to all the other
tests it runs against the normal data device.

If you really want to test rtdev functionality, you need to use the
"-d rtinherit" mkfs option to force all file data to be targeted at
the rtdev, not the data dev. And when you do that, the rtdev blows
up in 3 different ways in under 30s, the third being a fatal kernel
OOPS....
i.e.:

Test device setup:

$ mkfs.xfs -f -r rtdev=/dev/ram0 -d rtinherit=1 /dev/pmem0

xfstests config section:

[xfs_rt]
FSTYP=xfs
TEST_DIR=/mnt/test
TEST_DEV=/dev/pmem0
TEST_RTDEV=/dev/ram0
SCRATCH_MNT=/mnt/scratch
SCRATCH_DEV=/dev/pmem1
SCRATCH_RTDEV=/dev/ram1
MKFS_OPTIONS="-d rtinherit=1"

And the result of running:

# ./check -g quick -s xfs_rt
SECTION       -- xfs_rt
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 test4 4.13.0-rc7-dgc
MKFS_OPTIONS  -- -f -d rtinherit=1 /dev/pmem1
MOUNT_OPTIONS -- /dev/pmem1 /mnt/scratch

generic/001 3s ... 3s
generic/002 0s ... 1s
generic/003 10s ... - output mismatch (see /home/dave/src/xfstests-dev/results//xfs_rt/generic/003.out.bad)
    --- tests/generic/003.out 2014-02-24 09:58:09.505184325 +1100
    +++ /home/dave/src/xfstests-dev/results//xfs_rt/generic/003.out.bad 2017-09-04 10:19:07.609694351 +1000
    @@ -1,2 +1,27 @@
     QA output created by 003
    +./tests/generic/003: line 93: echo: write error: No space left on device
    +stat: cannot stat '/mnt/scratch/dir1/file1': Structure needs cleaning
    +ERROR: access time has changed for file1 after remount
    +ERROR: modify time has changed for file1 after remount
    +ERROR: change time has changed for file1 after remount
    +./tests/generic/003: line 120: echo: write error: No space left on device
    ...
    (Run 'diff -u tests/generic/003.out /home/dave/src/xfstests-dev/results//xfs_rt/generic/003.out.bad' to see the entire diff)
_check_xfs_filesystem: filesystem on /dev/pmem1 is inconsistent (r) (see /home/dave/src/xfstests-dev/results//xfs_rt/generic/003.full for details)
_check_dmesg: something found in dmesg (see /home/dave/src/xfstests-dev/results//xfs_rt/generic/003.dmesg)

[352996.421261] run fstests generic/003 at 2017-09-04 10:18:57
[352996.669490] XFS (pmem1): Unmounting Filesystem
[352996.714422] XFS (pmem1): Mounting V5 Filesystem
[352996.718122] XFS (pmem1): Ending clean mount
[352996.745512] XFS (pmem1): Unmounting Filesystem
[352996.780789] XFS (pmem1): Mounting V5 Filesystem
[352996.783980] XFS (pmem1): Ending clean mount
[352998.825234] XFS (pmem1): Unmounting Filesystem
[352998.839376] XFS (pmem1): Mounting V5 Filesystem
[352998.842762] XFS (pmem1): Ending clean mount
[352998.847718] XFS (pmem1): corrupt dinode 100, has realtime flag set.
[352998.848716] ffff88013b348800: 49 4e 81 a4 03 02 00 00 00 00 00 00 00 00 00 00 IN..............
[352998.851393] ffff88013b348810: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
[352998.852738] ffff88013b348820: 59 ac 9b f2 2e cf 4e 87 59 ac 9b f1 2e 91 a7 2b Y.....N.Y......+
[352998.854168] ffff88013b348830: 59 ac 9b f1 2e 91 a7 2b 00 00 00 00 00 00 00 00 Y......+........
[352998.855514] XFS (pmem1): Internal error xfs_iformat(realtime) at line 94 of file fs/xfs/libxfs/xfs_inode_fork.c. Caller xfs_iread+0x1cf/0x230
[352998.857637] CPU: 3 PID: 7470 Comm: stat Tainted: G W 4.13.0-rc7-dgc #45
[352998.858833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[352998.860092] Call Trace:
[352998.860492]  dump_stack+0x63/0x8f
[352998.861052]  xfs_corruption_error+0x87/0x90
[352998.861711]  ? xfs_iread+0x1cf/0x230
[352998.862270]  xfs_iformat_fork+0x390/0x690
[352998.862896]  ? xfs_iread+0x1cf/0x230
[352998.863454]  ? xfs_inode_from_disk+0x35/0x230
[352998.864132]  xfs_iread+0x1cf/0x230
[352998.864672]  xfs_iget+0x518/0xa40
[352998.865221]  xfs_lookup+0xd6/0x100
[352998.865755]  xfs_vn_lookup+0x4c/0x90
[352998.866316]  lookup_slow+0x96/0x150
[352998.866860]  walk_component+0x19a/0x330
[352998.867454]  ? path_init+0x1dc/0x330
[352998.868011]  path_lookupat+0x64/0x1f0
[352998.868581]  filename_lookup+0xa9/0x170
[352998.869192]  ? filemap_map_pages+0x152/0x290
[352998.869853]  user_path_at_empty+0x36/0x40
[352998.870474]  ? user_path_at_empty+0x36/0x40
[352998.871130]  vfs_statx+0x67/0xc0
[352998.871635]  SYSC_newlstat+0x2e/0x50
[352998.872200]  ? trace_do_page_fault+0x41/0x140
[352998.872871]  SyS_newlstat+0xe/0x10
[352998.873423]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[352998.874140] RIP: 0033:0x7f75730690e5
[352998.874699] RSP: 002b:00007ffdcad5e878 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[352998.875856] RAX: ffffffffffffffda RBX: 00007ffdcad5ea68 RCX: 00007f75730690e5
[352998.876975] RDX: 00007ffdcad5e8b0 RSI: 00007ffdcad5e8b0 RDI: 00007ffdcad5fc9a
[352998.878072] RBP: 0000000000000004 R08: 0000000000000100 R09: 0000000000000000
[352998.879154] R10: 00000000000001cb R11: 0000000000000246 R12: 000056423451cc80
[352998.880233] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[352998.881548] XFS (pmem1): Corruption detected. Unmount and run xfs_repair
[352998.882581] XFS (pmem1): xfs_iread: xfs_iformat() returned error -117

The second blowup is:

generic/015 1s ... [failed, exit status 1] - output mismatch (see /home/dave/src/xfstests-dev/results//xfs_rt/generic/015.out.bad)
    --- tests/generic/015.out 2014-01-20 16:57:33.965658221 +1100
    +++ /home/dave/src/xfstests-dev/results//xfs_rt/generic/015.out.bad 2017-09-04 10:19:17.998113907 +1000
    @@ -2,6 +2,5 @@
     fill disk:
       !!! disk full (expected)
     check free space:
    -delete fill:
    -check free space:
    -   !!! free space is in range
    +   *** file created with zero length
    ...
    (Run 'diff -u tests/generic/015.out /home/dave/src/xfstests-dev/results//xfs_rt/generic/015.out.bad' to see the entire diff)
_check_xfs_filesystem: filesystem on /dev/pmem1 is inconsistent (r) (see /home/dave/src/xfstests-dev/results//xfs_rt/generic/015.full for details)

Which may or may not be a xfstests problem, because repair blows up
with:

.....
inode 96 has RT flag set but there is no RT device
inode 99 has RT flag set but there is no RT device
inode 96 has RT flag set but there is no RT device
would fix bad flags.
inode 99 has RT flag set but there is no RT device
would fix bad flags.
found inode 99 claiming to be a real-time file
.....

And the third is:

[353017.737976] run fstests generic/018 at 2017-09-04 10:19:18
[353017.956902] XFS (pmem1): Mounting V5 Filesystem
[353017.960672] XFS (pmem1): Ending clean mount
[353017.982836] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[353017.984077] IP: xfs_find_bdev_for_inode+0x2b/0x30
[353017.984873] PGD 0
[353017.984874] P4D 0
[353017.985788] Oops: 0000 [#1] PREEMPT SMP
[353017.986412] CPU: 9 PID: 15847 Comm: xfs_io Tainted: G W 4.13.0-rc7-dgc #45
[353017.987641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[353017.988932] task: ffff880236955740 task.stack: ffffc90007878000
[353017.989853] RIP: 0010:xfs_find_bdev_for_inode+0x2b/0x30
[353017.990666] RSP: 0018:ffffc9000787bc88 EFLAGS: 00010202
[353017.991466] RAX: 0000000000000000 RBX: ffffc9000787bd70 RCX: 000000000000000c
[353017.992584] RDX: 0000000000000001 RSI: fffffffffffffffe RDI: ffff8808280891e8
[353017.993657] RBP: ffffc9000787bcb0 R08: 0000000000000009 R09: ffff8808280890c8
[353017.994726] R10: 000000000000034e R11: ffff880236955740 R12: ffff880828089080
[353017.995808] R13: ffffc9000787bd08 R14: ffff88080a8de000 R15: ffff88080a8de000
[353017.996905] FS: 00007ff336cb21c0(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[353017.998114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[353017.998984] CR2: 0000000000000008 CR3: 000000022dc09000 CR4: 00000000000406e0
[353018.000049] Call Trace:
[353018.000465]  ? xfs_bmbt_to_iomap+0x78/0xb0
[353018.001097]  xfs_file_iomap_begin+0x265/0x990
[353018.001770]  iomap_apply+0x48/0xe0
[353018.002300]  ? iomap_write_end+0x70/0x70
[353018.002909]  iomap_fiemap+0x9e/0x100
[353018.003471]  ? iomap_write_end+0x70/0x70
[353018.004085]  xfs_vn_fiemap+0x5c/0x80
[353018.004668]  do_vfs_ioctl+0x450/0x5c0
[353018.005233]  SyS_ioctl+0x79/0x90
[353018.005735]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[353018.006440] RIP: 0033:0x7ff336390dc7
[353018.007000] RSP: 002b:00007fff1b806b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[353018.008154] RAX: ffffffffffffffda RBX: 0000000000000063 RCX: 00007ff336390dc7
[353018.009241] RDX: 0000558a334476a0 RSI: 00000000c020660b RDI: 0000000000000003
[353018.010314] RBP: 0000000000002710 R08: 0000000000000003 R09: 000000000000001d
[353018.011396] R10: 000000000000034e R11: 0000000000000246 R12: 0000000000001010
[353018.012479] R13: 00007ff336647b58 R14: 0000558a33447dc0 R15: 00007ff336647b00
[353018.013554] Code: 66 66 66 66 90 f6 47 da 01 55 48 89 e5 48 8b 87 98 fe ff ff 75 0d 48 8b 80 38 02 00 00 5d 48 8b 40 08 c3 48 8b 80 48 02 00 00 5d <48> 8b 40 08 c3 66 66 66 66 90 55 48 89 e5 41 57 41 56 41 55 41
[353018.016404] RIP: xfs_find_bdev_for_inode+0x2b/0x30 RSP: ffffc9000787bc88
[353018.017420] CR2: 0000000000000008
[353018.018024] ---[ end trace af08c2af09ff5975 ]---

A null pointer dereference in generic/018, at which point the system
needs rebooting to recover.

So, yeah, the rtdev is not stable, not robust and not very well
maintained at this point. If you want to focus new development on
the RT device, then the first thing we need is fixes for all its
obvious problems. Get it working reliably upstream first so we have
a good baseline from which we can evaluate enhancements sanely...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx