On Wed, Nov 27, 2013 at 08:39:55PM -0600, Eric Sandeen wrote:
> On 11/26/13, 8:47 PM, Dave Chinner wrote:
> > On Tue, Nov 26, 2013 at 06:31:19PM -0800, Phil White wrote:
> >> Gents:
> >>
> >> I was making an image for a VM using everyone's favorite fs with a line
> >> that looked something like this:
> >> -------------
> >> dd if=/dev/zero of=~/image bs=1024 count=1048576 && ./mkfs/mkfs.xfs && mount -o loop ~/image /mnt/loop
> >> -------------
> >>
> >>
> >> mkfs.xfs gave me this output:
> >> -------------
> >> meta-data=/root/image            isize=256    agcount=4, agsize=65536 blks
> >>          =                       sectsz=512   attr=2, projid32bit=0
> >> data     =                       bsize=4096   blocks=262144, imaxpct=25
> >>          =                       sunit=0      swidth=0 blks
> >> naming   =version 2              bsize=4096   ascii-ci=0
> >> log      =internal log           bsize=4096   blocks=2560, version=2
> >>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> >> realtime =none                   extsz=4096   blocks=0, rtextents=0
> >> existing superblock read failed: Invalid argument
> >> mkfs.xfs: pwrite64 failed: Invalid argument
> >> mkfs.xfs: read failed: Invalid argument
> >> -------------
> > .....
> >>
> >> While it occurred to me that the problem might just be line 806 of some files
> >> in xfsprogs, I threw it under a debugger and took a closer look. The file
> >> descriptor value in xi->dfd pointed at ~/image. errno was set to 22. I
> >> thought that might indicate a problem with lseek(), so I rewrote the pwrite64()
> >> and pread() as lseek()s and read()/write()
> >>
> >> As you may have guessed, this did me no good at all.
> >>
> >> It's trying to read/write 512 bytes at the beginning of the file which seems
> >> reasonably innocuous. I double checked the man page which says that under
> >> 2.6, O_DIRECT writes can be aligned to 512 bytes without a problem.
> >
> > That doesn't mean it is correct, because the man page also says:
> >
> > " In Linux alignment restrictions vary by filesystem and kernel
> >   version and might be absent entirely."
> >
> > So, I bet that your underlying filesystem (i.e.
> > the host filesystem) has a sector size of 4k, and that's why
> > direct IO on 512 byte alignment is failing. In that case, run
> > "mkfs.xfs -s size=4k ..." and mkfs should just work fine...
>
> Sadly, no. Or at least, probably not.
>
> __initbuf
>         memalign(libxfs_device_alignment(), bytes);
>
> where libxfs_device_alignment() does:

Yeah, that's for memory buffer alignment, though, not IO alignment.
It's busted because that should always default to page size, not
sector size. But that's not the problem - for example:

# xfs_info /storage
meta-data=/dev/md0               isize=256    agcount=32, agsize=21503744 blks
         =                       sectsz=4096  attr=2, projid32bit=0
         =                       crc=0
data     =                       bsize=4096   blocks=688119680, imaxpct=5
         =                       sunit=32     swidth=320 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=335995, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

That's a 4k sector filesystem, and:

# dd if=/dev/zero of=/storage/fubar.img bs=1024 count=1048576 && mkfs.xfs -d file,size=1g,name=/storage/fubar.img
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.18106 s, 257 MB/s
meta-data=/storage/fubar.img     isize=256    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=7344, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
#

mkfs works fine on it. As does xfs_repair:

# xfs_repair -f /storage/fubar.img
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

And xfs_db works just fine, too:

$ sudo xfs_db -f /storage/fubar.img
xfs_db> sb 0
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 262144
rblocks = 0
rextents = 0
uuid = 73d16c96-df35-4f1f-b781-34da486f089c
logstart = 131076
rootino = 128
rbmino = 129
....

because it doesn't set the LIBXFS_DIRECT flag on the device
instantiation structures yet and so is using buffered IO.

> IOWs: xfsprogs is a braindead package that doesn't know how to
> properly handle non-512-aligned DIO. ;) </snark>

Yeah, it doesn't know how to handle it but it avoids the problem
completely by using buffered IO instead. It works just fine. ;)

So, let's recreate the problem knowing that:

$ sudo dd if=/dev/zero of=/storage/fubar.img bs=1024 count=1048576 && sudo strace -f -o t.t mkfs.xfs -d size=1g,name=/storage/fubar.img
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.52546 s, 237 MB/s
meta-data=/storage/fubar.img     isize=256    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=7344, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument

So, the direct IO write failed on IO alignment, and it used direct
IO in the first place because I didn't tell mkfs that it was running
on a file. i.e. I forgot the "-d file" option.
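
To make the alignment failure concrete, here is a minimal sketch of the check that the 512 byte IO trips over. It assumes the common O_DIRECT rule that both the file offset and the IO length must be multiples of the sector size of whatever sits underneath; memory buffer alignment is deliberately not modelled. The function name dio_aligned is made up for illustration and is not part of xfsprogs.

```python
def dio_aligned(offset: int, length: int, sector_size: int) -> bool:
    """Return True if an IO of `length` bytes at `offset` is sector aligned.

    Assumption: offset and length must both be multiples of sector_size.
    This is an illustration only, not code taken from xfsprogs.
    """
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    return offset % sector_size == 0 and length % sector_size == 0


# The 512 byte superblock IO at the start of the image:
print(dio_aligned(0, 512, 512))    # what mkfs assumes (sectsz=512)
print(dio_aligned(0, 512, 4096))   # what the 4k sector host fs requires
print(dio_aligned(0, 4096, 4096))  # a full 4k IO would be fine
```

With sectsz=512 in the mkfs output and sectsz=4096 on the host filesystem, the second call is the case that comes back as EINVAL from the kernel.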
$ sudo mkfs.xfs -d size=1g,name=/storage/fubar.img
meta-data=/storage/fubar.img     isize=256    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=7344, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument

Yup, still fails. Let's force it!

$ sudo mkfs.xfs -f -d size=1g,name=/storage/fubar.img
meta-data=/storage/fubar.img     isize=256    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=7344, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
existing superblock read failed: Invalid argument
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument

And there's the identical failure to what was reported. So, user
error - the user is telling mkfs.xfs that it is making a filesystem
on a block device named "/storage/fubar.img".
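
For illustration, the distinction mkfs is not making here can be checked with stat(2). The sketch below is a rough outline of that check, not a patch: classify_target and check_target are hypothetical names, and the is_file_flag parameter stands in for the user having passed a "file" suboption.

```python
import os
import stat
import tempfile


def classify_target(path: str) -> str:
    """Report what kind of object `path` is, based on its stat() mode."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


def check_target(path: str, is_file_flag: bool) -> None:
    """Refuse a regular file unless the caller said it is an image file.

    Hypothetical policy for illustration; not the actual mkfs.xfs logic.
    """
    if classify_target(path) == "regular file" and not is_file_flag:
        raise ValueError(f"{path} is a regular file, but no file option was given")


# Usage: a temporary file stands in for the image file.
with tempfile.NamedTemporaryFile() as img:
    print(classify_target(img.name))
    check_target(img.name, is_file_flag=True)   # accepted
    try:
        check_target(img.name, is_file_flag=False)
    except ValueError as err:
        print("rejected:", err)
```

Failing early like this means the user gets told about the missing option instead of getting an EINVAL from a misaligned direct IO.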
The same thing happens with the normal method of specifying the
block device:

$ sudo mkfs.xfs -f -d size=1g /storage/fubar.img
meta-data=/storage/fubar.img     isize=256    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=7344, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
existing superblock read failed: Invalid argument
mkfs.xfs: pwrite64 failed: Invalid argument
mkfs.xfs: read failed: Invalid argument

But if we remove the image file:

$ sudo mkfs.xfs -f -d size=1g /storage/fubar.img
/storage/fubar.img: No such file or directory
Usage: mkfs.xfs ....

It's pretty clear that we need the "-d file" when the file doesn't
actually exist. IOWs, mkfs does not expect a block device to lie
about its sector sizes, but that's exactly what treating an image
file like a block device leads to.

This isn't the DIO sector size problem you were looking for, Eric ;)

FWIW, an strace shows:

12256 ioctl(3, BLKDISCARD, 0x7fff76f4ea50) = -1 ENOTTY (Inappropriate ioctl for device)

... that we make that same mistake in several places in mkfs.

What mkfs needs to do is reject devices that are files when "-d
file", "-l file" and "-r file" are not specified, and the problem
will go away because it will catch users who forget to tell mkfs
that it is supposed to be operating on an image file...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx