Custom driver FS brokenness at 4GB?

Greetings. I have an odd issue and need some ideas of where to go next -- I'm out of hair to rip out.

I'm writing a custom block device driver for some custom RAID hardware (>32TB) that uses scatter-gather DMA. The device has no partitions, and I service all BIO requests through make_request() to simplify debugging. The driver works to the point where dd against the block device seems fine (I'm setting iflag=direct / oflag=direct to ensure it's really writing to the disk). I also have the blk_queue limits set so that each BIO carries only a single 4k I/O (again, to simplify debugging for now), and a mutex wraps the entire make_request call so that only a single request is serviced at a time. So this should be as "simple" as I can make the environment to debug this problem.
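A dd pass mostly shows that transfers complete, not that each block landed at the offset intended. One way to check placement is a self-identifying pattern: stamp every 4k block with its own block number, write it, and verify it on read-back. Below is a minimal user-space sketch of just the stamp/verify step. The names stamp_block, block_matches and STAMP_BLOCK_SIZE are hypothetical and are not part of the driver.

```c
#include <stdint.h>
#include <string.h>

/* Matches the single-4k-per-BIO setup described above. */
#define STAMP_BLOCK_SIZE 4096u

/* Fill a 4k buffer with its own 64-bit block number, repeated end to end. */
void stamp_block(unsigned char *buf, uint64_t block_no)
{
    for (size_t off = 0; off < STAMP_BLOCK_SIZE; off += sizeof(block_no))
        memcpy(buf + off, &block_no, sizeof(block_no));
}

/* Return 1 if every 8-byte word in the buffer equals block_no, else 0. */
int block_matches(const unsigned char *buf, uint64_t block_no)
{
    for (size_t off = 0; off < STAMP_BLOCK_SIZE; off += sizeof(block_no)) {
        uint64_t found;
        memcpy(&found, buf + off, sizeof(found));
        if (found != block_no)
            return 0;
    }
    return 1;
}
```

If a block read back from offset N * 4096 carries a stamp other than N, the data was placed (or fetched) at the wrong device address, which a plain dd copy would not reveal.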

Once the driver is loaded, when I try to create a file system (ext4, but the same thing happens with xfs), some corruption seems to occur, but only when I set the capacity of the block device to 4GB or more. For instance, when I set the size to 4G, I can run mkfs.ext4, but after 2 or 3 mount/umount cycles the FS refuses to mount anymore and the kernel log complains that the journal is missing. This was discovered running this loop...

#!/bin/sh
COUNT=4032

while [ 1 ] ; do

figlet ${COUNT}

( umount /mnt ; rmmod smc ) || true
modprobe smc capacity_in_mb=${COUNT} debug=1
mkfs.ext4 -m 0 /dev/smcd

mount /dev/smcd /mnt
cp count_512m.dat /mnt/test
umount /mnt
mount /dev/smcd /mnt
umount /mnt
mount /dev/smcd /mnt
cmp count_512m.dat /mnt/test
umount /mnt
mount /dev/smcd /mnt # ***
sync
umount /mnt
mount /dev/smcd /mnt
sleep 1
umount /mnt

COUNT=$(( COUNT + 64 ))
sleep 1

done

Sometimes I'll get in the kernel log:
May 27 09:39:01 febtober kernel: [64547.304695] EXT4-fs (smcd): ext4_check_descriptors: Checksum for group 0 failed (7009!=0)
May 27 09:39:01 febtober kernel: [64547.305744] EXT4-fs (smcd): group descriptors corrupted!

Others I'll get:
May 27 09:46:49 ryftone-smcdrv kernel: [65014.342850] EXT4-fs (smcd): no journal found


I've seen this loop fail as early as COUNT=4096, but as late as COUNT=4220; removing the sync changes the behavior.
When it fails, it usually does so on the 3rd mount (***).
FYI, I effectively call set_capacity( disk, capacity_in_mb * 2048 ), since 2048 kernel sectors * 512 bytes = 1M.
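As a sanity check on that arithmetic, here is a minimal user-space sketch (not the driver code; the helper names are hypothetical) comparing the MB-to-sector conversion done in 64-bit and in unsigned 32-bit arithmetic.

```c
#include <stdint.h>

/* 512-byte kernel sectors per MiB: 1048576 / 512 = 2048. */
#define SECTORS_PER_MB 2048u

/* Widen to 64 bits before multiplying, so the product cannot wrap. */
uint64_t capacity_sectors_64(uint32_t capacity_in_mb)
{
    return (uint64_t)capacity_in_mb * SECTORS_PER_MB;
}

/* Same multiply carried out in unsigned 32-bit arithmetic (wraps mod 2^32). */
uint32_t capacity_sectors_32(uint32_t capacity_in_mb)
{
    return capacity_in_mb * SECTORS_PER_MB;
}
```

At capacity_in_mb=4096 both versions give 8388608 sectors, so this multiply by itself does not wrap at 4G. The 32-bit version only wraps at 2TB and above (for 32TB, i.e. 33554432 MB, it yields 0), which matters for the full-size device but not for the failures seen here.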

Another example: if I set the capacity of the disk to 16G, I can run mkfs.ext4, but the first mount fails and I see:
May 27 09:07:27 febtober kernel: [62653.269387] EXT4-fs (smcd): ext4_check_descriptors: Block bitmap for group 0 not in group (block 4294967295)!

But, again, if I set the capacity below 4G, everything seems fine. I can currently dd read and write across that 4G boundary without issue -- it's ONLY the filesystem accesses. My gut is screaming there's a 32/64-bit overflow condition somewhere, but for the life of me I can't find it. Is there something I need to set to tell the block layer I have a 64-bit addressable device? And set_capacity always takes the number of LINUX KERNEL (512-byte) sectors, not units of whatever I set with blk_queue_logical_block_size / blk_queue_physical_block_size, correct?
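To illustrate the kind of overflow I suspect, here is a minimal user-space sketch (hypothetical helper names, not taken from the driver) of turning a 512-byte kernel sector number into a byte offset, once with a 64-bit result and once truncated to 32 bits.

```c
#include <stdint.h>

/* Kernel sectors are 512 bytes, so sector -> byte offset is a shift by 9. */
#define KERNEL_SECTOR_SHIFT 9

/* Correct: the byte offset is kept in 64 bits. */
uint64_t byte_offset_64(uint64_t sector)
{
    return sector << KERNEL_SECTOR_SHIFT;
}

/* Buggy variant: the byte offset is squeezed into 32 bits and wraps at 4G. */
uint32_t byte_offset_32(uint64_t sector)
{
    return (uint32_t)(sector << KERNEL_SECTOR_SHIFT);
}
```

Sector 8388608 is the first sector at the 4G mark. The 64-bit version gives 4294967296 for it, while the truncated version gives 0, so an access just past 4G would alias onto the start of the device, where the superblock and group descriptors live. Whether anything like this actually happens in my driver is exactly what I have not been able to confirm.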

I'm currently on 3.16.0 (Ubuntu 14.04.2 LTS) if it matters.

Any help/pointers would be greatly appreciated.

--Rob Harris

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



