Hey guys,

In my testing I have encountered a problem while trying to create an XFS filesystem on a 4-disk Intel SSD RAID5 volume. I am currently testing with a Fedora 3.11.4-201 kernel and the latest version of the mdadm binary (3.2.6-21). I realize that the userspace mdadm binary may not factor into the equation and the error may be driven entirely by the md RAID5 kernel module.

The basic kernel error is encountered when creating a 4-drive RAID5 volume and writing an XFS filesystem to it after it has resynced. The quick commands used are as follows:

1. Check to make sure the RAID5 volume is resynced:
   # cat /proc/mdstat
2. Change from runlevel 5 to runlevel 3, eliminating the X Window System and allowing for more explicit kernel error output:
   # init 3
3. Build an XFS filesystem on the synced RAID5 volume:
   # mkfs.xfs /dev/md5

After executing the mkfs.xfs command, standard out will instantly start to spew blk and md RAID5 errors, ending after 2 rounds of errors with a CPU soft lockup message:

Kernel: [ 1016.781678] BUG: soft lockup - CPU#1 stuck for 22s! [md5_raid:463]

The kernel panic error log is posted at: http://pastebin.com/EbGYti24

Detailed Configuration Setup:

This configuration requires 4 SATA2 ports to be present on the same controller.

Test Environment:
Storage Media: Intel SSD 710 Series 200GB x 4 disks
OS: Fedora 19 x86_64

Steps to recreate:

1. HW setup:
   a. Install 2 x 2GB DDR3 UDIMMs in the last 2 slots farthest from the CPU (one memory channel).
   b. Connect all 4 Intel 710 200GB SSDs to SATA2 ports located on the one AHCI SATA controller. I have tried Intel 320 SSDs with the same result.
   c. Insert an 8GB USB key into one of the USB slots and install the Fedora OS on it. Use a standard partitioning scheme and limit swap to 3GB. In my experience the key is assigned to the /dev/sde device.
2. After the OS has been installed on the platform, reboot the system.
3. When the system is back up, log in as root and use fdisk to create a 50GB partition on each disk. Assign every partition created the type "fd", which is the "Linux RAID auto-detect" partition type.
4. Use the mdadm binary to create a RAID5 volume from the 4 50GB Linux RAID auto-detect partitions that were just created, with a command like this:
   # mdadm --create /dev/md5 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
5. Once the RAID5 volume has been created, attempt to make an XFS filesystem on the new volume. This can be accomplished using this command:
   # mkfs.xfs /dev/md5
6. After about 5 seconds the system will start indicating kernel errors, and 5 seconds after that the system will freeze. If one is running this experiment in the init 3 multi-user mode, the system will spew kernel errors until it eventually freezes. The error should look like this:
   Kernel: [ 1016.781678] BUG: soft lockup - CPU#1 stuck for 22s! [md5_raid:463]

Basic experiments executed to narrow the scope of the issue:

# Is the problem persistent on all local media types?
I tried this same setup on 4 enterprise 3G HDDs with no problems or issues. I was able to create a 4-disk RAID5 array and write an XFS filesystem to the RAID array. Then I was able to write, append, and read a file on the XFS filesystem on the RAID array.

# Does it fail if you create a different filesystem type (ext3)?
Yes, I have tried this and the failure still occurs.

# Does it fail if you create an XFS filesystem on a non-RAID AVN SATA disk, i.e. XFS on a single disk?
No failure; a rough sketch of that test is below.
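For reference, the single-disk sanity test went roughly like this (the /dev/sdf device name is hypothetical; substitute whatever the spare non-RAID disk enumerates as on your system):

# fdisk /dev/sdf          (create a single partition spanning the disk)
# mkfs.xfs /dev/sdf1
# mount /dev/sdf1 /mnt
# echo "hello" > /mnt/testfile
# echo "world" >> /mnt/testfile
# cat /mnt/testfile
# rm /mnt/testfile
# umount /mnt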
I was able to create a partition, write an XFS filesystem, create a file, write to the file, read the file, and delete the file, all without error.

# Is the RAID5 volume fully initialized when you create the XFS filesystem? You can check it with cat /proc/mdstat; it will show a percentage of how far along the sync is.
Yes. After creating the RAID5 volume, I run the '# watch cat /proc/mdstat' command to watch the volume go through its sync process.

# Can you try to do the same without using AVN SATA drives? Use 4 RAM disks instead of real drives and do the same test; that way we can rule out AVN SATA. Use /dev/ram0 etc. (you have to specify the RAM disk size in grub).
I checked this out (the exact command sequence is sketched in the P.S. below):
o Loaded the brd module.
o Resized each /dev/ram[?] device to 8192 (8M).
o Created /dev/md100 using mdadm RAID5 and /dev/ram[0-3].
o Executed # mkfs.xfs /dev/md100 with no errors.

I think this issue is md RAID5 kernel module and SSD timing related, and this test proves that there is nothing wrong with the mdadm binary on the Atom platform originally being tested.

Experiments tried to verify that this is a kernel issue:

All of the experiments below, each trying to create an XFS filesystem on RAID5, resulted in the same kernel panic error:
- Moved the OS USB key and SSDs to an older-generation Atom storage platform: failed.
- Connected the SSDs to an LSI card: failed.
- Updated the kernel to the latest version (3.11.4-201): failed.
- Updated the mdadm binary to the latest version (3.2.6-21): failed.

When the OS and SSDs were moved to the older Atom generation, which did not have these problems before, the problem followed, indicating that the issue is kernel or MD RAID related.

Regards,
Michael
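P.S. For anyone who wants to repeat the RAM-disk control test, the sequence was roughly the following. This is a sketch: I set the RAM disk count and size via brd module parameters (rd_nr for the device count, rd_size in KB) rather than via a grub ramdisk_size= parameter, and the /dev/md100 name matches the description above:

# modprobe brd rd_nr=4 rd_size=8192
# mdadm --create /dev/md100 --level=5 --raid-devices=4 /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3
# cat /proc/mdstat        (wait for the small array to finish syncing)
# mkfs.xfs /dev/md100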