On Sat, Sep 25, 2010 at 8:56 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>> So once again. We have created a RAID level 6 unit. On top of the
>> unit there is an LVM layer, i.e. a volume group that contains
>> logical volumes. The logical volume is formatted with XFS and it
>> contains one big file that takes up almost all of the space on the
>> LV. There is some free space left in order to be able to expand the
>> LV and the FS in the future. The LV is mounted and the file is
>> served as an iSCSI target. The iSCSI initiator (the MS initiator
>> from Windows 2k3) connects to the iSCSI target. The iSCSI disk is
>> formatted with NTFS.
>
> ok, so we have:
>
> Linux server:
>
> +----------------------+
> | hardware raid 6      |
> +----------------------+
> | lvm2 - linear volume |
> +----------------------+
> | XFS                  |
> +----------------------+
> | iSCSI target         |
> +----------------------+
>
> Windows client:
>
> +----------------------+
> | iSCSI initiator      |
> +----------------------+
> | NTFS                 |
> +----------------------+
>
>> But we believe the problem is with XFS. For an unknown reason we
>> are not able to mount the LV, and after running xfs_repair the file
>> is missing from the LV. Do you have any ideas how we can try to fix
>> the broken XFS?
>
> This does not sound like a plain XFS issue to me, but rather an
> interaction between components going completely wrong. Normal I/O to
> a file should never corrupt the filesystem around it to the point
> where it's unusable, and so far I have never heard reports of that.
> The hint that this doesn't happen with another, purely userspace
> target is interesting. I wonder if the SCST target that you use does
> any sort of in-kernel block I/O after using bmap or similar? I've
> not seen that for iSCSI targets yet, but I have for other kernel
> modules, and that kind of I/O can cause massive corruption on a
> filesystem with delayed allocation and unwritten extents.
>
> Can any of the SCST experts on the list here track down how I/O for
> this configuration will be issued?
>
> What happens if you try the same setup with, say, jfs or ext4
> instead of xfs?

I saw references to vdisk fileio in there and wondered why this was
being done rather than simply exporting the hardware RAID 6 device.
That is, why are all those other layers in there?

fileio uses submit_bio to submit the data, and it defaults to
WRITE_THROUGH, NV_CACHE and DIRECT_IO (at least in the trunk, but I
suspect this has been the case for a long while). However, the person
reporting the problem might have switched off WRITE_THROUGH in the
pursuit of performance, in which case a crash could corrupt things
badly. Whether it does would depend on whether clearing WRITE_THROUGH
also clears NV_CACHE, and on what the code assembling the caching mode
page does (I have only had a cursory glance at the vdisk code).

What is needed here are the parameters used to configure the vdisk and
the version of SCST in use.

--
Regards,
Richard Sharpe

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
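
On the caching mode page question raised above: rather than guessing
from the vdisk flags, one way to see what the target actually
advertises is to read the page back from an initiator. A minimal
sketch, assuming a Linux host with sg3_utils and sdparm installed and
the exported LUN visible as /dev/sdX (both the tools being available
and the device name are assumptions for the example, not part of the
reported setup):

    # Dump the Caching mode page (0x08); the WCE (write cache enable)
    # and RCD bits show what the target claims about its cache
    sg_modes --page=0x08 /dev/sdX

    # Or query the write-cache-enable field directly
    sdparm --get=WCE /dev/sdX

If WCE comes back set on a vdisk that was supposedly opened
WRITE_THROUGH, that would point at the caching mode page assembly
mentioned above.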
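
And on the original question of how to approach the broken
filesystem: before running xfs_repair again it is usually worth
capturing the metadata and doing a read-only pass first, roughly
along these lines (the device path is made up for the example):

    # Capture a metadata-only image of the filesystem for later
    # analysis (-o disables obfuscation of file names)
    xfs_metadump -o /dev/VolGroup/iscsi_lv /tmp/iscsi_lv.metadump

    # No-modify mode: report what xfs_repair would change without
    # actually writing anything
    xfs_repair -n /dev/VolGroup/iscsi_lv

Also worth noting: if an earlier xfs_repair run disconnected the big
file, it may have been moved to lost+found in the filesystem root
rather than being gone for good.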