Re: xfs_repair segfaults with ag_stride option

Eric Sandeen <sandeen@xxxxxxxxxxx> · Tue, 07 Feb 2012 12:00:20 -0600

On 2/7/12 11:41 AM, Tom Crane wrote:
> Eric Sandeen wrote:
>> On 2/6/12 5:19 AM, Tom Crane wrote:
>>  
>>> Eric Sandeen wrote:
>>>     
>>
>> ...
>>
>>  
>>>> Newer tools are fine to use on older filesystems, there should be no
>>>>         
>>> Good!
>>>
>>>    
>>>> issue there.
>>>>
>>>> running fsr can cause an awful lot of IO, and a lot of file reorganization.
>>>> (meaning, they will get moved to new locations on disk, etc).
>>>>
>>>> How bad is it, really?  How did you arrive at the 40% number?  Unless
>>>>         
>>> xfs_db -c frag -r <block device>
>>>     
>>
>> which does:
>>
>>                 answer = (double)(extcount_actual - extcount_ideal) * 100.0 /
>>                          (double)extcount_actual;
>>
>> If you work it out, if every file was split into only 2 extents, you'd have
>> "50%" - and really, that's not bad.  40% is even less bad.
>>   
> 
> Here is a list of some of the more fragmented files, produced using,
> xfs_db -r /dev/mapper/vg0-lvol0 -c "frag -v" | head -1000000 | sort -k4,4 -g | tail -100
> 
>> inode 1323681 actual 12496 ideal 2

OK, so that's a fair number of extents, although I don't know how big the file is.

I think "frag" takes sparseness into account, so sparseness doesn't explain the high count.
(i.e. frag on a sparse file w/ 5 filled-in regions yields "actual 5, ideal 5")
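
To put numbers on the frag formula quoted earlier, here is a minimal Python sketch that applies the same arithmetic to a single "inode ... actual ... ideal ..." line.  The names frag_percent and parse_frag_line are made up for illustration and are not part of xfs_db.  The line format is assumed from the "frag -v" output quoted in this thread, and xfs_db reports its factor from filesystem-wide totals, so a per-file figure is only indicative.

```python
import re


def frag_percent(actual: int, ideal: int) -> float:
    """Same arithmetic as the quoted snippet: (actual - ideal) * 100 / actual."""
    if actual <= 0:
        # Guard against division by zero (a choice made for this sketch).
        return 0.0
    return (actual - ideal) * 100.0 / actual


# Assumed line shape, copied from the "frag -v" line quoted in this thread.
_FRAG_LINE = re.compile(r"inode\s+(\d+)\s+actual\s+(\d+)\s+ideal\s+(\d+)")


def parse_frag_line(line: str):
    """Return (inode, actual, ideal) as ints, or None if the line does not match."""
    match = _FRAG_LINE.search(line)
    if match is None:
        return None
    inode, actual, ideal = (int(group) for group in match.groups())
    return inode, actual, ideal


if __name__ == "__main__":
    # Every file split into exactly two extents.
    print(frag_percent(2, 1))
    # The most fragmented inode quoted above.
    inode, actual, ideal = parse_frag_line(">> inode 1323681 actual 12496 ideal 2")
    print(inode, round(frag_percent(actual, ideal), 2))
```

By this measure a file in two extents already scores 50%, while the inode above scores just under 100%, so the single percentage says little about how many extents a given file really has.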

> The following for some of the larger, more fragmented files was produced by parsing/summarising the output of bmap -l
> 
>> (nos-extents size-of-smallest-extent size-of-largest-extent size-of-average-extent)
>> 20996 8 38232 370.678986473614

So roughly a 3.7G file in 20996 extents (taking those lengths as 512-byte blocks).  Not great (unless it's sparse?)
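
A quick sanity check of that size estimate, as a hedged Python sketch.  It assumes the bmap lengths are in 512-byte blocks (the unit xfs_bmap documents).  The helper names summarise and approx_size_bytes are hypothetical, and the extent list in the last line is invented sample data.

```python
BLOCK_BYTES = 512  # assumption: bmap extent lengths are in 512-byte blocks


def summarise(extent_lengths):
    """Return (count, smallest, largest, average), the tuple layout quoted above."""
    count = len(extent_lengths)
    return count, min(extent_lengths), max(extent_lengths), sum(extent_lengths) / count


def approx_size_bytes(n_extents, avg_len, block_bytes=BLOCK_BYTES):
    """Mapped size implied by an extent count and an average extent length."""
    # count * average recovers the total block count; round away float noise.
    return round(n_extents * avg_len) * block_bytes


if __name__ == "__main__":
    size = approx_size_bytes(20996, 370.678986473614)
    print(size, "bytes, about", round(size / 2**30, 2), "GiB")
    # Invented extent lengths, only to show the summary tuple.
    print(summarise([8, 16, 24]))
```

Under that unit assumption the file comes out a little under 4G, the same ballpark as the estimate above.  With an average extent of roughly 185K, reading it sequentially means a new extent every couple of hundred kilobytes.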

> How bad does this look?

OK... not great?  :)  If the extents are really scattered around the disk, that might impact how quickly you can read those files after all.

How are the files created?  You might want to try to fix it up on that end as well.

-Eric

> Cheers
> Tom.
> 
> 
>>> Some users on our compute farm with large jobs (lots of I/O) find they take longer than with some of our other scratch arrays hosted on other machines.  We also typically find many nfsd tasks in an uninterruptible wait state (sync_page), waiting for data to be copied in from the FS.
>>>     
>>
>> So fragmentation may not be the problem...
>> -Eric
>>
>>  
>>>> you see perf problems which you know you can attribute to fragmentation,
>>>> I might not worry about it.
>>>>
>>>> You can also check the fragmentation of individual files with the
>>>> xfs_bmap tool.
>>>>
>>>> -Eric
>>>>         
>>> Thanks for your advice.
>>> Cheers
>>> Tom.
>>>
>>>    
>>>>  
>>>>      
>>>>> Tom.
>>>>>
>>>>> Christoph Hellwig wrote:
>>>>>           
>>>>>> Hi Tom,
>>>>>>
>>>>>> On Wed, Feb 01, 2012 at 01:36:12PM +0000, Tom Crane wrote:
>>>>>>  
>>>>>>               
>>>>>>> Dear XFS Support,
>>>>>>>    I am attempting to use xfs_repair to fix a damaged FS but always
>>>>>>> get a segfault if and only if -o ag_stride is specified. I have
>>>>>>> tried ag_stride=2,8,16 & 32.  The FS is approx 60T. I can't find
>>>>>>> reports of this particular problem on the mailing list archive.
>>>>>>> Further details are;
>>>>>>>
>>>>>>> xfs_repair version 3.1.7, recently downloaded via git repository.
>>>>>>> uname -a
>>>>>>> Linux store3 2.6.18-274.17.1.el5 #1 SMP Wed Jan 11 11:10:32 CET 2012
>>>>>>> x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>                         
>>>>>> Thanks for the detailed bug report.
>>>>>>
>>>>>> Can you please try the attached patch?
>>>>>>
>>>>>>                   
>>>>> _______________________________________________
>>>>> xfs mailing list
>>>>> xfs@xxxxxxxxxxx
>>>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>>>             
>>>>         
>>
>>   
> 
