Re: HELP! Diapered block device

Jon Scottorn <jscottorn@xxxxxxxxxxxxxxxxxxxx> · Thu, 07 Jul 2005 10:17:46 -0600

Ok, so I am rerunning gfs_fsck.  I have everything unmounted and
the storage server is not even in the cluster.  It has been running for
30 minutes now, and it gets to this point and doesn't look like it is
doing anything.  This is where it stayed after running for 24 hours
yesterday.  Here is the verbose output from gfs_fsck:

Initializing fsck
Initializing lists...
Initializing special inodes...
Setting block ranges...
Creating a block list of size 183146926...
Clearing journals (this may take a while)
Clearing journal 0
Clearing journal 1
Clearing journal 2
Clearing journal 3
Cleared journals
Starting pass1
Checking metadata in Resource Group 0
Checking metadata in Resource Group 1
Checking metadata in Resource Group 2
Checking metadata in Resource Group 3

........Omitted lines for space.........................

Checking metadata in Resource Group 2790
Checking metadata in Resource Group 2791
Checking metadata in Resource Group 2792
Checking metadata in Resource Group 2793
Pass1 complete
Starting pass1b
Looking for duplicate blocks...
Found dup block at 61573000
Found dup block at 61573014
Found dup block at 61573015
Found dup block at 61573016
Found dup block at 61573017
Found dup block at 61573018
Found dup block at 61573019
Found dup block at 61573020
Found dup block at 61573021
Found dup block at 61573022
Found dup block at 61573024
Found dup block at 61573047
Found dup block at 61573048
Found dup block at 61573052
Found dup block at 61623032
Found dup block at 61623033
Found dup block at 61623034
Found dup block at 61623035
Scanning filesystem for inodes containing duplicate blocks...
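
For reference, the duplicate block numbers in the pass1b output above fall into a few short runs. Below is a minimal Python sketch that collapses them into contiguous ranges. It is only an illustration and not part of gfs_fsck; the helper names `dup_blocks` and `contiguous_runs` are made up for this example, and the log text is copied from the output above.

```python
import re

# pass1b lines copied from the gfs_fsck output above.
LOG = """\
Found dup block at 61573000
Found dup block at 61573014
Found dup block at 61573015
Found dup block at 61573016
Found dup block at 61573017
Found dup block at 61573018
Found dup block at 61573019
Found dup block at 61573020
Found dup block at 61573021
Found dup block at 61573022
Found dup block at 61573024
Found dup block at 61573047
Found dup block at 61573048
Found dup block at 61573052
Found dup block at 61623032
Found dup block at 61623033
Found dup block at 61623034
Found dup block at 61623035
"""


def dup_blocks(log_text):
    """Return the sorted block numbers from 'Found dup block at N' lines."""
    pattern = r"^Found dup block at (\d+)$"
    return sorted(int(n) for n in re.findall(pattern, log_text, re.MULTILINE))


def contiguous_runs(blocks):
    """Collapse sorted block numbers into (first, last) runs of consecutive blocks."""
    runs = []
    for block in blocks:
        if runs and block == runs[-1][1] + 1:
            # Extends the current run by one block.
            runs[-1] = (runs[-1][0], block)
        else:
            # Gap found: start a new run.
            runs.append((block, block))
    return runs


blocks = dup_blocks(LOG)
runs = contiguous_runs(blocks)
print(f"{len(blocks)} duplicate blocks in {len(runs)} runs")
for first, last in runs:
    print(first if first == last else f"{first}-{last}")
```

On this output the sketch counts 18 duplicate blocks in 6 runs, all of them close to either block 61573000 or block 61623032.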

Once it reaches the "Scanning filesystem for inodes containing duplicate
blocks..." step, it just sits there.  gfs_fsck uses 99% of the CPU the
whole time it runs.  What else can I do to get this fixed?
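
The reply quoted below asks whether the fsck was still accessing storage. One rough way to tell is to sample the kernel's per-device counters in /proc/diskstats twice and compare them. The following is a minimal Python sketch under stated assumptions: the device name passed on the command line (for example `sdb`) is hypothetical and must be replaced with the device that backs the GFS volume, the function names are made up for this example, and the parser expects whole-device lines with the full field layout (older kernels print fewer fields for partitions).

```python
import sys
import time


def read_sectors(diskstats_text, device):
    """Return (sectors_read, sectors_written) for `device` from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major minor name reads merged sectors_read ms_reading
        #         writes merged sectors_written ...
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[5]), int(fields[9])
    raise KeyError(f"device {device!r} not found in diskstats")


def io_delta(before, after):
    """Sectors read and written between two samples."""
    return after[0] - before[0], after[1] - before[1]


def sample(device, interval=10.0, path="/proc/diskstats"):
    """Take two samples `interval` seconds apart and return the difference."""
    with open(path) as f:
        before = read_sectors(f.read(), device)
    time.sleep(interval)
    with open(path) as f:
        after = read_sectors(f.read(), device)
    return io_delta(before, after)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python iocheck.py DEVICE   (DEVICE is an assumption, e.g. sdb)
    read, written = sample(sys.argv[1])
    print(f"{sys.argv[1]}: {read} sectors read, {written} sectors written")
```

A delta of zero over several samples suggests the process is working in memory rather than reading the disk. That alone does not prove it is hung, since the counters cover all I/O to the device and not only gfs_fsck.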

Thanks,

Jon

Jon Scottorn wrote:

>When I ran the fsck, I had everything unmounted and the gnbd server
>stopped.  I let it run for almost 24 hours and it was still running.
>That seems a little long to me.  Should I let it run again and see what
>happens?  My main problem is I can't have the FS down for that long.
>
>Thanks,
>
>Jon
>
>AJ Lewis wrote:
>
>>On Thu, Jul 07, 2005 at 09:16:45AM -0600, Jon Scottorn wrote:
>>
>>>Thanks,
>>>
>>>   That made it so I can mount it from the other nodes, but now I can't
>>>mount it on the storage server.
>>>
>>Gah!  Is the fsck still running?  You *CANNOT* run the fsck while other nodes
>>have the fs mounted.  The fsck changes the lock protocol to prevent others
>>from mounting after the fsck starts.  It will be changed back after
>>completion.
>>
>>The fsck can take a while in the duplicate block code - could you tell if it
>>was still accessing storage?  If you have lots of inodes in the system, it's
>>gonna take a while to work through them in the dup block handling code.
>>
>>Regards,
>> 
>>
>>------------------------------------------------------------------------
>>
>>--
>>
>>Linux-cluster@xxxxxxxxxx
>>http://www.redhat.com/mailman/listinfo/linux-cluster