On Tue, Oct 27, 2015 at 11:51:35PM +0100, Michael Weissenbacher wrote:
> Hi Dave!
> First of all, today i cancelled the running xfs_repair (CTRL-C) and
> upped the system RAM from 8GB to 16GB - the maximum possible with this
> hardware.
>
> Dave Chinner wrote:
> > It's waiting on inode IO to complete in memory reclaim. I'd say you
> > have a problem with lots of dirty inodes in memory and very slow
> > writeback due to using something like RAID5/6 (this can be
> > *seriously* slow as mentioned recently here:
> > http://oss.sgi.com/archives/xfs/2015-10/msg00560.html).
> Unfortunately, this is a rather slow RAID-6 setup with 7200RPM disks.
> However, before the power loss occurred it performed quite OK for our
> use case and without any hiccups. But some time after the power loss
> some "rm" commands hung and didn't proceed at all. There was no CPU
> usage and there was hardly any I/O on the file system. That's why I
> suspected some sort of corruption.

Maybe you have a disk that is dying. Do your drives have TLER enabled
on them?

> Dave Chinner wrote:
> > Was it (xfs_repair) making progress, just burning CPU, or was it just
> > hung? Attaching the actual output of repair is also helpful, as are
> > all the things here:
> > ...
> The xfs_repair seemed to be making progress, albeit very very slowly.
> In iotop i saw about 99% I/O usage on kswapd0. Looking at the HDD LEDs
> of the array, i could see that there was hardly any access to it at all
> (only once about every 10-15 seconds).

kswapd is trying to reclaim kernel memory, which has nothing directly
to do with xfs_repair IO or CPU usage. Unless, of course, it is doing
reclaim to grab more memory for xfs_repair...

> I didn't include xfs_repair output, since it showed nothing unusual.
> ---snip---
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> ...
>         - agno = 14
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
> ...
>         - agno = 14
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
> ---snip---
> (and sitting there for about 72 hours)

It really hasn't made much progress if it's still traversing the fs
after 72 hours.

> Dave Chinner wrote:
> > If repair is swapping, then adding more RAM and/or faster swap space
> > will help. There is nothing that you can tweak that changes the
> > runtime or behaviour of phase 6 - it is single threaded and requires
> > traversal of the entire filesystem directory hierarchy to find all
> > the disconnected inodes so they can be moved to lost+found. And it
> > does write inodes, so if you have a slow SATA RAID5/6...
> Ok, so if i understand you correctly, none of the parameters will help
> for phase 6? I know that RAID-6 has slow write characteristics. But in
> fact I didn't see any writes at all with iotop and iostat.

If kswapd is doing all the work, then it's essentially got no memory
available. I would add significantly more swap space as well (e.g. add
swap files to the root filesystem - you can do this while repair is
running, too).
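Something along these lines should do it (a rough sketch - I'm assuming
/swapfile is an unused path on your root fs and that it has ~16GB to
spare; use dd rather than fallocate so the file is fully allocated with
no holes):

# dd if=/dev/zero of=/swapfile bs=1M count=16384
# chmod 600 /swapfile
# mkswap /swapfile
# swapon /swapfile
# swapon -s

Once swapon reports the new space, the kernel can page repair's memory
out to it instead of stalling in reclaim, and you can add further swap
files the same way if that still isn't enough.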
If there's sufficient swap space, then repair should use it fairly
efficiently - it doesn't tend to thrash swap, because most of its memory
usage is for information that is only accessed once per phase or is
parked until it is needed in a later phase, so it doesn't need to be
read from disk again...

> Dave Chinner wrote:
> > See above. Those numbers don't include reclaimable memory like the
> > buffer cache footprint, which is affected by bhash and concurrency....
> >
> As said above, i did now double the RAM of the machine from 8GB to
> 16GB. Now I started xfs_repair again with the following options. I
> hope that the verbose output will help to understand better what's
> actually going on.
> # xfs_repair -m 8192 -vv /dev/sdb1
>
> Besides, is it wise to limit the memory with "-m" to keep the system
> from swapping, or should I rather use the defaults (which would use
> 75% of RAM)?

Defaults, but it's really only a guideline for cache sizing. If repair
needs more memory to store the metadata it is validating (like the
directory structure) then it will consume as much as it needs.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
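PS: if you want a rough idea up front of how much memory repair thinks
it needs for this filesystem, I believe you can get it to print its
estimate by running it in no-modify mode with a deliberately tiny -m
(using /dev/sdb1 from your command above):

# xfs_repair -n -vv -m 1 /dev/sdb1

With -vv it reports its memory calculations and, if -m is too small,
tells you the minimum it wants before giving up - that should show
whether 16GB plus swap is in the right ballpark. I'm going from memory
here, so check the output against your xfs_repair version.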