On Sun, Dec 29, 2013 at 12:57:13PM +0100, Arkadiusz Miśkiewicz wrote:
> On Sunday 29 of December 2013, Dave Chinner wrote:
> > On Sat, Dec 28, 2013 at 12:20:39AM +0100, Arkadiusz Miśkiewicz wrote:
> > > On Friday 27 of December 2013, Dave Chinner wrote:
> > > > On Fri, Dec 27, 2013 at 09:07:22AM +0100, Arkadiusz Miśkiewicz wrote:
> > > > > On Friday 27 of December 2013, Jeff Liu wrote:
> > > > > > On 12/27 2013 14:48 PM, Stor?? wrote:
> [...]
> > > > > This reminds me of a question...
> > > > >
> > > > > Could xfs_repair store its temporary data (some of that data, the
> > > > > biggest part) on disk instead of in memory?
> > > >
> > > > Where on disk?
> > >
> > > In a directory/file that I'll tell it to use (since I usually have a
> > > few xfs filesystems on a single server and so far only one at a time
> > > breaks).
> >
> > How is that any different from just adding swap space to the server?
>
> It's different in that it allows other services to keep working while
> the repair is in progress. If swap gets eaten then the entire server
> goes down on its knees. Keeping things on disk would mean that other
> services work uninterrupted and repair gets slow (but works).

Well, that depends on what disk you put the external db on. If that
is shared, then you're going to have problems with IO latency
causing service degradation....

> > > Right. So only the "easy" task of finding someone who understands
> > > the code and can write such an interface is left. Anyone?
> > >
> > > IMO ram usage is a real problem for xfs_repair and there has to be
> > > some upstream solution other than the "buy more" (and waste more)
> > > approach.
> >
> > I think you are forgetting that developer time is *expensive* and
> > *scarce*.
>
> I'm aware of that and not expecting any developer to implement this
> (unless some developer hits the same problems and has hardware
> constraints ;)

The main issue here is that your filesystem usage is well outside
the 95th percentile, and so you are in the realm of custom
solutions that require significant engineering effort to resolve.
That's not to say they can't be solved, just that solving them is
an expensive undertaking...

> > This is essentially a solved problem: an SSD in a USB3 enclosure
> > used as a temporary swap device is by far the most cost effective
> > way to make repair scale to arbitrary amounts of metadata. It
> > certainly scales far better than developer time and testing
> > resources...
>
> Ok.
>
> I'm not saying that everyone should now start adding an "on disk" db
> to xfs_repair. I just think that that solution would work regardless
> of hardware, and would make it possible to repair huge filesystems
> (with tons of metadata) even on low memory machines (without having
> to change hardware).

It's always been the case that you can create a filesystem that a
specific machine does not have the resources to repair. We can't
prevent that from occurring. e.g. no amount of on-disk database
work will make repair complete on an embedded NAS box with 512MB
of RAM and a 2GB system disk when the filesystem spans 2x4TB
drives....

> Whether there is interest among developers in implementing this
> (obviously not) is another matter, and shouldn't affect a discussion
> of the approach.
>
> What is more interesting for me is talking about the possible
> problems with the on disk approach, not looking for a solution to my
> particular case.

The problem with adding a database interface is that we have to
re-engineer all the internal structures that xfs_repair uses and
the indexes we use to track them. They need to be abstracted in a
database-friendly manner, and then new code has to be written to
manage the database and to insert, modify and remove the
information in it. Then there is work to find the most suitable
database, as simple key/value pair databases won't scale to
tracking hundreds of millions of records.
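To make the scale of that re-engineering concrete, here's a minimal
sketch of the kind of backend-neutral record store every internal
index would have to sit behind. To be clear, this is hypothetical:
none of these names exist in xfs_repair, and a real conversion would
also need cursors, range queries and bulk loading. It only shows the
shape of the abstraction.

    /*
     * Illustrative sketch only: a backend-neutral record store of
     * the kind a database-backed xfs_repair would need. None of
     * these names exist in xfs_repair; they are invented here.
     */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct rec_store;

    struct rec_store_ops {
            int  (*insert)(struct rec_store *s, uint64_t key,
                           const void *rec, size_t len);
            int  (*lookup)(struct rec_store *s, uint64_t key,
                           void *rec, size_t len);
            void (*destroy)(struct rec_store *s);
    };

    struct rec_store {
            const struct rec_store_ops *ops;
            void *private;          /* backend-specific state */
    };

    /* Trivial in-memory backend: a linked list of records. */
    struct mem_rec {
            struct mem_rec *next;
            uint64_t key;
            size_t len;
            unsigned char data[];
    };

    static int mem_insert(struct rec_store *s, uint64_t key,
                          const void *rec, size_t len)
    {
            struct mem_rec *r = malloc(sizeof(*r) + len);

            if (!r)
                    return -1;      /* ENOMEM is still possible... */
            r->key = key;
            r->len = len;
            memcpy(r->data, rec, len);
            r->next = s->private;
            s->private = r;
            return 0;
    }

    static int mem_lookup(struct rec_store *s, uint64_t key,
                          void *rec, size_t len)
    {
            struct mem_rec *r;

            for (r = s->private; r; r = r->next) {
                    if (r->key == key) {
                            memcpy(rec, r->data,
                                   len < r->len ? len : r->len);
                            return 0;
                    }
            }
            return -1;              /* not found */
    }

    static void mem_destroy(struct rec_store *s)
    {
            struct mem_rec *r = s->private;

            while (r) {
                    struct mem_rec *next = r->next;
                    free(r);
                    r = next;
            }
            free(s);
    }

    static const struct rec_store_ops mem_ops = {
            .insert  = mem_insert,
            .lookup  = mem_lookup,
            .destroy = mem_destroy,
    };

    struct rec_store *rec_store_init_mem(void)
    {
            struct rec_store *s = calloc(1, sizeof(*s));

            if (s)
                    s->ops = &mem_ops;
            return s;
    }

A disk-backed backend would supply its own ops vector behind the same
interface, and that is where all of the ENOSPC handling and dependency
problems described below would land.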
That is likely to create significant dependencies for xfsprogs
which we can't pull into things like the debian udeb builds that
are used for building the recovery disk images that contain
xfs_repair. So we have to make it all build time conditional, and
then we'll have different capabilities in xfs_repair depending on
where you run it from. Then we've got to test it all, document it,
etc.

And there's still no guarantee that it solves your problem. Not
enough disk space for the database? ENOSPC causes failure instead
of ENOMEM. How do we know how much disk space is needed? We can't
predict that exactly without running repair, same as for memory
usage prediction. And even if we are using a DB rather than RAM,
there's still the possibility of ENOMEM.

These are all solvable issues, but they take time, resources and
expertise that we don't currently have. When compared to the
simplicity of "add a USB SSD for swap", it just doesn't make sense
to spend time trying to solve this problem....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx