On Sun, Dec 29, 2013 at 12:57:13PM +0100, Arkadiusz Miśkiewicz wrote:
> On Sunday 29 of December 2013, Dave Chinner wrote:
> > On Sat, Dec 28, 2013 at 12:20:39AM +0100, Arkadiusz Miśkiewicz wrote:
> > > On Friday 27 of December 2013, Dave Chinner wrote:
> > > > On Fri, Dec 27, 2013 at 09:07:22AM +0100, Arkadiusz Miśkiewicz wrote:
> > > > > On Friday 27 of December 2013, Jeff Liu wrote:
> > > > > > On 12/27 2013 14:48 PM, Stor?? wrote:
> [...]
> > > > > This reminds me of a question...
> > > > >
> > > > > Could xfs_repair store its temporary data (some of that data, the
> > > > > biggest part) on disk instead of in memory?
> > > >
> > > > Where on disk?
> > >
> > > In a directory/file that I'll tell it to use (since I usually have a
> > > few xfs filesystems on a single server and so far only one at a time
> > > breaks).
> >
> > How is that any different from just adding swap space to the server?
>
> It's different in that it allows other services to keep working while
> the repair is in progress. If swap gets eaten then the entire server
> goes down on its knees. Keeping things on disk would mean that other
> services work uninterrupted and repair gets slow (but works).

Well, that depends on what disk you put the external db on. If that
is shared, then you're going to have problems with IO latency
causing service degradation....

> > > Right. So only the "easy" task of finding someone who understands
> > > the code and can write such an interface is left. Anyone?
> > >
> > > IMO ram usage is a real problem for xfs_repair and there has to be
> > > some upstream solution other than the "buy more" (and waste more)
> > > approach.
> >
> > I think you are forgetting that developer time is *expensive* and
> > *scarce*.
>
> I'm aware of that and not expecting any developer to implement this
> (unless some developer hits the same problems and has hardware
> constraints ;)

The main issue here is that your filesystem usage is well outside
the 95th percentile, and so you are in the realm of custom
solutions that require significant engineering effort to resolve.
That's not to say they can't be solved, just that solving them is
an expensive undertaking...

> > This is essentially a solved problem: an SSD in a USB3 enclosure
> > used as a temporary swap device is by far the most cost effective
> > way to make repair scale to arbitrary amounts of metadata. It
> > certainly scales far better than developer time and testing
> > resources...
>
> Ok.
>
> I'm not saying that everyone should now start adding an "on disk" db
> to xfs_repair. I just think that that solution would work regardless
> of hardware, and would make it possible to repair huge filesystems
> (with tons of metadata) even on low memory machines (without having
> to change hardware).

It's always been the case that you can create a filesystem that a
specific machine does not have the resources to repair. We can't
prevent that from occurring. e.g. no amount of on-disk database
work will make repair complete on an embedded NAS box with 512MB
of RAM and a 2GB system disk when the filesystem spans 2x4TB
drives....

> Whether there is interest among developers in implementing this
> (obviously not) is another matter, and shouldn't affect a discussion
> of the approach.
>
> What is more interesting for me is talking about the possible
> problems with the on disk approach, not looking for a solution to my
> particular case.

The problem with adding a database interface is that we have to
re-engineer all the internal structures that xfs_repair uses and
the indexes we use to track them. They need to be abstracted in a
database-friendly manner, and then new code has to be written to
manage the database and to insert, modify and remove the
information in it. Then there is work to find the most suitable
database, as simple key/value pair databases won't scale to
tracking hundreds of millions of records.
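To make the scale of that re-engineering concrete, here's a minimal
sketch of the kind of backend-neutral record store every internal
index would have to sit behind. To be clear, this is hypothetical:
none of these names exist in xfs_repair, and a real conversion would
also need cursors, range queries and bulk loading. It only shows the
shape of the abstraction.

    /*
     * Illustrative sketch only: a backend-neutral record store of
     * the kind a database-backed xfs_repair would need. None of
     * these names exist in xfs_repair; they are invented here.
     */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct rec_store;

    struct rec_store_ops {
            int  (*insert)(struct rec_store *s, uint64_t key,
                           const void *rec, size_t len);
            int  (*lookup)(struct rec_store *s, uint64_t key,
                           void *rec, size_t len);
            void (*destroy)(struct rec_store *s);
    };

    struct rec_store {
            const struct rec_store_ops *ops;
            void *private;          /* backend-specific state */
    };

    /* Trivial in-memory backend: a linked list of records. */
    struct mem_rec {
            struct mem_rec *next;
            uint64_t key;
            size_t len;
            unsigned char data[];
    };

    static int mem_insert(struct rec_store *s, uint64_t key,
                          const void *rec, size_t len)
    {
            struct mem_rec *r = malloc(sizeof(*r) + len);

            if (!r)
                    return -1;      /* ENOMEM is still possible... */
            r->key = key;
            r->len = len;
            memcpy(r->data, rec, len);
            r->next = s->private;
            s->private = r;
            return 0;
    }

    static int mem_lookup(struct rec_store *s, uint64_t key,
                          void *rec, size_t len)
    {
            struct mem_rec *r;

            for (r = s->private; r; r = r->next) {
                    if (r->key == key) {
                            memcpy(rec, r->data,
                                   len < r->len ? len : r->len);
                            return 0;
                    }
            }
            return -1;              /* not found */
    }

    static void mem_destroy(struct rec_store *s)
    {
            struct mem_rec *r = s->private;

            while (r) {
                    struct mem_rec *next = r->next;
                    free(r);
                    r = next;
            }
            free(s);
    }

    static const struct rec_store_ops mem_ops = {
            .insert  = mem_insert,
            .lookup  = mem_lookup,
            .destroy = mem_destroy,
    };

    struct rec_store *rec_store_init_mem(void)
    {
            struct rec_store *s = calloc(1, sizeof(*s));

            if (s)
                    s->ops = &mem_ops;
            return s;
    }

A disk-backed backend would supply its own ops vector behind the same
interface, and that is where all of the ENOSPC handling and dependency
problems described below would land.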
That is likely to create significant dependencies for xfsprogs
which we can't pull into things like the debian udeb builds that
are used for building the recovery disk images that contain
xfs_repair. So we have to make it all build time conditional, and
then we'll have different capabilities in xfs_repair depending on
where you run it from. Then we've got to test it all, document it,
etc.

And there's still no guarantee that it solves your problem. Not
enough disk space for the database? ENOSPC causes failure instead
of ENOMEM. How do we know how much disk space is needed? We can't
predict that exactly without running repair, same as for memory
usage prediction. And even if we are using a DB rather than RAM,
there's still the possibility of ENOMEM.

These are all solvable issues, but they take time, resources and
expertise that we don't currently have. When compared to the
simplicity of "add a USB SSD for swap", it just doesn't make sense
to spend time trying to solve this problem....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx