Re: xfs: Assertion failed in xfs_ag_resv_init()

On Thu, May 02, 18:52, Greg Kroah-Hartman wrote
> On Thu, May 02, 2019 at 05:27:36PM +0200, Andre Noll wrote:
> > On Thu, May 02, 16:10, Greg Kroah-Hartman wrote
> > > Ok, then how about we hold off on this patch for 4.9.y then.  "no one"
> > > should be using 4.9.y in a "server system" anymore, unless you happen to
> > > have an enterprise kernel based on it.  So we should be fine as the
> > > users of the older kernels don't run xfs.
> > 
> > Well, we do run xfs on top of bcache on vanilla 4.9 kernels on a few
> > dozen production servers here. Mainly because we ran into all sorts
> > of issues with newer kernels (not necessarily related to xfs). 4.9,
> > OTOH, appears to be rock solid for our workload.
> 
> Great, but what is wrong with 4.14.y or better yet, 4.19.y?  Do those
> also work for your workload?  If not, we should fix that, and soon :)

Some months ago we tried 4.14 and it was a real disaster: random
crashes with nothing in the logs on the file servers and unkillable
hung processes on the compute machines. The thing is, I can't afford
extended downtime on these production systems, nor can I test patches
or enable debugging options that slow the systems down too much. Also,
10 of the compute nodes load the nvidia module, so all bets are off
anyway. But we've seen the hung processes also on the non-gpu nodes
where the nvidia module is not loaded.

As for 4.19, xfs on bcache was broken until a couple of weeks
ago. Meanwhile the fix (e578f90d8a9c) went in, so I benchmarked 4.19.x
on one system briefly. To my surprise the results were *worse* than
with 4.9. This seems to be another cache bypass issue, but I need to
have a closer look, and more reliable numbers.

> I would _STRONGLY_ recommend moving off of 4.9 on any non-SoC-based
> system at this point in time, there should not be any reason to stick
> with it, unless you are paying a company to provide support for it.

That's really bad news :(

Thanks for sharing your thoughts about the future of 4.9, though. I'll
try to spend some time on the bcache issue on 4.19.

Best
Andre
-- 
Max Planck Institute for Developmental Biology
Max-Planck-Ring 5, 72076 Tübingen, Germany. Phone: (+49) 7071 601 829
http://people.tuebingen.mpg.de/maan/


