> On Dec 4, 2016, at 3:50 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sun, Dec 04, 2016 at 03:24:50PM -0800, Cyril Peponnet wrote:
>>> On Dec 4, 2016, at 2:46 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> Which used LVM snapshots to take snapshots of the entire brick.
>>> I don't see any LVM in your config, so I'm not sure what
>>> snapshot implementation you are using here. What are you using
>>> to take the snapshots of your VM image files? Are you actually
>>> using the qemu qcow2 snapshot functionality rather than anything
>>> native to gluster?
>>>
>>
>> Yes, sorry, it was not clear enough: qemu-img snapshots, not
>> native gluster snapshots.
>
> Ok, so that's a fragmentation problem in its own right: both
> internal qcow2 fragmentation and file fragmentation.
>
>>> Also, can you attach the 'xfs_bmap -vp' output of some of these
>>> image files and their snapshots?
>>
>> A snapshot:
>> https://gist.github.com/CyrilPeponnet/8108c74b9e8fd1d9edbf239b2872378d
>> (let me know if you need more; there are around 600 live
>> snapshots sitting here).
>
> 1200 extents, mostly small, almost entirely adjacent. Typical qcow2
> file fragmentation pattern. That's not going to cause your memory
> allocation problems - can you find one that has hundreds of
> thousands of extents?

I found one with 10799109 extents :/ It is 576GB in size (I need to
find out why this one is so big, this is not normal…)…

Could it lead to the issue? I mean, could one badly fragmented file
deadlock the entire FS? (A rough scan for finding such files is
sketched at the end of this mail.)

>
>>>
>>> 56GB of cached file data. If you're getting high-order
>>> allocation failures (which I suspect is the problem) then this
>>> is a memory fragmentation problem more than anything.
>>>
>>>> ----------------------------------------------------------------
>>>> DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name
>>>> ----------------------------------------------------------------
>>>> 0/0   RAID0 Optl  RW     Yes     RAWBC -   ON  7.275 TB scratch
>>>> ----------------------------------------------------------------
>>>>
>>>> Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
>>>> Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
>>>> R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|AWB=Always WriteBack|
>>>> WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
>>>
>>> IIRC, AWB means that if the cache goes into degraded/offline
>>> mode, you’re vulnerable to corruption/loss on power
>>> failure…
>>
>> Yes, we have a BBU + redundant PSUs to address that.
>
> BBU fails, data center loses power, corruption/data loss still
> occurs. Not my problem, though.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
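
Not something from the thread itself, but for hunting down more files
like that one: a minimal bash sketch, assuming xfsprogs is installed
and that the brick lives under /bricks/scratch (a hypothetical path;
the 100000-extent threshold is likewise an arbitrary pick):

    # walk the brick and report qcow2 files whose extent count exceeds
    # the threshold; xfs_bmap prints one line per extent plus a header
    # line, so the count is off by one, which is harmless at this scale
    find /bricks/scratch -type f -name '*.qcow2' -print0 |
    while IFS= read -r -d '' f; do
        n=$(xfs_bmap "$f" | wc -l)
        [ "$n" -gt 100000 ] && printf '%10d  %s\n' "$n" "$f"
    done | sort -rn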
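
And a sketch of how new images could be made less fragmentation-prone
(again not something proposed in the thread; the paths, image name
and the 16m hint are illustrative examples only):

    # set an extent size hint on the image directory; files created
    # under it inherit the hint, so XFS allocates their space in
    # larger contiguous chunks rather than tiny qcow2-cluster dribbles
    xfs_io -c "extsize 16m" /bricks/scratch/images

    # preallocate the qcow2 metadata at create time so the L1/L2
    # tables are not scattered through the data as the image grows
    qemu-img create -f qcow2 -o preallocation=metadata vm.qcow2 40G

For a file that is already at 10 million extents, xfs_fsr can
defragment individual files in place, though it needs enough
contiguous free space on the filesystem to rewrite them.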