Re: ceph node crashed with these errors "kernel: ceph: build_snap_context" (maybe now it is urgent?)

I can confirm that removing all the snapshots seems to resolve the 
problem. 

A - I would propose a redesign so that only snapshots below the 
mountpoint are taken into account, not snapshots elsewhere in the 
filesystem. That should fix a lot of issues.

B - That reminds me of the mv command, which does not move data across 
different pools in the fs. I would like to see that changed, because 
actually moving the data is the logical thing to expect.




 >
 >>  >ISTR there were some anti-spam measures put in place.  Is your
 >>  >account waiting for manual approval?  If so, David should be able
 >>  >to help.
 >>
 >> Yes if I remember correctly I get waiting approval when I try to log
 >> in.
 >>
 >>  >> Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9287 ffff911a9a26bd00 fail -12
 >>  >> Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9283
 >>  >
 >>  >It is failing to allocate memory.  "low load" isn't very specific,
 >>  >can you describe the setup and the workload in more detail?
 >>
 >> 4 nodes (osd, mon combined), the 4th node has a local cephfs mount,
 >> which is rsync'ing some files from vm's. 'low load': I have a sort-of
 >> test setup, going to production. Mostly the nodes are below a load of
 >> 1 (except when the concurrent rsync starts)
 >>
 >>  >How many snapshots do you have?
 >>
 >> Don't know how to count them. I have a script running on 2000 dirs.
 >> If one of these dirs is not empty it creates a snapshot. So in theory
 >> I could have 2000 x 7 days = 14000 snapshots.
 >> (btw the cephfs snapshots are in a different tree than the rsync is
 >> using)
 >
 >Is there a reason you are snapshotting each directory individually
 >instead of just snapshotting a common parent?

Yes, because I am not sure the snapshot frequency on all the folders is 
going to be the same.
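
For reference, the per-directory part of that script basically boils 
down to something like the sketch below (rough sketch only; BASE, the 
daily naming and the non-empty check are simplified assumptions, not the 
real script, but mkdir inside the magic .snap directory is what actually 
creates a CephFS snapshot, provided snapshots are enabled on the fs):

#!/usr/bin/env python3
# rough sketch of a per-directory snapshot script as described above
# (BASE and the snapshot naming are assumptions, not the real script)
import os
import datetime

BASE = "/mnt/cephfs/dir1"          # hypothetical parent of the ~2000 dirs
SNAP_NAME = datetime.date.today().isoformat()

for entry in os.scandir(BASE):
    if not entry.is_dir():
        continue
    if not os.listdir(entry.path):     # "if one of these dirs is not empty..."
        continue
    snap_path = os.path.join(entry.path, ".snap", SNAP_NAME)
    if not os.path.isdir(snap_path):
        os.mkdir(snap_path)            # mkdir in .snap creates the snapshot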

 >If you have thousands of snapshots, you may eventually hit a different
 >bug:
 >
 >https://tracker.ceph.com/issues/21420
 >https://docs.ceph.com/docs/master/cephfs/experimental-features/#snapshots
 >
 >Be aware that each set of 512 snapshots amplifies your writes by 4K in
 >terms of network consumption.  With 14000 snapshots, a 4K write would
 >need to transfer ~109K worth of snapshot metadata to carry itself out.
 >
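
Just to check the math on that ~109K (using only the numbers quoted 
above; whether the overhead should be rounded up per whole 512-snapshot 
set is my assumption):

# back-of-the-envelope check of the ~109K figure quoted above
import math

snapshots = 14000
kib_per_set = 4                                    # 4K per 512 snapshots, as quoted
exact_kib = snapshots / 512 * kib_per_set          # ~109.4 -> the "~109K"
rounded_up_kib = math.ceil(snapshots / 512) * kib_per_set   # 28 sets -> 112
print(exact_kib, rounded_up_kib)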
 
Does that also apply when I am not even writing to a tree with snapshots 
enabled? I am rsyncing to dir3:

.
├── dir1
│   ├── dira
│   │   └── .snap
│   ├── dirb
│   ├── dirc
│   │   └── .snap
│   └── dird
│       └── .snap
├── dir2
└── dir3
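
On the earlier "don't know how to count them": one way I could count 
them would be something like the sketch below (MOUNT is just an assumed 
path; it also assumes that snapshots inherited from parent directories 
show up in .snap with a leading underscore and should be skipped to 
avoid double counting):

#!/usr/bin/env python3
# rough sketch for counting cephfs snapshots by listing each directory's
# .snap pseudo-directory (MOUNT and the underscore filter are assumptions)
import os

MOUNT = "/mnt/cephfs"              # hypothetical local cephfs mount point

total = 0
for root, dirs, files in os.walk(MOUNT):
    try:
        names = os.listdir(os.path.join(root, ".snap"))
    except OSError:
        continue
    # skip entries that (I assume) come from snapshots on parent dirs
    total += sum(1 for n in names if not n.startswith("_"))

print("snapshots:", total)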

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



