Hi Ilya,

> ISTR there were some anti-spam measures put in place. Is your account
> waiting for manual approval? If so, David should be able to help.

Yes, if I remember correctly I get "waiting approval" when I try to
log in.

>> Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9287
>> ffff911a9a26bd00 fail -12
>> Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9283
>
> It is failing to allocate memory. "low load" isn't very specific,
> can you describe the setup and the workload in more detail?

Four nodes (osd and mon combined); the 4th node has a local cephfs
mount, which is rsync'ing some files from VMs. By 'low load' I mean
this is a sort of test setup, on its way to production. Mostly the
nodes are below a load of 1 (except when the concurrent rsyncs start).

> How many snapshots do you have?

I don't know how to count them. I have a script running on about 2000
dirs; if one of these dirs is not empty, it creates a snapshot. So in
theory I could have 2000 x 7 days = 14000 snapshots. (A rough counting
sketch is at the end of this mail.) By the way, the cephfs snapshots
are in a different tree than the one rsync is using.

> Do you keep track of memory consumption on the node?

A bit; attached is a nagios graph. I have 100 GB in this node. Since
then, I have disabled all the hugepages (2 MB and 1 GB) I created
there, to free up more memory.

> Finally, you say "crash" in the subject. Does the kernel actually
> crash or perhaps it locks up? If it actually crashes, do you have the
> panic message?

The whole server was gone. The logs are from the remote syslog server.

New situation: with more memory, and the kernel updated to
3.10.0-1062.4.3.el7.x86_64, rsync is very slow and I have a kworker at
100% load.
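To put a number on the snapshot question, here is a minimal counting
sketch in Python. It assumes the snapshots live under each directory's
virtual .snap entry, which is how cephfs exposes them; the base path
/mnt/cephfs/backups is a placeholder, not my actual tree:

#!/usr/bin/env python
import os

# Placeholder for the real tree holding the ~2000 dirs the snapshot
# script runs on; adjust to the actual cephfs mount point.
BASE = "/mnt/cephfs/backups"

total = 0
for name in os.listdir(BASE):
    d = os.path.join(BASE, name)
    if not os.path.isdir(d):
        continue
    try:
        # cephfs exposes a directory's snapshots as entries in its
        # virtual .snap directory (one entry per snapshot).
        snaps = os.listdir(os.path.join(d, ".snap"))
    except OSError:
        continue  # .snap missing or unreadable, skip this dir
    total += len(snaps)

print("total snapshots: %d" % total)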
Attachment:
c04-memory.png
Description: Binary data