Hello Mike,

see my inline comments.

On 14.08.19 at 02:09, Mike Christie wrote:
>> Previous tests crashed in a reproducible manner with "-P 1" (single io
>> gzip/gunzip) after a few minutes up to 45 minutes.
>>
>> Overview of my tests:
>>
>> - SUCCESSFUL: kernel 4.15, ceph 12.2.5, 1TB ec-volume, ext4 file system, 120s device timeout
>>   -> 18 hour testrun was successful, no dmesg output
>> - FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s device timeout
>>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created without reboot
>>   -> parallel krbd device usage with 99% io usage worked without a problem while running the test
>> - FAILED: kernel 4.15, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s device timeout
>>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
>>   -> parallel krbd device usage with 99% io usage worked without a problem while running the test
>> - FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, no timeout
>>   -> failed after < 10 minutes
>>   -> system runs under very high load, system is almost unusable, unable to shut down the system, hard reset of the VM necessary, manual exclusive lock removal is necessary before remapping the device
>>
>> There is something new compared to yesterday: three days ago I downgraded a
>> production system to client version 12.2.5.
>>
>> - FAILED: kernel 4.15, ceph 12.2.5, 2TB ec-volume, ext4 file system, 120s device timeout
>>   -> crashed in production while snapshot trimming is running on that pool
>> - FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s device timeout
>>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created without reboot
>>   -> parallel krbd device usage with 99% io usage worked without a problem while running the test
>> - FAILED: kernel 4.15, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s device timeout
>>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
>>   -> parallel krbd device usage with 99% io usage worked without a problem while running the test
>> - FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, no timeout
>>   -> failed after < 10 minutes
>>   -> system runs under very high load, system is almost unusable, unable to shut down the system, hard reset of the VM necessary, manual exclusive lock removal is necessary before remapping the device
>> - FAILED: kernel 4.4, ceph 12.2.11, 2TB 3-replica-volume, xfs file system, 120s device timeout
>>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
>> - FAILED: kernel 5.0, ceph 12.2.12, 2TB ec-volume, ext4 file system, 120s device timeout
>>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
>> - FAILED: kernel 4.4, ceph 12.2.11, 2TB 3-replica-volume, xfs file system, 120s device timeout
>>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
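As a reference for the "120s device timeout" used in the overview above, here is
a minimal sketch of how such a test mapping is typically created and mounted.
The pool/image names and the mount point are placeholders, and it assumes the
rbd-nbd build in use supports the --timeout option (the "no timeout" runs
simply omit it):

    rbd-nbd map --timeout 120 rbd_pool/test_image   # prints the device on success, e.g. /dev/nbd0
    mkfs.xfs /dev/nbd0                               # or mkfs.ext4, matching the test matrix above
    mkdir -p /mnt/rbd-nbd-test
    mount /dev/nbd0 /mnt/rbd-nbd-test                # the gzip/gunzip workload then runs here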
> How many CPUs and how much memory does the VM have?

Characteristics of the crashed VM: see attached file.

> I'm not sure which test it covers above, but for
> test-with-timeout/ceph-client.archiv.log and dmesg-crash it looks like the
> command that probably triggered the timeout got stuck in safe_write or
> write_fd, because we see:
>
> // Command completed, and right after this log message we try to write the
> // reply and data to the nbd.ko module.
> 2019-07-29 21:55:21.148118 7fffbf7fe700 20 rbd-nbd: writer_entry: got: [4500000000000000 READ 24043755000~20000 0]
>
> // We got stuck, 2 minutes go by and the timeout fires. That kills the
> // socket, so we get an error here, and after that rbd-nbd is going to exit.
> 2019-07-29 21:57:21.785111 7fffbf7fe700 -1 rbd-nbd: [4500000000000000 READ 24043755000~20000 0]: failed to write replay data: (32) Broken pipe
>
> We could hit this in a couple of ways:
>
> 1. The block layer sends a command that is larger than the socket's send
> buffer limits. These are the values you sometimes set in sysctl.conf like:
>
>     net.core.rmem_max
>     net.core.wmem_max
>     net.core.rmem_default
>     net.core.wmem_default
>     net.core.optmem_max

Memory was definitely not low; we only had 10% memory usage at the time of the crash.

> There does not seem to be any checks/code to make sure there is some
> alignment with these limits. I will send a patch, but that will not help you
> right now. The max io size for nbd is 128k, so make sure your net values are
> large enough. Increase the values in sysctl.conf and retry if they were too
> small.
>
> Not sure what I was thinking. Just checked the logs and we have done IO of
> the same size that got stuck and it was fine, so the socket sizes should be
> ok. We still need to add code to make sure IO sizes and the af_unix socket
> size limits match up.
>
> 2. If memory is low on the system, we could be stuck trying to allocate
> memory in the kernel in that code path too. rbd-nbd just uses more memory
> per device, so it could be why we do not see a problem with krbd.
>
> 3. I wonder if we are hitting a bug with PF_MEMALLOC that Ilya hit with
> krbd. He removed that code from krbd. I will ping him on that.
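Regarding the socket buffer limits in point 1, purely for reference: a rough
sketch of how one might inspect the current limits and the largest IO the block
layer will send to the nbd device, and raise the limits via a sysctl drop-in.
The file name and numeric values are illustrative only, and per the note above
the existing sizes were probably already sufficient here:

    # inspect current socket buffer limits and the max IO size for the mapped device
    sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default net.core.wmem_default net.core.optmem_max
    cat /sys/block/nbd0/queue/max_sectors_kb

    # illustrative values only - raise the limits and reload
    cat > /etc/sysctl.d/90-nbd-sockets.conf <<'EOF'
    net.core.rmem_max = 8388608
    net.core.wmem_max = 8388608
    net.core.rmem_default = 1048576
    net.core.wmem_default = 1048576
    net.core.optmem_max = 131072
    EOF
    sysctl --system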
Interesting. I have activated core dumps for those processes - probably we can
find something interesting there...
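For completeness, one possible way to enable core dumps for a long-running
rbd-nbd process; the core_pattern path is just an example:

    mkdir -p /var/crash
    sysctl -w kernel.core_pattern='/var/crash/core.%e.%p.%t'
    ulimit -c unlimited        # in the shell (or service unit) that starts rbd-nbd
    # then (re)map the device from that environment so the limit applies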
Regards

Attachment:
sysctl_settings.txt.gz
Description: application/gzip