I’ve recently run into an issue with the RBD kernel client in emperor. I map and format an image, then repeatedly mount it, write data to it, unmount it, and snapshot it. In all but one run so far, the driver appears to deadlock after the eighth snapshot, and the device becomes completely unresponsive until I reboot. In the one exception, it deadlocked after the ninth snapshot. I have reproduced this with and without partitions on the device, using NTFS, ext4, and xfs as the filesystem, and using a variety of applications to write files to the device.
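The sequence described above can be sketched roughly as follows. This is an illustration, not the exact script used: the pool name, image name, mount point, file size, and the `/dev/rbd/<pool>/<image>` device path are all assumptions, ext4 stands in for whichever filesystem is under test, and `dd` stands in for whichever application writes the data. The sketch only prints the commands unless `dry_run` is turned off.

```python
import subprocess


def repro_commands(pool, image, mountpoint, cycles=10):
    """Build the command list: map and format once, then repeat
    mount / write / unmount / snapshot for the given number of cycles."""
    spec = f"{pool}/{image}"
    # Assumption: udev creates this symlink when the image is mapped.
    device = f"/dev/rbd/{spec}"

    cmds = [
        ["rbd", "map", spec],
        ["mkfs.ext4", device],
    ]
    for i in range(1, cycles + 1):
        cmds += [
            ["mount", device, mountpoint],
            # Placeholder write load: 64 MB of random data per cycle.
            ["dd", "if=/dev/urandom", f"of={mountpoint}/data-{i}",
             "bs=1M", "count=64"],
            ["umount", mountpoint],
            ["rbd", "snap", "create", f"{spec}@snap-{i}"],
        ]
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; the mount point is assumed to exist already.
    run(repro_commands("rbd", "snaptest", "/mnt/snaptest"), dry_run=True)
```

With `dry_run=False` the commands need root and an existing, unmapped image. If the behaviour matches the report, the loop would be expected to hang at or shortly after the eighth `rbd snap create`.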
I had been running similar tests previously on a dumpling system without issues, so I’m wondering if anyone has seen anything like this with emperor. There are other variables, so I’m not 100% sure it’s an emperor issue, but that appears to be the case from what I have seen.
I see there is an open issue, #1769, in which the kernel client can deadlock, but in my case the kernel client is running on a server machine with 8GB of memory, and memory utilization is nowhere near capacity, so I don’t think it’s the same issue.
Performing the same set of operations via librbd (without the kernel client) does not produce the deadlock in my testing so far.
Any ideas?
Steve
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com