Re: I/O stalls when doing fstrim on large RBD

"Brendan Moloney" <moloney@xxxxxxxx> · Mon, 27 Nov 2017 22:56:03 +0000

Hi,

Does anyone have input on this?  I am surprised that more people are not running into this issue.  I guess most people don't have multi-TB RBD images?  I think ext4 might also fare better, since it keeps track of blocks that were discarded in the past and have not been modified since, so that they don't get discarded again.

For the benefit of anyone else on the list who is following along, I went ahead and made the blktrace output available: https://filebin.ca/3imcZ5IHImxW/fstrim_blktrace.tar.gz

Here is the script I wrote to chunk up the fstrim: https://gist.github.com/moloney/5763a02e3847a5368af56110cc583544
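For anyone who does not want to open the gist, here is a rough sketch of the general idea of chunking a trim. It is not the linked script, just an illustration: it splits the filesystem into byte ranges and runs one fstrim per range using the standard --offset and --length options. The function names, the pause between chunks, and the dry-run default are all assumptions made for the example.

```python
import subprocess
import time
from typing import Iterator, List, Tuple


def chunk_ranges(total_bytes: int, chunk_bytes: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover [0, total_bytes) in order."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    offset = 0
    while offset < total_bytes:
        # The last chunk may be shorter than chunk_bytes.
        length = min(chunk_bytes, total_bytes - offset)
        yield offset, length
        offset += length


def fstrim_command(mountpoint: str, offset: int, length: int) -> List[str]:
    """Build an fstrim invocation that trims a single byte range."""
    return ["fstrim", "--offset", str(offset), "--length", str(length), mountpoint]


def trim_in_chunks(mountpoint: str, total_bytes: int, chunk_bytes: int,
                   pause_s: float = 5.0, dry_run: bool = True) -> None:
    """Trim one range at a time, pausing between ranges (hypothetical helper).

    With dry_run=True the commands are only printed, not executed.
    """
    for offset, length in chunk_ranges(total_bytes, chunk_bytes):
        cmd = fstrim_command(mountpoint, offset, length)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
            # Give the backend time to absorb the discards before the next range.
            time.sleep(pause_s)
```

The total size could be taken from os.statvfs(mountpoint) as f_frsize * f_blocks. The chunk size and pause would need tuning per cluster; smaller chunks mean more, shorter bursts of discards.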

Using this script and the "cfq" scheduler I do seem to have things running a bit more smoothly.  I also raised the disk timeout values from 30 seconds to 180 seconds.  I am still not convinced that the issue is resolved, though; I will need to wait and see what happens when a VM under heavy load does an fstrim.
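As a sketch of how those two settings can be applied inside a guest, assuming the disk shows up as a SCSI device (so that /sys/block/<dev>/device/timeout exists; virtio-blk disks do not have it) and the kernel still offers cfq: the device name and helper names below are hypothetical, and the writes need root and do not persist across reboots.

```python
from pathlib import Path


def active_scheduler(text: str) -> str:
    """Return the bracketed entry, e.g. 'noop deadline [cfq]' -> 'cfq'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked in: %r" % text)


def apply_settings(device: str, scheduler: str = "cfq", timeout_s: int = 180,
                   sysfs_root: str = "/sys") -> None:
    """Write the scheduler and command timeout for one block device.

    sysfs_root is a parameter only so the function can be tried against a
    scratch directory instead of the real /sys.
    """
    block = Path(sysfs_root) / "block" / device
    # Equivalent to: echo cfq > /sys/block/<dev>/queue/scheduler
    (block / "queue" / "scheduler").write_text(scheduler + "\n")
    # Equivalent to: echo 180 > /sys/block/<dev>/device/timeout
    (block / "device" / "timeout").write_text(str(timeout_s) + "\n")
```

For example, apply_settings("sda") would set both values for /dev/sda, and active_scheduler() can be used on the contents of the scheduler file afterwards to confirm which one is in effect.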

Thanks,
Brendan
