Re: I/O stalls when doing fstrim on large RBD

Jason Dillaman <jdillama@xxxxxxxxxx> · Sat, 18 Nov 2017 08:08:37 -0500

Can you capture a blktrace while performing fstrim to record the discard
operations? A 1TB trim extent would have a huge impact, since it would
translate to approximately 262K IO requests to the OSDs (assuming 4MB
backing objects).
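
The arithmetic behind that estimate can be sketched as follows. This is a
minimal Python sketch that assumes the default 4 MiB RBD object size and an
extent aligned to object boundaries. The 262K figure corresponds to 1 TiB
(2^40 bytes) divided by 4 MiB. Real request counts also depend on alignment
and striping settings.

```python
# Rough estimate of how many per-object requests a single discard extent
# fans out into. Assumption: 4 MiB RBD objects, object-aligned extent,
# default (non-fancy) striping.

OBJECT_SIZE = 4 * 1024**2  # 4 MiB, assumed RBD object size


def object_requests(extent_bytes: int, object_size: int = OBJECT_SIZE) -> int:
    """Number of backing objects touched by an object-aligned extent."""
    # Ceiling division: a partial trailing object still needs a request.
    return -(-extent_bytes // object_size)


if __name__ == "__main__":
    one_tib = 1024**4
    print(object_requests(one_tib))  # prints 262144
```

By the same estimate, trimming a 16TB volume in one pass would fan out into
roughly 4.2M object requests.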

On Fri, Nov 17, 2017 at 6:19 PM, Brendan Moloney <moloney@xxxxxxxx> wrote:
> Hi,
>
> I guess this isn't strictly about Ceph, but I feel like other folks here
> must have run into the same issues.
>
> I am trying to keep my thinly provisioned RBD volumes thin.  I use
> virtio-scsi to attach the RBD volumes to my VMs with the "discard=unmap"
> option. The RBD is formatted as XFS and some of them can be quite large
> (16TB+).  I have a cron job that runs "fstrim" commands twice a week in the
> evenings.
>
> The issue is that I see massive I/O stalls on the VM during the fstrim.  To
> the point where I am getting kernel panics from hung tasks and other
> timeouts.  I have tried a number of things to lessen the impact:
>
>     - Switching from deadline to CFQ (initially I thought this helped, but
> now I am not convinced)
>     - Running fstrim with "ionice -c idle" (this doesn't seem to make a
> difference)
>     - Chunking the fstrim with the offset/length options (helps reduce worst
> case, but I can't trim less than 1TB at a time and that can still cause a
> pause for several minutes)
>
> Is there anything else I can do to avoid this issue?
>
> Thanks,
> Brendan
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
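
The offset/length chunking described in the quoted message can be sketched
as follows. This minimal Python sketch splits a filesystem into fixed-size
ranges and builds one `fstrim --offset ... --length ...` invocation per
range. The mount point `/data`, the 16 TiB size, the helper names, and the
`pause_s` cool-down between chunks are illustrative assumptions and not part
of the original setup. By default it does a dry run that only prints the
commands.

```python
import shlex
import subprocess
import time
from typing import Iterator, List, Tuple

TIB = 1024**4


def trim_ranges(fs_bytes: int, chunk_bytes: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs covering [0, fs_bytes) in chunk_bytes steps."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    offset = 0
    while offset < fs_bytes:
        length = min(chunk_bytes, fs_bytes - offset)
        yield offset, length
        offset += length


def fstrim_commands(mountpoint: str, fs_bytes: int, chunk_bytes: int) -> List[List[str]]:
    """Build one fstrim argv per chunk using its --offset/--length options."""
    return [
        ["fstrim", "--verbose", "--offset", str(off), "--length", str(length), mountpoint]
        for off, length in trim_ranges(fs_bytes, chunk_bytes)
    ]


def run_chunked(mountpoint: str, fs_bytes: int, chunk_bytes: int,
                pause_s: float = 0.0, dry_run: bool = True) -> None:
    """Print each chunk's command; unless dry_run, also run it.

    pause_s is a hypothetical cool-down between chunks, not an fstrim option.
    """
    for argv in fstrim_commands(mountpoint, fs_bytes, chunk_bytes):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
            time.sleep(pause_s)


if __name__ == "__main__":
    # Hypothetical 16 TiB filesystem mounted at /data, trimmed in 1 TiB chunks.
    run_chunked("/data", 16 * TIB, 1 * TIB)
```

A smaller `chunk_bytes` bounds how much discard work each invocation queues
at once, at the cost of running more invocations.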

-- 
Jason