Hallo Jason,

thanks again for your time, and apologies for the long silence, but I was busy upgrading to Luminous and converting FileStore->BlueStore.
In the meantime, the staging cluster where I was running my tests was upgraded both to Ceph Luminous and to OpenStack Pike: the good news is that fstrim now works as expected, so I think it is not worth it (and probably difficult, if not impossible) to investigate further. I may post some more information once I have a maintenance window to upgrade the production cluster (I have to touch nova.conf, and I want to do that during a maintenance window).
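For reference, the nova.conf change I am referring to is the discard setting for the libvirt driver; a minimal sketch of what I plan to apply on the compute nodes (assuming the libvirt driver, and not yet verified on my side) is:

# /etc/nova/nova.conf on the compute node -- sketch, not a verified configuration
[libvirt]
# pass guest discard/TRIM requests through to the RBD-backed disks
hw_disk_discard = unmap

(nova-compute needs a restart afterwards, and, if I am not mistaken, the guests must also use a disk bus that supports discard, e.g. virtio-scsi selected via the hw_scsi_model/hw_disk_bus image properties.)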
By the way, I am unable to configure Ceph so that the admin socket is made available on the (pure) client node; I am going to open a separate issue for this.
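For context, what I have been trying on the client/hypervisor node follows the usual snippet from the Ceph/OpenStack documentation Jason points me at below (the paths are only examples, and the socket and log directories must of course be writable by the QEMU/libvirt user and allowed by AppArmor/SELinux):

# /etc/ceph/ceph.conf on the hypervisor (client) node -- sketch
[client]
    # one admin socket and log file per librbd client instance
    admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
    log file = /var/log/qemu/qemu-guest-$pid.log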
  Thanks!

    Fulvio

-------- Original Message --------
Subject: Re: Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman <jdillama@xxxxxxxxxx>
To: Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx>
CC: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
Date: 03/15/2018 01:35 PM
OK, last suggestion just to narrow the issue down: ensure you have a functional admin socket and librbd log file as documented here [1]. With the VM running, before you execute "fstrim", run "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 20" on the hypervisor host, execute "fstrim" within the VM, and then restore the log settings via "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 0/5". Grep the log file for "aio_discard" to verify whether QEMU is passing the discard down to librbd.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/

On Thu, Mar 15, 2018 at 6:53 AM, Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo Jason, I am really thankful for your time!

Changed the volume features:

rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
.....
    features: layering, exclusive-lock, deep-flatten

I had to create several dummy files before seeing an increase with "rbd du": to me, this is a sort of indication that dirty blocks are, at least, reused, if not properly released. Then I did "rm * ; sync ; fstrim / ; sync", but the size did not go down.

Is there a way to instruct Ceph to perform what is not currently happening automatically (namely, scan the object map of a volume and force cleanup of released blocks)? Or is the problem exactly that such blocks are not seen by Ceph as reusable?

By the way, I think I forgot to mention that the underlying OSD disks are taken from a FibreChannel storage array (a DELL MD3860, which is not capable of presenting JBOD, so I present single disks as RAID0) and are XFS formatted.

  Thanks!

    Fulvio

-------- Original Message --------
Subject: Re: Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman <jdillama@xxxxxxxxxx>
To: Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx>
CC: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
Date: 03/14/2018 02:10 PM

Hmm -- perhaps as an experiment, can you disable the object-map and fast-diff features to see if they are incorrectly reporting the object as in-use after a discard?

$ rbd --cluster cephpa1 -p cinder-ceph feature disable volume-80838a69-e544-47eb-b981-a4786be89736 object-map,fast-diff

On Wed, Mar 14, 2018 at 3:29 AM, Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo Jason, sure, here it is!

rbd --cluster cephpa1 -p cinder-ceph info volume-80838a69-e544-47eb-b981-a4786be89736
rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
    size 15360 MB in 3840 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.9e7ffe238e1f29
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    flags:

  Thanks

    Fulvio

-------- Original Message --------
Subject: Re: Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman <jdillama@xxxxxxxxxx>
To: Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx>
CC: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
Date: 03/13/2018 06:33 PM

Can you provide the output from "rbd info <pool name>/volume-80838a69-e544-47eb-b981-a4786be89736"?

On Tue, Mar 13, 2018 at 12:30 PM, Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo!

> Discards appear like they are being sent to the device. How big of a temporary file did you create and then delete? Did you sync the file to disk before deleting it? What version of qemu-kvm are you running?

I made several tests with commands like the following (issuing sync after each operation):

dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct

What I see is that if I repeat the command with count<=200 the size does not increase.
Let's try now with count>200:

NAME                                          PROVISIONED   USED
volume-80838a69-e544-47eb-b981-a4786be89736       15360M   2284M

dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
sync

NAME                                          PROVISIONED   USED
volume-80838a69-e544-47eb-b981-a4786be89736       15360M   2528M

rm /tmp/fileTest*
sync
sudo fstrim -v /
/: 14.1 GiB (15145271296 bytes) trimmed

NAME                                          PROVISIONED   USED
volume-80838a69-e544-47eb-b981-a4786be89736       15360M   2528M

As for qemu-kvm, the guest OS is CentOS7, with:

[centos@testcentos-deco3 tmp]$ rpm -qa | grep qemu
qemu-guest-agent-2.8.0-2.el7.x86_64

while the host is Ubuntu 16 with:

root@pa1-r2-s10:/home/ubuntu# dpkg -l | grep qemu
ii  qemu-block-extra:amd64  1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  extra block backend modules for qemu-system and qemu-utils
ii  qemu-kvm                1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU Full virtualization
ii  qemu-system-common      1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (common files)
ii  qemu-system-x86         1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (x86)
ii  qemu-utils              1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU utilities

  Thanks!

    Fulvio
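For reference, the librbd debug sequence Jason describes at the top of this thread amounts to roughly the following on the hypervisor host (the .asok and log file paths depend on the "admin socket" and "log file" settings in ceph.conf, so treat them as placeholders):

# on the hypervisor host, with the VM running
ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 20
# now run "fstrim /" inside the VM, then restore the default log level:
ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 0/5
# check whether QEMU passed the discards down to librbd:
grep aio_discard /path/to/the/librbd/log/file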