Hallo Jason,

thanks again for your time, and apologies for the long silence, but I was busy upgrading to Luminous and converting FileStore->BlueStore.
In the meantime, the staging cluster where I was running my tests was upgraded both to Ceph Luminous and to OpenStack Pike: the good news is that fstrim now works as expected, so I think it is not worth it (and probably difficult, if not impossible) to investigate further. I may post some more information once I have a maintenance window to upgrade the production cluster (I have to touch nova.conf, and I want to do that during a maintenance window).
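For reference, the nova.conf change I am referring to is the discard setting for the libvirt driver; a minimal sketch of what I plan to apply on the compute nodes (assuming the libvirt driver, and not yet verified on my side) is:

# /etc/nova/nova.conf on the compute node -- sketch, not a verified configuration
[libvirt]
# pass guest discard/TRIM requests through to the RBD-backed disks
hw_disk_discard = unmap

(nova-compute needs a restart afterwards, and, if I am not mistaken, the guests must also use a disk bus that supports discard, e.g. virtio-scsi selected via the hw_scsi_model/hw_disk_bus image properties.)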
By the way, I am unable to configure Ceph so that the admin socket is made available on the (pure) client node; I am going to open a separate issue for this.
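For context, what I have been trying on the client/hypervisor node follows the usual snippet from the Ceph/OpenStack documentation Jason points me at below (the paths are only examples, and the socket and log directories must of course be writable by the QEMU/libvirt user and allowed by AppArmor/SELinux):

# /etc/ceph/ceph.conf on the hypervisor (client) node -- sketch
[client]
    # one admin socket and log file per librbd client instance
    admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
    log file = /var/log/qemu/qemu-guest-$pid.log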
  Thanks!

    Fulvio

-------- Original Message --------
Subject: Re: Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman <jdillama@xxxxxxxxxx>
To: Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx>
CC: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
Date: 03/15/2018 01:35 PM
OK, last suggestion just to narrow the issue down: ensure you have a functional admin socket and librbd log file as documented here [1]. With the VM running, before you execute "fstrim", run "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 20" on the hypervisor host, execute "fstrim" within the VM, and then restore the log settings via "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 0/5". Grep the log file for "aio_discard" to verify whether QEMU is passing the discard down to librbd.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/

On Thu, Mar 15, 2018 at 6:53 AM, Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo Jason, I am really thankful for your time!

Changed the volume features:

rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
.....
    features: layering, exclusive-lock, deep-flatten

I had to create several dummy files before seeing an increase with "rbd du": to me, this is a sort of indication that dirty blocks are, at least, reused, if not properly released. Then I did "rm * ; sync ; fstrim / ; sync", but the size did not go down.

Is there a way to instruct Ceph to perform what is not currently happening automatically (namely, scan the object map of a volume and force cleanup of released blocks)? Or is the problem exactly that such blocks are not seen by Ceph as reusable?

By the way, I think I forgot to mention that the underlying OSD disks are taken from a FibreChannel storage array (a DELL MD3860, which is not capable of presenting JBOD, so I present single disks as RAID0) and are XFS formatted.

  Thanks!

    Fulvio

-------- Original Message --------
Subject: Re: Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman <jdillama@xxxxxxxxxx>
To: Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx>
CC: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
Date: 03/14/2018 02:10 PM

Hmm -- perhaps as an experiment, can you disable the object-map and fast-diff features to see if they are incorrectly reporting the object as in-use after a discard?

$ rbd --cluster cephpa1 -p cinder-ceph feature disable volume-80838a69-e544-47eb-b981-a4786be89736 object-map,fast-diff

On Wed, Mar 14, 2018 at 3:29 AM, Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo Jason, sure, here it is!

rbd --cluster cephpa1 -p cinder-ceph info volume-80838a69-e544-47eb-b981-a4786be89736
rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736':
    size 15360 MB in 3840 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.9e7ffe238e1f29
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    flags:

  Thanks

    Fulvio

-------- Original Message --------
Subject: Re: Issue with fstrim and Nova hw_disk_discard=unmap
From: Jason Dillaman <jdillama@xxxxxxxxxx>
To: Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx>
CC: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
Date: 03/13/2018 06:33 PM

Can you provide the output from "rbd info <pool name>/volume-80838a69-e544-47eb-b981-a4786be89736"?

On Tue, Mar 13, 2018 at 12:30 PM, Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo!

> Discards appear like they are being sent to the device. How big of a temporary file did you create and then delete? Did you sync the file to disk before deleting it? What version of qemu-kvm are you running?

I made several tests with commands like the following (issuing sync after each operation):

dd if=/dev/zero of=/tmp/fileTest bs=1M count=200 oflag=direct

What I see is that if I repeat the command with count<=200 the size does not increase.
Let's try now with count>200:

NAME                                          PROVISIONED   USED
volume-80838a69-e544-47eb-b981-a4786be89736       15360M   2284M

dd if=/dev/zero of=/tmp/fileTest bs=1M count=750 oflag=direct
dd if=/dev/zero of=/tmp/fileTest2 bs=1M count=750 oflag=direct
sync

NAME                                          PROVISIONED   USED
volume-80838a69-e544-47eb-b981-a4786be89736       15360M   2528M

rm /tmp/fileTest*
sync
sudo fstrim -v /
/: 14.1 GiB (15145271296 bytes) trimmed

NAME                                          PROVISIONED   USED
volume-80838a69-e544-47eb-b981-a4786be89736       15360M   2528M

As for qemu-kvm, the guest OS is CentOS7, with:

[centos@testcentos-deco3 tmp]$ rpm -qa | grep qemu
qemu-guest-agent-2.8.0-2.el7.x86_64

while the host is Ubuntu 16 with:

root@pa1-r2-s10:/home/ubuntu# dpkg -l | grep qemu
ii  qemu-block-extra:amd64  1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  extra block backend modules for qemu-system and qemu-utils
ii  qemu-kvm                1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU Full virtualization
ii  qemu-system-common      1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (common files)
ii  qemu-system-x86         1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU full system emulation binaries (x86)
ii  qemu-utils              1:2.8+dfsg-3ubuntu2.9~cloud1  amd64  QEMU utilities

  Thanks!

    Fulvio
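For reference, the librbd debug sequence Jason describes at the top of this thread amounts to roughly the following on the hypervisor host (the .asok and log file paths depend on the "admin socket" and "log file" settings in ceph.conf, so treat them as placeholders):

# on the hypervisor host, with the VM running
ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 20
# now run "fstrim /" inside the VM, then restore the default log level:
ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 0/5
# check whether QEMU passed the discards down to librbd:
grep aio_discard /path/to/the/librbd/log/file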