Re: [ovirt-users] Replicated Glusterfs on top of ZFS

Ramesh Nachimuthu <rnachimu@xxxxxxxxxx> · Sun, 5 Mar 2017 23:26:26 -0500 (EST)



+gluster-users 


Regards,
Ramesh

----- Original Message -----
> From: "Arman Khalatyan" <arm2arm@xxxxxxxxx>
> To: "Juan Pablo" <pablo.localhost@xxxxxxxxx>
> Cc: "users" <users@xxxxxxxxx>, "FERNANDO FREDIANI" <fernando.frediani@xxxxxxx>
> Sent: Friday, March 3, 2017 8:32:31 PM
> Subject: Re: [ovirt-users] Replicated Glusterfs on top of ZFS
> 
> The problem is not the streaming data performance, and a dd from /dev/zero
> does not tell much on a production ZFS running with compression.
> The main problem appears when Gluster starts working on the data: it uses
> xattrs, and accessing extended attributes on ZFS is probably slower than on
> XFS.
> Even a simple find or ls -l in the .glusterfs folders takes ages.
> 
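To put a number on how slow a find or ls -l style scan is on a brick, a rough sketch like the one below could time a metadata-only walk (lstat plus listing xattrs, no file data read). It assumes Linux, where Python provides os.listxattr, and it should run as root so that the trusted.* attributes Gluster uses are visible. The function name and the directory passed on the command line are hypothetical; nothing here comes from the original posts.

```python
import os
import sys
import time


def time_metadata_walk(root):
    """Walk root, lstat every file and list its xattrs; return simple counters."""
    list_xattrs = getattr(os, "listxattr", None)  # only available on Linux
    files = 0
    xattrs = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable, skip it
            files += 1
            if list_xattrs is None:
                continue
            try:
                xattrs += len(list_xattrs(path, follow_symlinks=False))
            except OSError:
                pass  # filesystem without xattr support, or permission denied
    elapsed = time.perf_counter() - start
    return {"files": files, "xattrs": xattrs, "seconds": elapsed}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python walk.py /path/to/brick/.glusterfs  (path is hypothetical)
    print(time_metadata_walk(sys.argv[1]))
```

Running the same walk on an XFS brick and on a ZFS brick with a cold cache would give a like-for-like comparison of the metadata cost, which is the part the streaming tests do not cover.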
> Now I can see that the arbiter host has almost 100% cache misses during the
> rebuild, which is actually natural since it is always reading new
> datasets:
> [root@clei26 ~]# arcstat.py 1
> time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
> 15:57:31 29 29 100 29 100 0 0 29 100 685M 31G
> 15:57:32 530 476 89 476 89 0 0 457 89 685M 31G
> 15:57:33 480 467 97 467 97 0 0 463 97 685M 31G
> 15:57:34 452 443 98 443 98 0 0 435 97 685M 31G
> 15:57:35 582 547 93 547 93 0 0 536 94 685M 31G
> 15:57:36 439 417 94 417 94 0 0 393 94 685M 31G
> 15:57:38 435 392 90 392 90 0 0 374 89 685M 31G
> 15:57:39 364 352 96 352 96 0 0 352 96 685M 31G
> 15:57:40 408 375 91 375 91 0 0 360 91 685M 31G
> 15:57:41 552 539 97 539 97 0 0 539 97 685M 31G
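For comparing runs like the one above, the per-second rows can be folded into one overall miss rate. The following is a minimal sketch, not part of arcstat itself: the function names are made up for illustration, and it assumes that the K/M/G suffixes arcstat prints on counters (for example 1.2K) are decimal multipliers.

```python
# Assumed decimal multipliers for counter suffixes such as "1.2K".
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def parse_count(text):
    """Convert an arcstat counter such as '530' or '1.2K' to an integer."""
    if text and text[-1] in MULTIPLIERS:
        return int(round(float(text[:-1]) * MULTIPLIERS[text[-1]]))
    return int(text)


def summarize_arcstat(lines):
    """Return (total_reads, total_misses, miss_percent) over arcstat data rows."""
    total_reads = 0
    total_misses = 0
    for line in lines:
        # Tolerate rows pasted from a quoted mail ("> " prefix).
        fields = line.lstrip("> ").split()
        if len(fields) < 3 or fields[0] == "time":
            continue  # blank line or header row
        try:
            reads = parse_count(fields[1])
            misses = parse_count(fields[2])
        except ValueError:
            continue  # shell prompt or other non-data line
        total_reads += reads
        total_misses += misses
    percent = 100.0 * total_misses / total_reads if total_reads else 0.0
    return total_reads, total_misses, percent


if __name__ == "__main__":
    import sys
    reads, misses, percent = summarize_arcstat(sys.stdin)
    print(f"reads={reads} misses={misses} miss%={percent:.1f}")
```

Fed with the arcstat rows from the receiver, the sender and the arbiter, this gives three numbers that can be compared directly instead of eyeballing the columns.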
> 
> It looks like we cannot have performance and reliability in the same system
> :(
> The final conclusion is simply that with a single disk + SSD, even ZFS does
> not help to speed up the GlusterFS healing.
> I will stop here :)
> 
> 
> 
> 
> On Fri, Mar 3, 2017 at 3:35 PM, Juan Pablo < pablo.localhost@xxxxxxxxx >
> wrote:
> 
> 
> 
> cd into the pool path,
> then run dd if=/dev/zero of=test.tt bs=1M
> leave it running for 5 to 10 minutes,
> then press Ctrl+C and paste the result here.
> etc.
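As noted elsewhere in this thread, zeros say little on a dataset with compression enabled, because they compress to almost nothing. A sketch of the same streaming test with incompressible data might look like the following. The function name, file name and sizes are illustrative assumptions. One random block is reused for every write, which should be fine with compression but would not be with dedup enabled.

```python
import os
import sys
import time


def write_benchmark(path, total_mib=1024, block_mib=1):
    """Write total_mib MiB of random data to path and return MiB per second."""
    block = os.urandom(block_mib * 1024 * 1024)  # incompressible payload
    start = time.perf_counter()
    with open(path, "wb") as out:
        for _ in range(total_mib // block_mib):
            out.write(block)
        out.flush()
        os.fsync(out.fileno())  # include the flush to disk in the timing
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_mib / elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python wbench.py /zclei22/01/test.bin  (file name is hypothetical)
    print(f"{write_benchmark(sys.argv[1]):.1f} MiB/s")
```

Note that on a dataset with sync=disabled the fsync call does not wait for stable storage, so the figure then mostly reflects how fast ZFS accepts data into memory and its write pipeline.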
> 
> 2017-03-03 11:30 GMT-03:00 Arman Khalatyan < arm2arm@xxxxxxxxx > :
> 
> 
> 
> No, I have one pool made of one disk, with the SSD as cache and log device.
> I have 3 GlusterFS bricks on 3 separate hosts: volume type Replicate
> (Arbiter) = replica 2+1!
> That is as much as you can fit into the compute nodes (they have only 3 disk
> slots).
> 
> 
> On Fri, Mar 3, 2017 at 3:19 PM, Juan Pablo < pablo.localhost@xxxxxxxxx >
> wrote:
> 
> 
> 
> OK, you have 3 pools: zclei22, logs and cache. That's wrong. You should have
> 1 pool, with zlog+cache, if you are looking for performance.
> Also, don't mix drives.
> What's the performance issue you are facing?
> 
> 
> regards,
> 
> 2017-03-03 11:00 GMT-03:00 Arman Khalatyan < arm2arm@xxxxxxxxx > :
> 
> 
> 
> This is CentOS 7.3 ZoL version 0.6.5.9-1
> 
> 
> 
> 
> 
> [root@clei22 ~]# lsscsi
> [2:0:0:0]    disk    ATA      INTEL SSDSC2CW24 400i  /dev/sda
> [3:0:0:0]    disk    ATA      HGST HUS724040AL AA70  /dev/sdb
> [4:0:0:0]    disk    ATA      WDC WD2002FYPS-0 1G01  /dev/sdc
> 
> [root@clei22 ~]# pvs ;vgs;lvs
>   PV                                                 VG            Fmt  Attr PSize   PFree
>   /dev/mapper/INTEL_SSDSC2CW240A3_CVCV306302RP240CGN vg_cache      lvm2 a--  223.57g      0
>   /dev/sdc2                                          centos_clei22 lvm2 a--    1.82t 64.00m
>   VG            #PV #LV #SN Attr   VSize   VFree
>   centos_clei22   1   3   0 wz--n-   1.82t 64.00m
>   vg_cache        1   2   0 wz--n- 223.57g      0
>   LV       VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   home     centos_clei22 -wi-ao----   1.74t
>   root     centos_clei22 -wi-ao----  50.00g
>   swap     centos_clei22 -wi-ao----  31.44g
>   lv_cache vg_cache      -wi-ao---- 213.57g
>   lv_slog  vg_cache      -wi-ao----  10.00g
> 
> 
> [root@clei22 ~]# zpool status -v
>   pool: zclei22
>  state: ONLINE
>   scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07 2017
> config:
> 
>         NAME                                    STATE     READ WRITE CKSUM
>         zclei22                                 ONLINE       0     0     0
>           HGST_HUS724040ALA640_PN2334PBJ4SV6T1  ONLINE       0     0     0
>         logs
>           lv_slog                               ONLINE       0     0     0
>         cache
>           lv_cache                              ONLINE       0     0     0
> 
> errors: No known data errors
> 
> 
> ZFS config:
> 
> 
> 
> [root@clei22 ~]# zfs get all zclei22/01
> NAME        PROPERTY              VALUE                  SOURCE
> zclei22/01  type                  filesystem             -
> zclei22/01  creation              Tue Feb 28 14:06 2017  -
> zclei22/01  used                  389G                   -
> zclei22/01  available             3.13T                  -
> zclei22/01  referenced            389G                   -
> zclei22/01  compressratio         1.01x                  -
> zclei22/01  mounted               yes                    -
> zclei22/01  quota                 none                   default
> zclei22/01  reservation           none                   default
> zclei22/01  recordsize            128K                   local
> zclei22/01  mountpoint            /zclei22/01            default
> zclei22/01  sharenfs              off                    default
> zclei22/01  checksum              on                     default
> zclei22/01  compression           off                    local
> zclei22/01  atime                 on                     default
> zclei22/01  devices               on                     default
> zclei22/01  exec                  on                     default
> zclei22/01  setuid                on                     default
> zclei22/01  readonly              off                    default
> zclei22/01  zoned                 off                    default
> zclei22/01  snapdir               hidden                 default
> zclei22/01  aclinherit            restricted             default
> zclei22/01  canmount              on                     default
> zclei22/01  xattr                 sa                     local
> zclei22/01  copies                1                      default
> zclei22/01  version               5                      -
> zclei22/01  utf8only              off                    -
> zclei22/01  normalization         none                   -
> zclei22/01  casesensitivity       sensitive              -
> zclei22/01  vscan                 off                    default
> zclei22/01  nbmand                off                    default
> zclei22/01  sharesmb              off                    default
> zclei22/01  refquota              none                   default
> zclei22/01  refreservation        none                   default
> zclei22/01  primarycache          metadata               local
> zclei22/01  secondarycache        metadata               local
> zclei22/01  usedbysnapshots       0                      -
> zclei22/01  usedbydataset         389G                   -
> zclei22/01  usedbychildren        0                      -
> zclei22/01  usedbyrefreservation  0                      -
> zclei22/01  logbias               latency                default
> zclei22/01  dedup                 off                    default
> zclei22/01  mlslabel              none                   default
> zclei22/01  sync                  disabled               local
> zclei22/01  refcompressratio      1.01x                  -
> zclei22/01  written               389G                   -
> zclei22/01  logicalused           396G                   -
> zclei22/01  logicalreferenced     396G                   -
> zclei22/01  filesystem_limit      none                   default
> zclei22/01  snapshot_limit        none                   default
> zclei22/01  filesystem_count      none                   default
> zclei22/01  snapshot_count        none                   default
> zclei22/01  snapdev               hidden                 default
> zclei22/01  acltype               off                    default
> zclei22/01  context               none                   default
> zclei22/01  fscontext             none                   default
> zclei22/01  defcontext            none                   default
> zclei22/01  rootcontext           none                   default
> zclei22/01  relatime              off                    default
> zclei22/01  redundant_metadata    all                    default
> zclei22/01  overlay               off                    default
> 
> 
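In a listing this long the interesting part is the handful of properties whose SOURCE is local, since those are the ones changed from the defaults. A small sketch that pulls them out of pasted zfs get all output could look like this. The function name is hypothetical, and the parsing assumes the four-column NAME PROPERTY VALUE SOURCE layout shown above.

```python
def local_overrides(zfs_get_output):
    """Return {property: value} for rows of `zfs get all` whose SOURCE is local."""
    overrides = {}
    for line in zfs_get_output.splitlines():
        # Tolerate rows pasted from a quoted mail ("> " prefix).
        fields = line.lstrip("> ").split()
        if len(fields) < 4 or fields[0] == "NAME":
            continue  # blank line, prompt or header row
        prop = fields[1]
        source = fields[-1]
        # VALUE may contain spaces (for example the creation date).
        value = " ".join(fields[2:-1])
        if source == "local":
            overrides[prop] = value
    return overrides


if __name__ == "__main__":
    import sys
    for prop, value in sorted(local_overrides(sys.stdin.read()).items()):
        print(f"{prop}={value}")
```

For the dataset in this thread that would report recordsize, compression, xattr, primarycache, secondarycache and sync. On the host itself, zfs get -s local all zclei22/01 should give the same filter directly.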
> 
> 
> 
> On Fri, Mar 3, 2017 at 2:52 PM, Juan Pablo < pablo.localhost@xxxxxxxxx >
> wrote:
> 
> 
> 
> Which operating system version are you using for your ZFS storage?
> Run:
> zfs get all your-pool-name
> Use arc_summary.py from the FreeNAS git repo if you wish.
> 
> 
> 2017-03-03 10:33 GMT-03:00 Arman Khalatyan < arm2arm@xxxxxxxxx > :
> 
> 
> 
> Pool load:
> [root@clei21 ~]# zpool iostat -v 1
>                                            capacity     operations    bandwidth
> pool                                    alloc   free   read  write   read  write
> --------------------------------------  -----  -----  -----  -----  -----  -----
> zclei21                                 10.1G  3.62T      0    112    823  8.82M
>   HGST_HUS724040ALA640_PN2334PBJ52XWT1  10.1G  3.62T      0     46    626  4.40M
> logs                                        -      -      -      -      -      -
>   lv_slog                                225M  9.72G      0     66    198  4.45M
> cache                                       -      -      -      -      -      -
>   lv_cache                              9.81G   204G      0     46     56  4.13M
> --------------------------------------  -----  -----  -----  -----  -----  -----
> 
>                                            capacity     operations    bandwidth
> pool                                    alloc   free   read  write   read  write
> --------------------------------------  -----  -----  -----  -----  -----  -----
> zclei21                                 10.1G  3.62T      0    191      0  12.8M
>   HGST_HUS724040ALA640_PN2334PBJ52XWT1  10.1G  3.62T      0      0      0      0
> logs                                        -      -      -      -      -      -
>   lv_slog                                225M  9.72G      0    191      0  12.8M
> cache                                       -      -      -      -      -      -
>   lv_cache                              9.83G   204G      0    218      0  20.0M
> --------------------------------------  -----  -----  -----  -----  -----  -----
> 
>                                            capacity     operations    bandwidth
> pool                                    alloc   free   read  write   read  write
> --------------------------------------  -----  -----  -----  -----  -----  -----
> zclei21                                 10.1G  3.62T      0    191      0  12.7M
>   HGST_HUS724040ALA640_PN2334PBJ52XWT1  10.1G  3.62T      0      0      0      0
> logs                                        -      -      -      -      -      -
>   lv_slog                                225M  9.72G      0    191      0  12.7M
> cache                                       -      -      -      -      -      -
>   lv_cache                              9.83G   204G      0     72      0  7.68M
> --------------------------------------  -----  -----  -----  -----  -----  -----
> 
> 
> On Fri, Mar 3, 2017 at 2:32 PM, Arman Khalatyan < arm2arm@xxxxxxxxx > wrote:
> 
> 
> 
> Glusterfs now in healing mode:
> Receiver:
> [root@clei21 ~]# arcstat.py 1
> time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
> 13:24:49 0 0 0 0 0 0 0 0 0 4.6G 31G
> 13:24:50 154 80 51 80 51 0 0 80 51 4.6G 31G
> 13:24:51 179 62 34 62 34 0 0 62 42 4.6G 31G
> 13:24:52 148 68 45 68 45 0 0 68 45 4.6G 31G
> 13:24:53 140 64 45 64 45 0 0 64 45 4.6G 31G
> 13:24:54 124 48 38 48 38 0 0 48 38 4.6G 31G
> 13:24:55 157 80 50 80 50 0 0 80 50 4.7G 31G
> 13:24:56 202 68 33 68 33 0 0 68 41 4.7G 31G
> 13:24:57 127 54 42 54 42 0 0 54 42 4.7G 31G
> 13:24:58 126 50 39 50 39 0 0 50 39 4.7G 31G
> 13:24:59 116 40 34 40 34 0 0 40 34 4.7G 31G
> 
> 
> Sender
> [root@clei22 ~]# arcstat.py 1
> time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
> 13:28:37 8 2 25 2 25 0 0 2 25 468M 31G
> 13:28:38 1.2K 727 62 727 62 0 0 525 54 469M 31G
> 13:28:39 815 508 62 508 62 0 0 376 55 469M 31G
> 13:28:40 994 624 62 624 62 0 0 450 54 469M 31G
> 13:28:41 783 456 58 456 58 0 0 338 50 470M 31G
> 13:28:42 916 541 59 541 59 0 0 390 50 470M 31G
> 13:28:43 768 437 56 437 57 0 0 313 48 471M 31G
> 13:28:44 877 534 60 534 60 0 0 393 53 470M 31G
> 13:28:45 957 630 65 630 65 0 0 450 57 470M 31G
> 13:28:46 819 479 58 479 58 0 0 357 51 471M 31G
> 
> 
> On Thu, Mar 2, 2017 at 7:18 PM, Juan Pablo < pablo.localhost@xxxxxxxxx >
> wrote:
> 
> 
> 
> Hey,
> what are you using for ZFS? Please get an ARC status and show it.
> 
> 
> 2017-03-02 9:57 GMT-03:00 Arman Khalatyan < arm2arm@xxxxxxxxx > :
> 
> 
> 
> No,
> ZFS itself is not on top of LVM. Only the SSD was split by LVM into slog (10G)
> and cache (the rest).
> But in any case the SSD does not help much with the GlusterFS/oVirt load: it
> has almost 100% cache misses... :( (terrible performance compared with NFS)
> 
> 
> 
> 
> 
> On Thu, Mar 2, 2017 at 1:47 PM, FERNANDO FREDIANI < fernando.frediani@xxxxxxx > wrote:
> 
> 
> 
> 
> 
> Am I understanding correctly that you have Gluster on top of ZFS, which is
> on top of LVM? If so, why was the use of LVM necessary? I have ZFS
> without any need for LVM.
> 
> Fernando
> 
> On 02/03/2017 06:19, Arman Khalatyan wrote:
> 
> 
> 
> Hi,
> I use 3 nodes with zfs and glusterfs.
> Are there any suggestions to optimize it?
> 
> host zfs config 4TB-HDD+250GB-SSD:
> [root@clei22 ~]# zpool status
>   pool: zclei22
>  state: ONLINE
>   scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07 2017
> config:
> 
>         NAME                                    STATE     READ WRITE CKSUM
>         zclei22                                 ONLINE       0     0     0
>           HGST_HUS724040ALA640_PN2334PBJ4SV6T1  ONLINE       0     0     0
>         logs
>           lv_slog                               ONLINE       0     0     0
>         cache
>           lv_cache                              ONLINE       0     0     0
> 
> errors: No known data errors
> 
> Name: GluReplica
> Volume ID: ee686dfe-203a-4caa-a691-26353460cc48
> Volume Type: Replicate (Arbiter)
> Replica Count: 2 + 1
> Number of Bricks: 3
> Transport Types: TCP, RDMA
> Maximum no of snapshots: 256
> Capacity: 3.51 TiB total, 190.56 GiB used, 3.33 TiB free
> 
> 
> _______________________________________________
> Users mailing list Users@xxxxxxxxx
> http://lists.ovirt.org/mailman/listinfo/users
> 
> 
> 
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users