Thank you so much; I think we are close to building a stable storage solution based on your recommendations. Here is our rebalance log. Please disregard the error messages after 9 AM: that is when we manually destroyed the volume in order to recreate it for further testing. Likewise, all remove-brick operations you see in the log were executed manually while recreating the volume.
On 5 February 2018 at 15:40, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

Hi,

I see a lot of the following messages in the logs:

[2018-02-04 03:22:01.544446] I [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2018-02-04 07:41:16.189349] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 122440868
[2018-02-04 07:41:16.244261] W [fuse-bridge.c:2398:fuse_writev_cbk] 0-glusterfs-fuse: 3615890: WRITE => -1 gfid=c73ca10f-e83e-42a9-9b0a-1de4e12c6798 fd=0x7ffa3802a5f0 (Input/output error)
[2018-02-04 07:41:16.254503] W [fuse-bridge.c:1377:fuse_err_cbk] 0-glusterfs-fuse: 3615891: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 122440868" repeated 81 times between [2018-02-04 07:41:16.189349] and [2018-02-04 07:41:16.254480]
[2018-02-04 10:50:27.624283] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 116958174
[2018-02-04 10:50:27.752107] W [fuse-bridge.c:2398:fuse_writev_cbk] 0-glusterfs-fuse: 3997764: WRITE => -1 gfid=18e2adee-ff52-414f-aa37-506cff1472ee fd=0x7ffa3801d7d0 (Input/output error)
[2018-02-04 10:50:27.762331] W [fuse-bridge.c:1377:fuse_err_cbk] 0-glusterfs-fuse: 3997765: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 116958174" repeated 147 times between [2018-02-04 10:50:27.624283] and [2018-02-04 10:50:27.762292]
[2018-02-04 10:55:35.256018] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 28918667
[2018-02-04 10:55:35.387073] W [fuse-bridge.c:2398:fuse_writev_cbk] 0-glusterfs-fuse: 4006263: WRITE => -1 gfid=54e6f8ea-27d7-4e92-ae64-5e198bd3cb42 fd=0x7ffa38036bf0 (Input/output error)
[2018-02-04 10:55:35.407554] W [fuse-bridge.c:1377:fuse_err_cbk] 0-glusterfs-fuse: 4006264: FLUSH() ERR => -1 (Input/output error)
[2018-02-04 10:55:59.677734] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 69319528
[2018-02-04 10:55:59.827012] W [fuse-bridge.c:2398:fuse_writev_cbk] 0-glusterfs-fuse: 4014645: WRITE => -1 gfid=ce700d9b-ef55-4e55-a371-9642e90555cb fd=0x7ffa38036bf0 (Input/output error)

This is the reason for the I/O errors you are seeing: Gluster cannot find the subvolume for the file in question, so it fails the write with an I/O error. It looks like some bricks may not have been up at the time the volume tried to get the layout. This is a problem, as this is a pure distributed volume; for some reason the layout is not set on some bricks, or some bricks are unreachable.

There are a lot of graph changes in the logs. I would recommend against so many changes in such a short interval. There are no logs for the preceding interval to find out why. Can you send me the rebalance logs from the nodes?

To clarify: I see multiple graph changes within a few minutes. I would recommend adding/removing multiple bricks at a time when expanding/shrinking the volume, instead of one at a time.

> In case we have too much capacity that's not needed at the moment, we are going to remove-brick and fix-layout again in order to shrink storage.

I do see the number of bricks reducing in the graphs. Are you sure a remove-brick has not been run?
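For reference, a minimal sketch of the layout check and of the expand/shrink flow with the gluster CLI. The volume name and brick path match this setup; the encoder-new* hostnames are hypothetical, and trusted.glusterfs.dht is the on-disk DHT layout xattr:

# check whether the DHT layout xattr is set on a brick (run on each brick server)
getfattr -n trusted.glusterfs.dht -e hex /var/storage/brick/gv0

# expand: add several bricks in a single add-brick call, then fix the layout once
gluster volume add-brick gv0 \
    encoder-new1.qencode.com:/var/storage/brick/gv0 \
    encoder-new2.qencode.com:/var/storage/brick/gv0
gluster volume rebalance gv0 fix-layout start
gluster volume rebalance gv0 status

# shrink: remove-brick start migrates data off the brick by itself
gluster volume remove-brick gv0 encoder-new2.qencode.com:/var/storage/brick/gv0 start
gluster volume remove-brick gv0 encoder-new2.qencode.com:/var/storage/brick/gv0 status
# commit only after status shows the migration has completed
gluster volume remove-brick gv0 encoder-new2.qencode.com:/var/storage/brick/gv0 commit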
There is no need to run a fix-layout after using "remove-brick start", as that will automatically rebalance the data.

Regards,
Nithya

On 5 February 2018 at 14:06, Nikita Yeryomin <nikyer@xxxxxxxxx> wrote:

Attached the log. There are some errors in it like:
[2018-02-04 18:50:41.112962] E [fuse-bridge.c:903:fuse_getattr_resume] 0-glusterfs-fuse: 9613852: GETATTR 140712792330896 (7d39d329-c0e0-4997-85e6-0e66e0436315) resolution failed
But when it occurs, it does not seem to affect current file I/O operations. I already re-created the volume yesterday and was not able to reproduce the error during file download after that, but there are still errors like the one above in the logs, and the system seems a bit unstable.
Let me share some more details on how we are trying to use glusterfs.
So it's a distributed, NOT replicated, volume with sharding enabled. We have many small servers (20GB each) in a cloud and need to work with rather large files (~300GB). We start the volume with one 15GB brick, which is a separate XFS partition on each server, and then add bricks one by one to reach the needed capacity. After each brick is added we run rebalance fix-layout.

In case we have more capacity than is needed at the moment, we plan to remove-brick and fix-layout again in order to shrink the storage. But we have not yet been able to test removing bricks, as the system does not behave stably after scaling out.

From what I've found here, https://bugzilla.redhat.com/show_bug.cgi?id=875076, it seems starting with one brick is not a good idea, so we are going to try starting with 2 bricks, along the lines of the sketch below.
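A minimal sketch of what we plan to try (the server names are hypothetical; the brick path matches our current layout, and features.shard-block-size is optional, shown only as an example knob):

# create a plain distributed volume from two bricks instead of one
gluster volume create gv0 transport tcp \
    server1.qencode.com:/var/storage/brick/gv0 \
    server2.qencode.com:/var/storage/brick/gv0

# enable sharding so large files are spread across bricks in chunks
gluster volume set gv0 features.shard on
# optional: a larger shard size for ~300GB files (the default is 64MB)
gluster volume set gv0 features.shard-block-size 512MB

gluster volume start gv0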
Please let me know if there is anything else we should consider changing in our strategy. Many thanks in advance!

Nikita Yeryomin

2018-02-05 7:53 GMT+02:00 Nithya Balachandran <nbalacha@xxxxxxxxxx>:

Hi,

Please provide the log for the mount process from the node on which you have mounted the volume. This should be in /var/log/glusterfs, and the name of the file will be the hyphenated path of the mount point. For example, if the volume is mounted at /mnt/glustervol, the log file will be /var/log/glusterfs/mnt-glustervol.log.

Regards,
Nithya

On 4 February 2018 at 21:09, Nikita Yeryomin <nikyer@xxxxxxxxx> wrote:

Please help troubleshooting glusterfs with the following setup:
Distributed volume without replication. Sharding enabled.
# cat /etc/centos-release
CentOS release 6.9 (Final)
# glusterfs --version
glusterfs 3.12.3
[root@master-5f81bad0054a11e8bf7d0671029ed6b8 uploads]# gluster volume info

Volume Name: gv0
Type: Distribute
Volume ID: 1a7e05f6-4aa8-48d3-b8e3-300637031925
Status: Started
Snapshot Count: 0
Number of Bricks: 27
Transport-type: tcp
Bricks:
Brick1: gluster3.qencode.com:/var/storage/brick/gv0
Brick2: encoder-376cac0405f311e884700671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick3: encoder-ee6761c0091c11e891ba0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick4: encoder-ee68b8ea091c11e89c2d0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick5: encoder-ee663700091c11e8b48f0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick6: encoder-efcf113e091c11e899520671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick7: encoder-efcd5a24091c11e8963a0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick8: encoder-099f557e091d11e882f70671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick9: encoder-099bdda4091d11e881090671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick10: encoder-099dca56091d11e8b3410671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick11: encoder-09a1ba4e091d11e8a3c20671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick12: encoder-099a826a091d11e895940671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick13: encoder-0998aa8a091d11e8a8160671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick14: encoder-0b582724091d11e8b3b40671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick15: encoder-0dff527c091d11e896f20671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick16: encoder-0e0d5c14091d11e886cf0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick17: encoder-7f1bf3d4093b11e8a3580671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick18: encoder-7f70378c093b11e885260671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick19: encoder-7f19528c093b11e88f100671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick20: encoder-7f76c048093b11e8a7470671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick21: encoder-7f7fc90e093b11e8a74e0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick22: encoder-7f6bc382093b11e8b8a30671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick23: encoder-7f7b44d8093b11e8906f0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick24: encoder-7f72aa30093b11e89a8e0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick25: encoder-7f7d735c093b11e8b4650671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick26: encoder-7f1a5006093b11e89bcb0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick27: encoder-95791076093b11e8af170671029ed6b8.qencode.com:/var/storage/brick/gv0
Options Reconfigured:
cluster.min-free-disk: 10%
performance.cache-max-file-size: 1048576
nfs.disable: on
transport.address-family: inet
features.shard: on
performance.client-io-threads: on
Each brick is 15GB in size.
After using the volume for several hours with intensive read/write operations (~300GB written and then deleted), an attempt to write to the volume results in an Input/output error:
# wget https://speed.hetzner.de/1GB.bin
--2018-02-04 12:02:34--  https://speed.hetzner.de/1GB.bin
Resolving speed.hetzner.de... 88.198.248.254, 2a01:4f8:0:59ed::2
Connecting to speed.hetzner.de|88.198.248.254|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: `1GB.bin'

38% [======================>                                     ] 403,619,518 27.8M/s  in 15s

Cannot write to `1GB.bin' (Input/output error).
I don't see anything written to glusterd.log, or to any other log in /var/log/glusterfs/*, when this error occurs.
Deleting the partially downloaded file works without error.
Thanks,
Nikita Yeryomin
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
<<attachment: gv0-rebalance.log.zip>>