Re: Fwd: Troubleshooting glusterfs

Nithya Balachandran <nbalacha@xxxxxxxxxx> · Thu, 15 Feb 2018 13:33:04 +0530

Hi Nikita,
Sorry for taking so long to get back to you. I will take a look at the logs and get back.

Regards,
Nithya

On 7 February 2018 at 19:33, Nikita Yeryomin <nikyer@xxxxxxxxx> wrote:
Hello Nithya! Thank you for your help in figuring this out! We changed our configuration, and after a successful test yesterday we ran into a new issue today.
The test, which combined moderate read/write load (~20-30 Mb/s) with scaling the storage, ran for about 3 hours and at some point the system got stuck.
At the user level, errors like this appear when trying to work with the filesystem:

OSError: [Errno 2] No such file or directory: '/home/public/data/outputs/merged/c0a91c500be311e8846eb2f7a7fdd356-video_audio_merge-2/c0a91c500be311e8846eb2f7a7fdd356-video_join-2.mp4'

I've checked the mount log and it seems there are issues with sharding:

[2018-02-07 11:52:36.200554] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 140-gv1-shard: Lookup on shard 1 failed. Base file gfid = b3a24312-c1fb-4fe0-b11c-0ca264233f62 [Stale file handle]
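
For anyone following along, a small sketch of how the shard named in that message could be located on the bricks. It relies on the shard translator keeping every piece after the first under a hidden .shard directory at the brick root, named <base-gfid>.<shard-index>. The function name shard_path_from_log is made up for illustration, and the regular expression only assumes the message format quoted above.

```python
import re

# Matches the shard translator's lookup-failure message quoted above.
SHARD_ERR = re.compile(
    r"Lookup on shard (?P<index>\d+) failed\. "
    r"Base file gfid = (?P<gfid>[0-9a-f-]{36})"
)


def shard_path_from_log(line):
    """Return the brick-relative shard path named in a log line, or None."""
    match = SHARD_ERR.search(line)
    if match is None:
        return None
    return ".shard/{}.{}".format(match.group("gfid"), match.group("index"))


line = (
    "[2018-02-07 11:52:36.200554] E [MSGID: 133010] "
    "[shard.c:1724:shard_common_lookup_shards_cbk] 140-gv1-shard: "
    "Lookup on shard 1 failed. Base file gfid = "
    "b3a24312-c1fb-4fe0-b11c-0ca264233f62 [Stale file handle]"
)
print(shard_path_from_log(line))
```

The printed path is relative to a brick root such as /var/storage/brick/gv1, so checking for it on each brick shows which brick, if any, still holds that shard.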

So this time we started a distributed, non-replicated volume with four 20 GB bricks. Following your advice to add more storage at a time, we added two more 20 GB bricks whenever the total free space fell below a threshold (in this test it was 70 GB at the beginning and was then changed to 150 GB). I can say the volume was ~50-60% used all the time.
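
The scaling code used in the test is not shown in this thread, so the following is only a guess at the rule described above, written as a sketch. The function name bricks_to_add and its parameters are hypothetical. It also goes one step further than "add two bricks" by adding as many whole batches as are needed to get free space back to the threshold, on the assumption that each new brick contributes its full size.

```python
import math


def bricks_to_add(free_gb, threshold_gb, brick_gb=20, batch=2):
    """Bricks to add, in whole batches, so free space reaches the threshold.

    Assumes every new brick is empty and adds brick_gb of free space.
    """
    if free_gb >= threshold_gb:
        return 0
    shortfall = threshold_gb - free_gb
    batches = math.ceil(shortfall / (brick_gb * batch))
    return batches * batch


# 60 GB free against the 70 GB and 150 GB thresholds mentioned above.
print(bricks_to_add(60, 70))
print(bricks_to_add(60, 150))
```

Adding the whole shortfall in one step keeps the number of graph changes low, at the cost of sometimes provisioning more bricks than a fixed two-at-a-time rule would.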

When we stopped the test, the volume looked like this:

Volume Name: gv1
Type: Distribute
Volume ID: fcdae350-cda9-4da3-bb70-63558ab11f56

Status: Started
Snapshot Count: 0
Number of Bricks: 22
Transport-type: tcp
Bricks:
Brick1: dev-gluster1.qencode.com:/var/storage/brick/gv1
Brick2: dev-gluster2.qencode.com:/var/storage/brick/gv1
Brick3: master-59e8248a0ac511e892e90671029ed6b8.qencode.com:/var/storage/brick2/gv1
Brick4: master-59e8248a0ac511e892e90671029ed6b8.qencode.com:/var/storage/brick1/gv1
Brick5: encoder-9fe7821c0b8011e8af7e0671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick6: encoder-2d3a6d6a0be411e8a9470671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick7: encoder-2d3b4f960be411e88c7f0671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick8: encoder-327b832c0be411e8b3a80671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick9: encoder-3272cd540be411e88f120671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick10: encoder-327890720be411e8ba570671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick11: encoder-327065d20be411e899620671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick12: encoder-327570540be411e898da0671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick13: encoder-327e2a640be411e89fd40671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick14: encoder-328336080be411e8bbe70671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick15: encoder-3286494c0be411e88edb0671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick16: encoder-45c894060be411e895e00671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick17: encoder-49565b6c0be411e8b47d0671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick18: encoder-4b26e1c80be411e889ce0671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick19: encoder-4b30f8200be411e8b9770671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick20: encoder-4b3b2f160be411e886ec0671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick21: encoder-4b40827c0be411e89edd0671029ed6b8.qencode.com:/var/storage/brick/gv1
Brick22: encoder-4b956ec20be411e8ac900671029ed6b8.qencode.com:/var/storage/brick/gv1
Options Reconfigured:

nfs.disable: on
transport.address-family: inet
features.shard: on
cluster.min-free-disk: 10%
performance.cache-max-file-size: 1048576
performance.client-io-threads: on

The test started at ~8:50 AM server time.
Attaching mnt and rebalance logs. 

Looking forward to your advice!

Thanks,
Nikita

2018-02-05 14:32 GMT+02:00 Nikita Yeryomin <nikyer@xxxxxxxxx>:
Hello Nithya!
Thank you so much, I think we are close to building a stable storage solution according to your recommendations. Here's our rebalance log. Please ignore the error messages after 9 AM: that is when we manually destroyed the volume to recreate it for further testing. All remove-brick operations you can see in the log were also executed manually while recreating the volume. We are now changing our code to follow your advice and will do more testing.

Thanks,
Nikita 

2018-02-05 12:20 GMT+02:00 Nithya Balachandran <nbalacha@xxxxxxxxxx>:

On 5 February 2018 at 15:40, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:
Hi,

I see a lot of the following messages in the logs:
[2018-02-04 03:22:01.544446] I [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
[2018-02-04 07:41:16.189349] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 122440868
[2018-02-04 07:41:16.244261] W [fuse-bridge.c:2398:fuse_writev_cbk] 0-glusterfs-fuse: 3615890: WRITE => -1 gfid=c73ca10f-e83e-42a9-9b0a-1de4e12c6798 fd=0x7ffa3802a5f0 (Input/output error)
[2018-02-04 07:41:16.254503] W [fuse-bridge.c:1377:fuse_err_cbk] 0-glusterfs-fuse: 3615891: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 122440868" repeated 81 times between [2018-02-04 07:41:16.189349] and [2018-02-04 07:41:16.254480]
[2018-02-04 10:50:27.624283] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 116958174
[2018-02-04 10:50:27.752107] W [fuse-bridge.c:2398:fuse_writev_cbk] 0-glusterfs-fuse: 3997764: WRITE => -1 gfid=18e2adee-ff52-414f-aa37-506cff1472ee fd=0x7ffa3801d7d0 (Input/output error)
[2018-02-04 10:50:27.762331] W [fuse-bridge.c:1377:fuse_err_cbk] 0-glusterfs-fuse: 3997765: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 116958174" repeated 147 times between [2018-02-04 10:50:27.624283] and [2018-02-04 10:50:27.762292]
[2018-02-04 10:55:35.256018] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 28918667
[2018-02-04 10:55:35.387073] W [fuse-bridge.c:2398:fuse_writev_cbk] 0-glusterfs-fuse: 4006263: WRITE => -1 gfid=54e6f8ea-27d7-4e92-ae64-5e198bd3cb42 fd=0x7ffa38036bf0 (Input/output error)
[2018-02-04 10:55:35.407554] W [fuse-bridge.c:1377:fuse_err_cbk] 0-glusterfs-fuse: 4006264: FLUSH() ERR => -1 (Input/output error)
[2018-02-04 10:55:59.677734] W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash (value) = 69319528
[2018-02-04 10:55:59.827012] W [fuse-bridge.c:2398:fuse_writev_cbk] 0-glusterfs-fuse: 4014645: WRITE => -1 gfid=ce700d9b-ef55-4e55-a371-9642e90555cb fd=0x7ffa38036bf0 (Input/output error)

This is the reason for the I/O errors you are seeing. Gluster cannot find the subvolume for the file in question, so it fails the write with an I/O error. It looks like some bricks may not have been up at the time the volume tried to get the layout.

This is a problem because this is a pure distributed volume. For some reason, either the layout is not set on some bricks or some bricks are unreachable.
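
To get a feel for how widespread these layout misses are, the warnings can be tallied per hash value. This is only a sketch that assumes the message formats quoted above. The name count_layout_misses is made up for illustration, and adding the "repeated N times" figure on top of the first occurrence is an assumption about how suppressed repeats are reported.

```python
import re
from collections import Counter

# DHT warning emitted when no subvolume owns the hash range of a file name.
MISS = re.compile(r"-dht: no subvolume for hash \(value\) = (?P<hash>\d+)")
# Summary line written when identical messages were suppressed.
REPEAT = re.compile(r'^The message ".*" repeated (?P<n>\d+) times')


def count_layout_misses(lines):
    """Count 'no subvolume for hash' warnings per hash value."""
    counts = Counter()
    for line in lines:
        miss = MISS.search(line)
        if miss is None:
            continue
        repeat = REPEAT.match(line)
        counts[miss.group("hash")] += int(repeat.group("n")) if repeat else 1
    return counts


sample = [
    "[2018-02-04 07:41:16.189349] W [MSGID: 109011] "
    "[dht-layout.c:186:dht_layout_search] 48-gv0-dht: "
    "no subvolume for hash (value) = 122440868",
    'The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] '
    '48-gv0-dht: no subvolume for hash (value) = 122440868" repeated 81 times '
    "between [2018-02-04 07:41:16.189349] and [2018-02-04 07:41:16.254480]",
]
print(count_layout_misses(sample))
```

Many distinct hash values failing over a long period would point at a persistent hole in the layout rather than a brick that was briefly unreachable.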

There are a lot of graph changes in the logs. I would recommend against so many changes in such a short interval. There are no logs for the preceding interval, so I cannot tell why this happened. Can you send me the rebalance logs from the nodes?

To clarify, I see multiple graph changes in a few minutes. I would recommend adding/removing multiple bricks at a time when expanding/shrinking the volume instead of one at a time. 

>In case we have too much capacity that's not needed at the moment we are going to remove-brick and fix-layout again in order to shrink storage.

I do see the number of bricks reducing in the graphs. Are you sure a remove-brick has not been run? There is no need to run a fix-layout after using "remove-brick start" as that will automatically rebalance data.
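
The two recommendations above (change several bricks per operation, and let "remove-brick start" migrate the data instead of running a separate fix-layout) could be scripted roughly as follows. This is a sketch only: it builds the gluster command lines as argument lists without executing them, the function names are made up, and the brick names in the usage lines are placeholders.

```python
def expand_commands(volume, new_bricks):
    """One add-brick for the whole batch, followed by a single fix-layout."""
    bricks = list(new_bricks)
    if not bricks:
        raise ValueError("no bricks given")
    return [
        ["gluster", "volume", "add-brick", volume] + bricks,
        ["gluster", "volume", "rebalance", volume, "fix-layout", "start"],
    ]


def shrink_commands(volume, old_bricks):
    """remove-brick start, status and commit for one batch of bricks.

    'start' migrates the data off the bricks, so no fix-layout follows.
    'commit' should only be run once 'status' reports completion.
    """
    bricks = list(old_bricks)
    if not bricks:
        raise ValueError("no bricks given")
    base = ["gluster", "volume", "remove-brick", volume] + bricks
    return [base + ["start"], base + ["status"], base + ["commit"]]


# Placeholder brick names, for illustration only.
batch = [
    "node-a.example.com:/var/storage/brick/gv1",
    "node-b.example.com:/var/storage/brick/gv1",
]
for argv in expand_commands("gv1", batch) + shrink_commands("gv1", batch):
    print(" ".join(argv))
```

Each returned list could be passed to subprocess.run once the status output has been checked by whatever orchestrates the scaling.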

Regards,
Nithya

On 5 February 2018 at 14:06, Nikita Yeryomin <nikyer@xxxxxxxxx> wrote:
Attached the log. There are some errors in it like
[2018-02-04 18:50:41.112962] E [fuse-bridge.c:903:fuse_getattr_resume] 0-glusterfs-fuse: 9613852: GETATTR 140712792330896 (7d39d329-c0e0-4997-85e6-0e66e0436315) resolution failed

But when it occurs, it does not seem to affect current file I/O operations.
I already re-created the volume yesterday and was not able to reproduce the error during file download after that, but there are still errors in the logs like the one above and the system seems a bit unstable.
Let me share some more details on how we are trying to use glusterfs.
It's a distributed, NOT replicated, volume with sharding enabled.
We have many small servers (20GB each) in a cloud and need to work with rather large files (~300GB).
We start the volume with one 15GB brick, which is a separate XFS partition on each server, and then add bricks one by one to reach the needed capacity.
After each brick is added we run rebalance fix-layout.
In case we have too much capacity that's not needed at the moment, we are going to run remove-brick and fix-layout again in order to shrink the storage. But we have not yet been able to test removing bricks, as the system is not stable after scaling out.

From what I've found at https://bugzilla.redhat.com/show_bug.cgi?id=875076, it seems that starting with one brick is not a good idea, so we are going to try starting with 2 bricks.
Please let me know if there is anything else we should consider changing in our strategy.

Many thanks in advance!
Nikita Yeryomin

2018-02-05 7:53 GMT+02:00 Nithya Balachandran <nbalacha@xxxxxxxxxx>:
Hi,
Please provide the log for the mount process from the node on which you have mounted the volume. This should be in /var/log/glusterfs and the name of the file will be the hyphenated path of the mount point. For example, if the volume is mounted at /mnt/glustervol, the log file will be /var/log/glusterfs/mnt-glustervol.log
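
The naming rule can be written down as a small helper. This is a sketch of the convention described above (strip the leading slash, replace the remaining slashes with hyphens, append .log). The function name mount_log_path is made up, and the default log directory is the one named above.

```python
import posixpath


def mount_log_path(mount_point, log_dir="/var/log/glusterfs"):
    """Return the expected client log path for a FUSE mount point."""
    # Normalise first so a trailing slash does not become a trailing hyphen.
    name = posixpath.normpath(mount_point).strip("/").replace("/", "-")
    return posixpath.join(log_dir, name + ".log")


print(mount_log_path("/mnt/glustervol"))
```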

Regards,
Nithya

On 4 February 2018 at 21:09, Nikita Yeryomin <nikyer@xxxxxxxxx> wrote:
Please help troubleshooting glusterfs with the following setup:
Distributed volume without replication. Sharding enabled.

# cat /etc/centos-release
CentOS release 6.9 (Final)

# glusterfs --version
glusterfs 3.12.3

[root@master-5f81bad0054a11e8bf7d0671029ed6b8 uploads]# gluster volume info

Volume Name: gv0
Type: Distribute
Volume ID: 1a7e05f6-4aa8-48d3-b8e3-300637031925
Status: Started
Snapshot Count: 0
Number of Bricks: 27
Transport-type: tcp
Bricks:
Brick1: gluster3.qencode.com:/var/storage/brick/gv0
Brick2: encoder-376cac0405f311e884700671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick3: encoder-ee6761c0091c11e891ba0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick4: encoder-ee68b8ea091c11e89c2d0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick5: encoder-ee663700091c11e8b48f0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick6: encoder-efcf113e091c11e899520671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick7: encoder-efcd5a24091c11e8963a0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick8: encoder-099f557e091d11e882f70671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick9: encoder-099bdda4091d11e881090671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick10: encoder-099dca56091d11e8b3410671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick11: encoder-09a1ba4e091d11e8a3c20671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick12: encoder-099a826a091d11e895940671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick13: encoder-0998aa8a091d11e8a8160671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick14: encoder-0b582724091d11e8b3b40671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick15: encoder-0dff527c091d11e896f20671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick16: encoder-0e0d5c14091d11e886cf0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick17: encoder-7f1bf3d4093b11e8a3580671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick18: encoder-7f70378c093b11e885260671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick19: encoder-7f19528c093b11e88f100671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick20: encoder-7f76c048093b11e8a7470671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick21: encoder-7f7fc90e093b11e8a74e0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick22: encoder-7f6bc382093b11e8b8a30671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick23: encoder-7f7b44d8093b11e8906f0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick24: encoder-7f72aa30093b11e89a8e0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick25: encoder-7f7d735c093b11e8b4650671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick26: encoder-7f1a5006093b11e89bcb0671029ed6b8.qencode.com:/var/storage/brick/gv0
Brick27: encoder-95791076093b11e8af170671029ed6b8.qencode.com:/var/storage/brick/gv0
Options Reconfigured:
cluster.min-free-disk: 10%
performance.cache-max-file-size: 1048576
nfs.disable: on
transport.address-family: inet
features.shard: on
performance.client-io-threads: on

Each brick is 15 GB in size.
After using the volume for several hours with intensive read/write operations (~300GB written and then deleted), an attempt to write to the volume results in an Input/Output error:

# wget https://speed.hetzner.de/1GB.bin
--2018-02-04 12:02:34--  https://speed.hetzner.de/1GB.bin
Resolving speed.hetzner.de... 88.198.248.254, 2a01:4f8:0:59ed::2
Connecting to speed.hetzner.de|88.198.248.254|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: `1GB.bin'

38% [=============================================================>                                                                                                     ] 403,619,518 27.8M/s   in 15s     

Cannot write to `1GB.bin' (Input/output error).

I don't see anything written to glusterd.log, or any other logs in /var/log/glusterfs/* when this error occurs.

Deleting partially downloaded file works without error.

Thanks,
Nikita Yeryomin

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
