Re: gluster tiering errors

Alex - Thank you for the response...
 
>>> There are several "no space left on device" messages. I would first check that free disk space is available for the volume.

The volumes appear to have plenty of available space:

/dev/mapper/vg_bricks-brick_nvme1                1.4T  782G  652G  55% /mnt/brick_nvme1
/dev/mapper/vg_bricks-brick_nvme2                1.4T  742G  691G  52% /mnt/brick_nvme2

As mentioned, I'm new to Gluster. Is this not the space that the "no space left on device" message refers to?

Thanks again,

HB


Herb,
What are the high and low watermarks for the tier set to?

# gluster volume get <vol> cluster.watermark-hi

# gluster volume get <vol> cluster.watermark-low
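
For context, here is a minimal sketch of how those two values relate to brick usage. It assumes the watermarks are percentages of hot tier fullness, and it uses 75 and 90 purely as placeholders (believed to be the documented defaults for this release series; the commands above report the real values). The function names are hypothetical, and the sample line is the df output quoted earlier in this thread.

```python
def use_percent(df_line: str) -> int:
    """Return the Use% column of one `df -h` data row as an integer."""
    # Expected columns: filesystem, size, used, avail, use%, mount point.
    fields = df_line.split()
    return int(fields[4].rstrip("%"))


def tier_zone(used_pct: int, low: int, hi: int) -> str:
    """Classify hot tier fullness relative to the two watermarks.

    Treating the boundaries as inclusive is an assumption of this sketch.
    """
    if used_pct >= hi:
        return "above-hi"   # promotions are expected to stop
    if used_pct >= low:
        return "between"    # promotion and demotion are both possible
    return "below-low"      # promotions are expected to proceed


# Placeholder watermarks; replace with the output of `gluster volume get`.
WATERMARK_LOW = 75
WATERMARK_HI = 90

line = "/dev/mapper/vg_bricks-brick_nvme1                1.4T  782G  652G  55% /mnt/brick_nvme1"
print(tier_zone(use_percent(line), WATERMARK_LOW, WATERMARK_HI))
```

With these placeholder values a brick at 55% falls below the low watermark. The tier daemon computes fullness itself, so per-brick df output is only an approximation of what it sees.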

What is the size of the file that failed to migrate, as per the following tierd log entry?

[2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)

If possible, the output of "gluster volume info" would also help and would save some back-and-forth with questions.

--
Milind



On Fri, Oct 20, 2017 at 12:42 AM, Herb Burnswell <herbert.burnswell@xxxxxxxxx> wrote:
All,

I am new to gluster and have some questions/concerns about some tiering errors that I see in the log files.

OS: CentOS 7.3.1611
Gluster version: 3.10.5
Samba version: 4.6.2

I see the following (scrubbed):

Node 1 /var/log/glusterfs/tier/<vol>/tierd.log:

[2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
[2017-10-19 17:52:07.525110] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
[2017-10-19 17:52:07.526088] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
[2017-10-19 17:52:07.526111] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
[2017-10-19 17:52:07.527214] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]
[2017-10-19 17:52:07.527244] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fb4411c4-a387-4e5f-a2b7-897633ef4aa8)
[2017-10-19 17:52:07.533510] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
[2017-10-19 17:52:07.534434] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
[2017-10-19 17:52:07.534453] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
[2017-10-19 17:52:07.535570] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]
[2017-10-19 17:52:07.535594] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fba421e7-0500-47c4-bf67-10a40690e13d)
[2017-10-19 17:52:07.541363] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
[2017-10-19 17:52:07.542296] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
[2017-10-19 17:52:07.542357] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
[2017-10-19 17:52:07.543480] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]
[2017-10-19 17:52:07.543521] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fe6799e1-42e6-43e5-a7eb-ac8facfcbc9f)
[2017-10-19 17:52:07.549959] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
[2017-10-19 17:52:07.550901] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
[2017-10-19 17:52:07.550922] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
[2017-10-19 17:52:07.551896] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]
[2017-10-19 17:52:07.551917] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:ffe3a3f2-b170-43f0-a9fb-97c78e3173eb)
[2017-10-19 17:52:07.551945] E [MSGID: 109037] [tier.c:2565:tier_run] 0-<vol>-tier-dht: Promotion failed
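
The excerpt above repeats the same group of messages once per file, so a tally by message ID, function and trailing error string can make the pattern easier to read. The following is a rough sketch only: the regular expression is inferred from the lines shown here, not from a format specification, and the names LOG_RE, ERRNO_RE, parse_line and summarize are hypothetical.

```python
import re
from collections import Counter

# Layout inferred from the excerpt:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<domain>[^:]+):\s+(?P<msg>.*)$"
)
# Some messages end with a bracketed error string, e.g. [No space left on device].
ERRNO_RE = re.compile(r"\[([^\]]+)\]\s*$")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    tail = ERRNO_RE.search(rec["msg"])
    rec["errno"] = tail.group(1) if tail else None
    rec["func"] = rec["src"].rsplit(":", 1)[-1]
    return rec


def summarize(lines):
    """Count log lines by (level, msgid, function, trailing error string)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["level"], rec["msgid"], rec["func"], rec["errno"])] += 1
    return counts


SAMPLE = [
    "[2017-10-19 17:52:07.525110] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>",
    "[2017-10-19 17:52:07.526088] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]",
    "[2017-10-19 17:52:07.527214] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]",
]

for key, n in summarize(SAMPLE).items():
    print(n, key)
```

Run against the full tierd.log, a tally like this would show whether every "No space left on device" is preceded by a "no subvolume in layout" error from the hot tier, which is the ordering visible in the excerpt.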

Node 1 /var/log/samba/glusterfs-<vol>-pool.log:

[2017-10-18 17:13:41.481860] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
[2017-10-18 17:13:41.481860] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
[2017-10-18 17:13:41.485916] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1ff570, flags=00) on file 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid argument]
[2017-10-18 17:13:41.488223] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
[2017-10-18 17:13:41.488235] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
[2017-10-18 17:13:41.489060] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1feb50, flags=00) on file 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid argument]
[2017-10-18 17:13:42.339936] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 17:13:42.339988] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 17:13:42.343769] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf2012c0, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
[2017-10-18 17:13:42.345374] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 17:13:42.345401] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 17:13:42.346259] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf201130, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
[2017-10-18 17:13:59.541591] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 17:13:59.541748] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 17:13:59.541887] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 17:13:59.541977] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.

Node 2 /var/log/glusterfs/tier/<vol>/tierd.log:

[2017-10-16 15:54:08.662873] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fffd714e-b2d2-42d3-a31f-72673276e3d0)
[2017-10-16 16:00:07.201584] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f10365e1-747b-4985-97b9-8b5dc61ac464)
[2017-10-16 16:00:07.372559] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f95f17bf-b696-44cd-aae0-d8ac38149aa5)
[2017-10-16 16:06:06.880522] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:ec451f6c-8971-4f9b-a04f-00f96db9b46a)
[2017-10-16 16:06:08.062080] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:e658cd70-3f6d-4b25-8d9f-0d4c24d3ec5d)
[2017-10-16 16:06:08.288298] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f22df67a-88e5-4fae-aab0-b00e04f9a6e1)
[2017-10-18 15:55:06.446416] I [MSGID: 109028] [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 1376671.00 secs
[2017-10-18 15:55:06.446433] I [MSGID: 109028] [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 47887089, failures: 3594, skipped: 0
[2017-10-19 00:00:00.501576] I [MSGID: 109038] [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction on cold tier
[2017-10-19 00:00:00.502016] I [MSGID: 109038] [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on cold tier
[2017-10-19 00:00:00.501608] I [MSGID: 109038] [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction on cold tier
[2017-10-19 00:00:00.502076] I [MSGID: 109038] [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on cold tier
[2017-10-19 16:03:49.522991] I [MSGID: 109028] [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 1463594.00 secs
[2017-10-19 16:03:49.523017] I [MSGID: 109028] [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 52790654, failures: 3594, skipped: 0
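
The "Time taken" values in these status lines are in seconds, whereas the tier status output reports run time as h:m:s. A small conversion sketch (the helper name hms is hypothetical) makes the two easier to compare:

```python
def hms(seconds: float) -> str:
    """Format a duration in seconds as h:mm:ss, dropping fractional seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(hms(1376671.00))  # 382:24:31
print(hms(1463594.00))  # 406:33:14
```

Both samples correspond to roughly 16 to 17 days of run time, and the failures counter stays at 3594 between them while "Files migrated" remains 0.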

Node 2 /var/log/samba/glusterfs-<vol>-pool.log:

[2017-10-18 16:49:09.218062] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 16:49:09.218254] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f009b36bac0, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
[2017-10-18 16:49:09.222783] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 16:49:09.222912] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 16:49:09.223079] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 16:49:09.223200] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.

Status:

# gluster vol tier <vol> status

Node                 Promoted files       Demoted files        Status               run time in h:m:s
---------            ---------            ---------            ---------            ---------
Node1                190861               0                    in progress          408:34:13
Node2                0                    0                    in progress          408:34:14

Hot tier bricks:

# df -h

/dev/mapper/vg_bricks-brick_nvme1             1.4T  551G  883G  39% /mnt/brick_nvme1
/dev/mapper/vg_bricks-brick_nvme2             1.4T  512G  922G  36% /mnt/brick_nvme2


Can anyone point me in the right direction as to what may be going on?  Any guidance is greatly appreciated.

Thanks in advance,

HB

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users


