Re: gluster tiering errors

Milind - Thank you for your help, I appreciate it.

It appears that tiering behaves the same with quota turned off.  Volume info:

# gluster vol info <vol>
 
Volume Name: <vol>
Type: Tier
Volume ID: 7710ed2f-775e-4dd9-92ad-66407c72b0ad
Status: Started
Snapshot Count: 0
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: <node2>:/mnt/brick_nvme1/brick
Brick2: <node1>:/mnt/brick_nvme2/brick
Brick3: <node2>:/mnt/brick_nvme2/brick
Brick4: <node1>:/mnt/brick_nvme1/brick
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: <node1>:/mnt/brick1/brick
Brick6: <node2>:/mnt/brick2/brick
Brick7: <node1>:/mnt/brick2/brick
Brick8: <node2>:/mnt/brick1/brick
Options Reconfigured:
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.write-behind-window-size: 4MB
performance.cache-size: 16GB
features.inode-quota: off
features.quota: off
nfs.disable: on
transport.address-family: inet
features.ctr-enabled: on
cluster.tier-mode: cache
performance.io-cache: off
performance.quick-read: off
cluster.tier-max-files: 1000000

Errors in /var/log/glusterfs/tier/<vol>/tierd.log on node1 after turning off quota:

[2017-10-27 18:38:08.880502] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/83540503.jpg
[2017-10-27 18:38:08.880686] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/83540503.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.880717] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/83540503.jpg
[2017-10-27 18:38:08.881101] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/83540503.jpg  [No space left on device]
[2017-10-27 18:38:08.881145] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 83540503.jpg(gfid:00cf352a-0a21-42d3-91ae-fe6fc63fac9d)
[2017-10-27 18:38:08.891692] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/152640504.jpg
[2017-10-27 18:38:08.891876] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/152640504.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.891899] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/152640504.jpg
[2017-10-27 18:38:08.920077] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/152640504.jpg  [No space left on device]
[2017-10-27 18:38:08.920121] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 152640504.jpg(gfid:0436b8b5-2e15-411e-acfa-a5870cf125bf)
[2017-10-27 18:38:08.952939] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/89240318.jpg
[2017-10-27 18:38:08.953121] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/89240318.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.953147] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/89240318.jpg
[2017-10-27 18:38:08.959510] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/89240318.jpg  [No space left on device]
[2017-10-27 18:38:08.959560] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 89240318.jpg(gfid:1143c9bb-ea79-4c15-ad03-97a611d53135)
[2017-10-27 18:38:08.986665] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/106056906.jpg
[2017-10-27 18:38:08.986871] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/106056906.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.986904] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/106056906.jpg
[2017-10-27 18:38:08.991468] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/106056906.jpg  [No space left on device]
[2017-10-27 18:38:08.991505] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 106056906.jpg(gfid:07f5e5d4-315f-4299-a62f-6bd8f159c89d)
[2017-10-27 18:38:09.025433] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/114649988.jpg
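
To see which failures dominate a tierd.log like the excerpt above, the lines can be tallied by severity, MSGID, source function, and trailing error string. This is a minimal Python sketch, not a Gluster tool: the regular expressions are assumptions based only on the line layout visible in this excerpt, and `summarize` and `SAMPLE` are names made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] 0-<xlator>: message [optional error text]
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<rest>.*)$"
)
# Optional "[Some error text]" at the very end of the message.
ERRSTR_RE = re.compile(r"\[(?P<err>[^\[\]]+)\]\s*$")


def summarize(lines):
    """Count log lines keyed by (level, msgid, function, trailing error string)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a log line
        func = m.group("src").split(":")[-1]
        err = ERRSTR_RE.search(m.group("rest"))
        key = (m.group("level"), m.group("msgid"), func, err.group("err") if err else "")
        counts[key] += 1
    return counts


# The first five lines of the excerpt above, used as sample input.
SAMPLE = [
    "[2017-10-27 18:38:08.880502] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/83540503.jpg",
    "[2017-10-27 18:38:08.880686] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create /path/to/83540503.jpg on <vol>-hot-dht [Input/output error]",
    "[2017-10-27 18:38:08.880717] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - /path/to/83540503.jpg",
    "[2017-10-27 18:38:08.881101] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate /path/to/83540503.jpg  [No space left on device]",
    "[2017-10-27 18:38:08.881145] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for 83540503.jpg(gfid:00cf352a-0a21-42d3-91ae-fe6fc63fac9d)",
]

for key, n in summarize(SAMPLE).most_common():
    print(n, *key)
```

For a real log, an open file handle can be passed instead of `SAMPLE`, since `summarize` only iterates over lines.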

I wanted to add a couple of data points here:

- Most (95%) of the log output is on node1 of the 2-node cluster.

     The tierd.log file on node1 is 588M in size due to all of the failure errors.  The tierd.log file on node2 is only ~205K in size.
     I believe I posted earlier that all promoted files are listed on node1:

     # gluster vol tier <vol> status
     Node                 Promoted files       Demoted files        Status               run time in h:m:s
     ---------            ---------            ---------            ---------            ---------
     <node2>              0                    0                    in progress          601:37:43
     <node1>              271966               0                    in progress          601:37:42

     Is this expected behavior?

- We are sharing the data (the same share) via SMB and AFP to be accessed by PCs and Macs.  The Macs are using AFP since they have so much difficulty with SMB and network file shares.

     I know the Macs create all kinds of 'special' files when working on the share.  Could there be a problem with certain files and tiering?  For example (from node2 tierd.log):
     
     [2017-10-26 19:30:08.147159] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for .DS_Store(gfid:db430070-b9c5-4bd2-b4c6-a347b838a97e)
     [2017-10-26 22:28:08.218565] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for .DS_Store(gfid:f745bea6-04bd-4904-8237-1bd7c9c92f5b)
     [2017-10-26 22:28:08.221909] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for .DS_Store(gfid:bed73314-8740-4822-9fb7-95257434e283)
     [2017-10-26 22:28:08.223767] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for .DS_Store(gfid:bf1df49b-c264-449d-9bc6-65bcfd48fa4e)

     The .DS_Store files are Mac-specific files.

     Since users work directly off of the share, are there potential problems with tiering and locks?  I do see warnings (on node1 tierd.log):

     [2017-10-27 18:30:08.719976] W [MSGID: 109023] [dht-rebalance.c:639:__is_file_migratable] 0-<vol>-tier-dht: Migrate file failed: /path/to/file.ai: File has locks. Skipping file migration
     [2017-10-27 18:32:08.483971] W [MSGID: 109023] [dht-rebalance.c:639:__is_file_migratable] 0-<vol>-tier-dht: Migrate file failed: /path/to/file-v1.ai: File has locks. Skipping file migration


- The directory structure (built up over many years) has spaces in the names of files and folders, sometimes, I'm finding, even at the end of a file name.

     Could spaces in names of files and folders be causing issues with tiering?
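
Whether such names matter to tiering is the open question here, but listing them is straightforward. This is a minimal Python sketch that walks a directory tree and reports entries whose name begins or ends with whitespace. The function name `names_with_edge_whitespace` and the path in the usage comment are hypothetical.

```python
import os


def names_with_edge_whitespace(root):
    """Yield paths under root whose last component starts or ends with whitespace."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name != name.strip():
                yield os.path.join(dirpath, name)


# Usage (the mount point below is a placeholder, not a path from this thread):
# for path in names_with_edge_whitespace("/mnt/volume-mount"):
#     print(repr(path))
```

Printing with `repr` makes a trailing space visible in the output, which plain `print(path)` would hide.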


I'm still not sure where the [No space left on device] messages are coming from, as there do not appear to be any space issues.  Even before I turned off quota on the volume, the sizing appeared to be fine:


# gluster vol quota <vol> list
Path                                    Hard-limit    Soft-limit      Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------------
/path1                                     500.0GB  80%(400.0GB)     1.9MB    500.0GB                    No                    No
/path2                                      25.0TB   80%(20.0TB)    19.2TB      5.8TB                    No                    No


I will have some time this weekend to take the shares offline.  Are there any steps I can take to clean up the hot tier, resync, or other, to ensure all is in a good state?

Thanks in advance.

HB





On Thu, Oct 26, 2017 at 9:17 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
Herb,
I'm trying to weed out issues here.

So, I can see quota turned on and would like you to check the quota settings and test to see system behavior if quota is turned off.

Although the file size that failed migration was 29K, I'm being a bit paranoid while weeding out issues.

Are you still facing tiering errors?
I can see your response to Alex with the disk space consumption and found it a bit ambiguous w.r.t. state of affairs.

--
Milind



On Tue, Oct 24, 2017 at 11:34 PM, Herb Burnswell <herbert.burnswell@xxxxxxxxx> wrote:
Milind - Thank you for the response.

>> What are the high and low watermarks for the tier set at?

# gluster volume get <vol> cluster.watermark-hi
Option                                  Value                                   
------                                  -----                                   
cluster.watermark-hi                    90                                      

# gluster volume get <vol> cluster.watermark-low
Option                                  Value                                   
------                                  -----                                   
cluster.watermark-low                   75                                      
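
For reference, the watermark values above can be compared with the hot tier brick usage from the `df -h` output in the original post further down this thread (39% and 36%). This is a minimal Python sketch of that comparison. It assumes the behavior given in the option descriptions as I understand them: promotion proceeds below `cluster.watermark-low` and stops above `cluster.watermark-hi`. The function name `tier_zone` is made up for illustration, and how Gluster itself measures hot tier fullness (per brick or in aggregate) is not established in this thread.

```python
def tier_zone(used_pct, watermark_low=75, watermark_hi=90):
    """Classify hot tier usage against the low/high watermarks (percentages)."""
    if used_pct < watermark_low:
        return "below-low"   # assumed: promotions allowed, no demotion
    if used_pct > watermark_hi:
        return "above-hi"    # assumed: promotions stop, demotion favored
    return "between"         # assumed: promotion/demotion depend on fullness


# Use% figures for /mnt/brick_nvme1 and /mnt/brick_nvme2 from the df output in this thread.
for brick, used in (("brick_nvme1", 39), ("brick_nvme2", 36)):
    print(brick, used, tier_zone(used))
```

On these figures both bricks sit well under the low watermark, so under the stated assumptions the watermarks alone would not account for the [No space left on device] messages.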


>> What is the size of the file that failed to migrate as per the following tierd log:

>> [2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)

The file was a Word doc, 29K in size.

>> If possible, a gluster volume info would also help, instead of going to and fro with questions.

# gluster vol info
 
Volume Name: ctdb
Type: Replicate
Volume ID: f679c476-e0dd-4f3a-9813-1b26016b5384
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: <node1>:/mnt/ctdb_local/brick
Brick2: <node2>:/mnt/ctdb_local/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
 
Volume Name: <vol>
Type: Tier
Volume ID: 7710ed2f-775e-4dd9-92ad-66407c72b0ad
Status: Started
Snapshot Count: 0
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: <node2>:/mnt/brick_nvme1/brick
Brick2: <node1>:/mnt/brick_nvme2/brick
Brick3: <node2>:/mnt/brick_nvme2/brick
Brick4: <node1>:/mnt/brick_nvme1/brick
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: <node1>:/mnt/brick1/brick
Brick6: <node2>:/mnt/brick2/brick
Brick7: <node1>:/mnt/brick2/brick
Brick8: <node2>:/mnt/brick1/brick
Options Reconfigured:
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.write-behind-window-size: 4MB
performance.cache-size: 16GB
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
transport.address-family: inet
features.ctr-enabled: on
cluster.tier-mode: cache
performance.io-cache: off
performance.quick-read: off
cluster.tier-max-files: 1000000


HB
            



On Sun, Oct 22, 2017 at 8:41 AM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
Herb,
What are the high and low watermarks for the tier set at?

# gluster volume get <vol> cluster.watermark-hi

# gluster volume get <vol> cluster.watermark-low

What is the size of the file that failed to migrate as per the following tierd log:

[2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)

If possible, a gluster volume info would also help, instead of going to and fro with questions.

--
Milind



On Fri, Oct 20, 2017 at 12:42 AM, Herb Burnswell <herbert.burnswell@xxxxxxxxx> wrote:
All,

I am new to gluster and have some questions/concerns about some tiering errors that I see in the log files.

OS: CentOS 7.3.1611
Gluster version: 3.10.5
Samba version: 4.6.2

I see the following (scrubbed):

Node 1 /var/log/glusterfs/tier/<vol>/tierd.log:

[2017-10-19 17:52:07.519614] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
[2017-10-19 17:52:07.525110] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
[2017-10-19 17:52:07.526088] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
[2017-10-19 17:52:07.526111] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
[2017-10-19 17:52:07.527214] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]
[2017-10-19 17:52:07.527244] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fb4411c4-a387-4e5f-a2b7-897633ef4aa8)
[2017-10-19 17:52:07.533510] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
[2017-10-19 17:52:07.534434] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
[2017-10-19 17:52:07.534453] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
[2017-10-19 17:52:07.535570] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]
[2017-10-19 17:52:07.535594] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fba421e7-0500-47c4-bf67-10a40690e13d)
[2017-10-19 17:52:07.541363] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
[2017-10-19 17:52:07.542296] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
[2017-10-19 17:52:07.542357] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
[2017-10-19 17:52:07.543480] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]
[2017-10-19 17:52:07.543521] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fe6799e1-42e6-43e5-a7eb-ac8facfcbc9f)
[2017-10-19 17:52:07.549959] E [MSGID: 109011] [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for path=/path/to/<file>
[2017-10-19 17:52:07.550901] E [MSGID: 109023] [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output error]
[2017-10-19 17:52:07.550922] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
[2017-10-19 17:52:07.551896] E [MSGID: 109037] [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>  [No space left on device]
[2017-10-19 17:52:07.551917] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:ffe3a3f2-b170-43f0-a9fb-97c78e3173eb)
[2017-10-19 17:52:07.551945] E [MSGID: 109037] [tier.c:2565:tier_run] 0-<vol>-tier-dht: Promotion failed

Node 1 /var/log/samba/glusterfs-<vol>-pool.log:

[2017-10-18 17:13:41.481860] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
[2017-10-18 17:13:41.481860] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
[2017-10-18 17:13:41.485916] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1ff570, flags=00) on file 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid argument]
[2017-10-18 17:13:41.488223] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
[2017-10-18 17:13:41.488235] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994) [Invalid argument]
[2017-10-18 17:13:41.489060] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1feb50, flags=00) on file 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid argument]
[2017-10-18 17:13:42.339936] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 17:13:42.339988] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 17:13:42.343769] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf2012c0, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
[2017-10-18 17:13:42.345374] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 17:13:42.345401] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 17:13:42.346259] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf201130, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
[2017-10-18 17:13:59.541591] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 17:13:59.541748] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 17:13:59.541887] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 17:13:59.541977] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.

Node 2 /var/log/glusterfs/tier/<vol>/tierd.log:

[2017-10-16 15:54:08.662873] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:fffd714e-b2d2-42d3-a31f-72673276e3d0)
[2017-10-16 16:00:07.201584] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f10365e1-747b-4985-97b9-8b5dc61ac464)
[2017-10-16 16:00:07.372559] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f95f17bf-b696-44cd-aae0-d8ac38149aa5)
[2017-10-16 16:06:06.880522] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:ec451f6c-8971-4f9b-a04f-00f96db9b46a)
[2017-10-16 16:06:08.062080] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:e658cd70-3f6d-4b25-8d9f-0d4c24d3ec5d)
[2017-10-16 16:06:08.288298] I [MSGID: 109038] [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion failed for <file>(gfid:f22df67a-88e5-4fae-aab0-b00e04f9a6e1)
[2017-10-18 15:55:06.446416] I [MSGID: 109028] [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 1376671.00 secs
[2017-10-18 15:55:06.446433] I [MSGID: 109028] [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 47887089, failures: 3594, skipped: 0
[2017-10-19 00:00:00.501576] I [MSGID: 109038] [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction on cold tier
[2017-10-19 00:00:00.502016] I [MSGID: 109038] [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on cold tier
[2017-10-19 00:00:00.501608] I [MSGID: 109038] [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction on cold tier
[2017-10-19 00:00:00.502076] I [MSGID: 109038] [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on cold tier
[2017-10-19 16:03:49.522991] I [MSGID: 109028] [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 1463594.00 secs
[2017-10-19 16:03:49.523017] I [MSGID: 109028] [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 52790654, failures: 3594, skipped: 0

Node 2 /var/log/samba/glusterfs-<vol>-pool.log:

[2017-10-18 16:49:09.218062] E [MSGID: 114031] [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b) [Invalid argument]
[2017-10-18 16:49:09.218254] E [MSGID: 109089] [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task] 0-<vol>-tier-dht: Failed to open the fd (0x7f009b36bac0, flags=00) on file 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
[2017-10-18 16:49:09.222783] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 16:49:09.222912] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 16:49:09.223079] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-10-18 16:49:09.223200] E [MSGID: 108006] [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.

Status:

# gluster vol tier <vol> status

Node                 Promoted files       Demoted files        Status               run time in h:m:s
---------            ---------            ---------            ---------            ---------
Node1                190861               0                    in progress          408:34:13
Node2                0                    0                    in progress          408:34:14

Hot tier bricks:

# df -h

/dev/mapper/vg_bricks-brick_nvme1             1.4T  551G  883G  39% /mnt/brick_nvme1
/dev/mapper/vg_bricks-brick_nvme2             1.4T  512G  922G  36% /mnt/brick_nvme2


Can anyone point me in the right direction as to what may be going on?  Any guidance is greatly appreciated.

Thanks in advance,

HB

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users



