Just one more piece of information I need from you.
Assuming you have the coredump, could you attach it to gdb and print
You'll need to switch to frame 3 in gdb to get the value of this variable.
On Wed, Dec 21, 2016 at 5:35 PM, Krutika Dhananjay <kdhananj@xxxxxxxxxx> wrote:
-Krutika
Thanks for this. The information seems sufficient at the moment.
Will get back to you on this if/when I find something.
On Mon, Dec 19, 2016 at 1:44 PM, qingwei wei <tchengwee@xxxxxxxxx> wrote:
Hi Krutika,
Sorry for the delay; I have been busy with other work. Attached is the
tar.gz file with the client and server logs, the gfid information for the
shard folder (please look at the test.0.0 file, as the logs were captured
while I ran fio on this file), and the print statement I put inside the
code. FYI, I did 2 runs this time and only the second run gave me
problems. Hope this information helps.
Regards,
Cw
On Thu, Dec 15, 2016 at 8:02 PM, Krutika Dhananjay <kdhananj@xxxxxxxxxx> wrote:
> Good that you asked. I'll try, but be warned: this will involve me coming back
> to you with a lot more questions. :)
>
> I've been trying this for the past two days (not to mention the fio run
> takes really long) and so far there has been no crash/assert failure.
>
> If you already have the core:
> in frame 1,
> 0. print block_num
> 1. get lru_inode_ctx->stat.ia_gfid
> 2. convert it to hex
> 3. find the gfid in your backend that corresponds to this gfid and share its
> path in your response
> 4. print priv->inode_count
> 5. and of course lru_inode_ctx->block_num :)
> 6. Also attach the complete brick and client logs.
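
Steps 1-3 above can be sketched in C. This is only an illustration under
stated assumptions: gfid_to_str and gfid_backend_path are hypothetical helper
names (not GlusterFS functions), and the <brick>/.glusterfs/<aa>/<bb>/<gfid>
layout is the usual brick backend convention, assumed here rather than taken
from this thread. In gdb, "p/x lru_inode_ctx->stat.ia_gfid" prints the 16 raw
bytes in hex, which is the input this sketch expects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a 16-byte gfid as the canonical 36-character UUID string.
 * out must have room for 37 bytes (36 characters plus the NUL). */
void gfid_to_str(const unsigned char gfid[16], char out[37])
{
    static const char hex[] = "0123456789abcdef";
    int pos = 0;

    for (int i = 0; i < 16; i++) {
        /* Dashes go before bytes 4, 6, 8 and 10 (8-4-4-4-12 layout). */
        if (i == 4 || i == 6 || i == 8 || i == 10)
            out[pos++] = '-';
        out[pos++] = hex[gfid[i] >> 4];
        out[pos++] = hex[gfid[i] & 0x0f];
    }
    out[pos] = '\0';
    assert(pos == 36);
}

/* Build "<brick>/.glusterfs/<aa>/<bb>/<gfid>" (assumed backend layout).
 * Returns snprintf's result, or -1 if brick is NULL or empty. */
int gfid_backend_path(const char *brick, const unsigned char gfid[16],
                      char *buf, size_t len)
{
    char s[37];

    if (brick == NULL || strlen(brick) == 0)
        return -1;
    gfid_to_str(gfid, s);
    return snprintf(buf, len, "%s/.glusterfs/%.2s/%.2s/%s",
                    brick, s, s + 2, s);
}
```

For a regular file the .glusterfs entry is typically a hard link, so
something like "find <brick> -samefile <that path>" should then show the
real path on the brick.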
>
> -Krutika
>
>
> On Thu, Dec 15, 2016 at 3:18 PM, qingwei wei <tchengwee@xxxxxxxxx> wrote:
>>
>> Hi Krutika,
>>
>> Do you need any more information? Do let me know, as I can try things on my
>> test system. Thanks.
>>
>> Cw
>>
>> On Tue, Dec 13, 2016 at 12:17 AM, qingwei wei <tchengwee@xxxxxxxxx> wrote:
>> > Hi Krutika,
>> >
>> > Do you mean the fio command?
>> >
>> > Below is how I do the sequential write. This example uses a 400GB
>> > file; for SHARD_MAX_INODES=16, I use a 300MB file.
>> >
>> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1
>> > -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1 -bs
>> > 256k -numjobs 1 -iodepth 2 -name test -rw write
>> >
>> > After fio completes the above workload, I do the random write:
>> >
>> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1
>> > -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1 -bs
>> > 8k -numjobs 1 -iodepth 2 -name test -rw randwrite
>> >
>> > The error (sometimes a segmentation fault) only happens during the random
>> > write.
>> >
>> > The gluster volume is a replica 3 volume with sharding enabled and a 16MB
>> > shard block size.
>> >
>> > Thanks.
>> >
>> > Cw
>> >
>> > On Tue, Dec 13, 2016 at 12:00 AM, Krutika Dhananjay
>> > <kdhananj@xxxxxxxxxx> wrote:
>> >> I tried but couldn't recreate this issue (even with SHARD_MAX_INODES
>> >> being 16).
>> >> Could you share the exact command you used?
>> >>
>> >> -Krutika
>> >>
>> >> On Mon, Dec 12, 2016 at 12:15 PM, qingwei wei <tchengwee@xxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Hi Krutika,
>> >>>
>> >>> Thanks. Looking forward to your reply.
>> >>>
>> >>> Cw
>> >>>
>> >>> On Mon, Dec 12, 2016 at 2:27 PM, Krutika Dhananjay
>> >>> <kdhananj@xxxxxxxxxx>
>> >>> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > First of all, apologies for the late reply. Couldn't find time to
>> >>> > look into this until now.
>> >>> >
>> >>> > Changing the SHARD_MAX_INODES value from 16384 to 16 is a cool trick!
>> >>> > Let me try that as well and get back to you in some time.
>> >>> >
>> >>> > -Krutika
>> >>> >
>> >>> > On Thu, Dec 8, 2016 at 11:07 AM, qingwei wei <tchengwee@xxxxxxxxx>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> With the help of my colleague, we made some changes to the code:
>> >>> >> we reduced SHARD_MAX_INODES (from 16384 to 16) and added
>> >>> >> printing of blk_num inside __shard_update_shards_inode_list. We
>> >>> >> then ran fio to first do a sequential write of a 300MB file. After
>> >>> >> this run completed, we used fio to generate random writes (8k).
>> >>> >> During this random write run, we found that there are situations
>> >>> >> where blk_num is a negative number, and this triggers the following
>> >>> >> assertion.
>> >>> >>
>> >>> >> GF_ASSERT (lru_inode_ctx->block_num > 0);
>> >>> >>
>> >>> >> [2016-12-08 03:16:34.217582] E
>> >>> >> [shard.c:468:__shard_update_shards_inode_list]
>> >>> >> (-->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)
>> >>> >> [0x7f7300930b6d]
>> >>> >> -->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xce)
>> >>> >> [0x7f7300930b1e]
>> >>> >> -->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x36b)
>> >>> >> [0x7f730092bf5b] ) 0-: Assertion failed: lru_inode_ctx->block_num > 0
>> >>> >>
>> >>> >> Also, there is a segmentation fault shortly after this assertion, and
>> >>> >> after that fio exits with an error.
>> >>> >>
>> >>> >> frame : type(0) op(0)
>> >>> >> patchset: git://git.gluster.com/glusterfs.git
>> >>> >> signal received: 11
>> >>> >> time of crash:
>> >>> >> 2016-12-08 03:16:34
>> >>> >> configuration details:
>> >>> >> argp 1
>> >>> >> backtrace 1
>> >>> >> dlfcn 1
>> >>> >> libpthread 1
>> >>> >> llistxattr 1
>> >>> >> setfsid 1
>> >>> >> spinlock 1
>> >>> >> epoll.h 1
>> >>> >> xattr.h 1
>> >>> >> st_atim.tv_nsec 1
>> >>> >> package-string: glusterfs 3.7.17
>> >>> >>
>> >>> >> /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f730e900332]
>> >>> >> /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x2d5)[0x7f730e9250b5]
>> >>> >> /lib64/libc.so.6(+0x35670)[0x7f730d1f1670]
>> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x1d4)[0x7f730092bdc4]
>> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xce)[0x7f7300930b1e]
>> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)[0x7f7300930b6d]
>> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/cluster/distribute.so(dht_lookup_cbk+0x380)[0x7f7300b8e240]
>> >>> >> /usr/local/lib/glusterfs/3.7.17/xlator/protocol/client.so(client3_3_lookup_cbk+0x769)[0x7f7300df4989]
>> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f730e6ce010]
>> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x1df)[0x7f730e6ce2ef]
>> >>> >> /usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f730e6ca483]
>> >>> >> /usr/local/lib/glusterfs/3.7.17/rpc-transport/socket.so(+0x6344)[0x7f73034dc344]
>> >>> >> /usr/local/lib/glusterfs/3.7.17/rpc-transport/socket.so(+0x8f44)[0x7f73034def44]
>> >>> >> /usr/local/lib/libglusterfs.so.0(+0x925aa)[0x7f730e96c5aa]
>> >>> >> /lib64/libpthread.so.0(+0x7dc5)[0x7f730d96ddc5]
>> >>> >>
>> >>> >> Core dump:
>> >>> >>
>> >>> >> Using host libthread_db library "/lib64/libthread_db.so.1".
>> >>> >> Core was generated by `/usr/local/sbin/glusterfs
>> >>> >> --volfile-server=10.217.242.32 --volfile-id=/testSF1'.
>> >>> >> Program terminated with signal 11, Segmentation fault.
>> >>> >> #0 list_del_init (old=0x7f72f4003de0) at
>> >>> >> ../../../../libglusterfs/src/list.h:87
>> >>> >> 87 old->prev->next = old->next;
>> >>> >>
>> >>> >> bt
>> >>> >>
>> >>> >> #0 list_del_init (old=0x7f72f4003de0) at
>> >>> >> ../../../../libglusterfs/src/list.h:87
>> >>> >> #1 __shard_update_shards_inode_list
>> >>> >> (linked_inode=linked_inode@entry=0x7f72fa7a6e48,
>> >>> >> this=this@entry=0x7f72fc0090c0, base_inode=0x7f72fa7a5108,
>> >>> >> block_num=block_num@entry=10) at shard.c:469
>> >>> >> #2 0x00007f7300930b1e in shard_link_block_inode
>> >>> >> (local=local@entry=0x7f730ec4ed00, block_num=10, inode=<optimized
>> >>> >> out>,
>> >>> >> buf=buf@entry=0x7f730180c990) at shard.c:1559
>> >>> >> #3 0x00007f7300930b6d in shard_common_lookup_shards_cbk
>> >>> >> (frame=0x7f730c611204, cookie=<optimized out>, this=0x7f72fc0090c0,
>> >>> >> op_ret=0,
>> >>> >> op_errno=<optimized out>, inode=<optimized out>,
>> >>> >> buf=0x7f730180c990, xdata=0x7f730c029cdc,
>> >>> >> postparent=0x7f730180ca00)
>> >>> >> at shard.c:1596
>> >>> >> #4 0x00007f7300b8e240 in dht_lookup_cbk (frame=0x7f730c61dc40,
>> >>> >> cookie=<optimized out>, this=<optimized out>, op_ret=0,
>> >>> >> op_errno=22,
>> >>> >> inode=0x7f72fa7a6e48, stbuf=0x7f730180c990,
>> >>> >> xattr=0x7f730c029cdc,
>> >>> >> postparent=0x7f730180ca00) at dht-common.c:2362
>> >>> >> #5 0x00007f7300df4989 in client3_3_lookup_cbk (req=<optimized
>> >>> >> out>,
>> >>> >> iov=<optimized out>, count=<optimized out>, myframe=0x7f730c616ab4)
>> >>> >> at client-rpc-fops.c:2988
>> >>> >> #6 0x00007f730e6ce010 in rpc_clnt_handle_reply
>> >>> >> (clnt=clnt@entry=0x7f72fc04c040,
>> >>> >> pollin=pollin@entry=0x7f72fc079560)
>> >>> >> at rpc-clnt.c:796
>> >>> >> #7 0x00007f730e6ce2ef in rpc_clnt_notify (trans=<optimized out>,
>> >>> >> mydata=0x7f72fc04c070, event=<optimized out>, data="">
>> >>> >> at rpc-clnt.c:967
>> >>> >> #8 0x00007f730e6ca483 in rpc_transport_notify
>> >>> >> (this=this@entry=0x7f72fc05bd30,
>> >>> >> event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
>> >>> >> data="">0) at rpc-transport.c:546
>> >>> >> #9 0x00007f73034dc344 in socket_event_poll_in
>> >>> >> (this=this@entry=0x7f72fc05bd30) at socket.c:2250
>> >>> >> #10 0x00007f73034def44 in socket_event_handler (fd=fd@entry=10,
>> >>> >> idx=idx@entry=2, data="" poll_in=1, poll_out=0,
>> >>> >> poll_err=0)
>> >>> >> at socket.c:2363
>> >>> >> #11 0x00007f730e96c5aa in event_dispatch_epoll_handler
>> >>> >> (event=0x7f730180ced0, event_pool=0xf42ee0) at event-epoll.c:575
>> >>> >> #12 event_dispatch_epoll_worker (data="" at
>> >>> >> event-epoll.c:678
>> >>> >> #13 0x00007f730d96ddc5 in start_thread () from
>> >>> >> /lib64/libpthread.so.0
>> >>> >> #14 0x00007f730d2b2ced in clone () from /lib64/libc.so.6
>> >>> >>
>> >>> >> It seems like there is some situation where the structure is not
>> >>> >> initialized properly? I would appreciate any advice. Thanks.
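
The faulting line in frame #0 (old->prev->next = old->next) would be
consistent with a list node whose prev pointer is NULL or stale, although the
backtrace alone does not prove that. The sketch below only illustrates that
hypothesis. The list helpers are a minimal stand-in modelled on the one line
of list.h shown above, and safe_list_del_init is a hypothetical guard, not a
GlusterFS function.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* An initialized node points at itself, so unlinking it is harmless. */
void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->next = head;
    node->prev = head->prev;
    node->prev->next = node;
    node->next->prev = node;
}

/* Same shape as the faulting code: dereferences old->prev and old->next. */
void list_del_init(struct list_head *old)
{
    old->prev->next = old->next;
    old->next->prev = old->prev;
    old->next = old;
    old->prev = old;
}

/* Hypothetical guard: refuse to unlink a node that was never initialized
 * (zero-filled), instead of dereferencing a NULL pointer. It cannot detect
 * stale non-NULL pointers. Returns 0 on success, -1 if rejected. */
int safe_list_del_init(struct list_head *old)
{
    if (old == NULL || old->prev == NULL || old->next == NULL)
        return -1;
    list_del_init(old);
    return 0;
}
```

A zero-filled node (for example a ctx that was allocated but never passed
through the list initializer) is rejected by the guard, while the unguarded
list_del_init would dereference NULL on its first statement.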
>> >>> >>
>> >>> >> Cw
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Wed, Dec 7, 2016 at 9:42 AM, qingwei wei <tchengwee@xxxxxxxxx>
>> >>> >> wrote:
>> >>> >> > Hi,
>> >>> >> >
>> >>> >> > I did another test and this time FIO fails with
>> >>> >> >
>> >>> >> > fio: io_u error on file /mnt/testSF-HDD1/test: Invalid argument:
>> >>> >> > write
>> >>> >> > offset=114423242752, buflen=8192
>> >>> >> > fio: pid=10052, err=22/file:io_u.c:1582, func=io_u error,
>> >>> >> > error=Invalid
>> >>> >> > argument
>> >>> >> >
>> >>> >> > test: (groupid=0, jobs=1): err=22 (file:io_u.c:1582, func=io_u
>> >>> >> > error,
>> >>> >> > error=Invalid argument): pid=10052: Tue Dec 6 15:18:47 2016
>> >>> >> >
>> >>> >> >
>> >>> >> > Below is the client log:
>> >>> >> >
>> >>> >> > [2016-12-06 05:19:31.261289] I
>> >>> >> > [fuse-bridge.c:5171:fuse_graph_setup]
>> >>> >> > 0-fuse: switched to graph 0
>> >>> >> > [2016-12-06 05:19:31.261355] I [MSGID: 114035]
>> >>> >> > [client-handshake.c:193:client_set_lk_version_cbk]
>> >>> >> > 0-testSF-HDD-client-5: Server lk version = 1
>> >>> >> > [2016-12-06 05:19:31.261404] I [fuse-bridge.c:4083:fuse_init]
>> >>> >> > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>> >>> >> > 7.22
>> >>> >> > kernel 7.22
>> >>> >> > [2016-12-06 05:19:31.262901] I [MSGID: 108031]
>> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> > 0-testSF-HDD-replicate-0:
>> >>> >> > selecting local read_child testSF-HDD-client-1
>> >>> >> > [2016-12-06 05:19:31.262930] I [MSGID: 108031]
>> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> > 0-testSF-HDD-replicate-0:
>> >>> >> > selecting local read_child testSF-HDD-client-0
>> >>> >> > [2016-12-06 05:19:31.262948] I [MSGID: 108031]
>> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> > 0-testSF-HDD-replicate-0:
>> >>> >> > selecting local read_child testSF-HDD-client-2
>> >>> >> > [2016-12-06 05:19:31.269592] I [MSGID: 108031]
>> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> > 0-testSF-HDD-replicate-1:
>> >>> >> > selecting local read_child testSF-HDD-client-3
>> >>> >> > [2016-12-06 05:19:31.269795] I [MSGID: 108031]
>> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> > 0-testSF-HDD-replicate-1:
>> >>> >> > selecting local read_child testSF-HDD-client-4
>> >>> >> > [2016-12-06 05:19:31.277763] I [MSGID: 108031]
>> >>> >> > [afr-common.c:2071:afr_local_discovery_cbk]
>> >>> >> > 0-testSF-HDD-replicate-1:
>> >>> >> > selecting local read_child testSF-HDD-client-5
>> >>> >> > [2016-12-06 06:58:05.399244] W [MSGID: 101159]
>> >>> >> > [inode.c:1219:__inode_unlink] 0-inode:
>> >>> >> > be318638-e8a0-4c6d-977d-7a937aa84806/864c9ea1-3a7e-4d41-899b-f30604a7584e.16284:
>> >>> >> > dentry not found in 63af10b7-9dac-4a53-aab1-3cc17fff3255
>> >>> >> > [2016-12-06 15:17:43.311400] E
>> >>> >> > [shard.c:460:__shard_update_shards_inode_list]
>> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)
>> >>> >> > [0x7f5575680fdd]
>> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
>> >>> >> > [0x7f5575680f6f]
>> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x22e)
>> >>> >> > [0x7f557567c1ce] ) 0-: Assertion failed: lru_inode_ctx->block_num > 0
>> >>> >> > [2016-12-06 15:17:43.311472] W [inode.c:1232:inode_unlink]
>> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
>> >>> >> > [0x7f5575680f6f]
>> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x14a)
>> >>> >> > [0x7f557567c0ea] -->/lib64/libglusterfs.so.0(inode_unlink+0x9c)
>> >>> >> > [0x7f558386ba0c] ) 0-testSF-HDD-shard: inode not found
>> >>> >> > [2016-12-06 15:17:43.333456] W [inode.c:1133:inode_forget]
>> >>> >> > (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
>> >>> >> > [0x7f5575680f6f]
>> >>> >> > -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x154)
>> >>> >> > [0x7f557567c0f4] -->/lib64/libglusterfs.so.0(inode_forget+0x90)
>> >>> >> > [0x7f558386b800] ) 0-testSF-HDD-shard: inode not found
>> >>> >> > [2016-12-06 15:18:47.129794] W
>> >>> >> > [fuse-bridge.c:2311:fuse_writev_cbk]
>> >>> >> > 0-glusterfs-fuse: 12555429: WRITE => -1
>> >>> >> > gfid=864c9ea1-3a7e-4d41-899b-f30604a7584e fd=0x7f557016ae6c
>> >>> >> > (Invalid
>> >>> >> > argument)
>> >>> >> >
>> >>> >> > Below is the code. It goes to the else block once inode_count
>> >>> >> > has reached SHARD_MAX_INODES, which is 16384. My dataset of
>> >>> >> > 400GB with a 16MB shard size has enough shard files (400GB/16MB)
>> >>> >> > to reach that limit.
>> >>> >> > When I do the test with a smaller dataset, there is no such error.
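
The arithmetic behind this can be checked with a few lines of C. This is a
rough sketch, assuming fio's "400g" and the 16MB shard size are binary units
(GiB and MiB) and that a write offset maps to a block by integer division;
block_count, exceeds_lru_limit and block_of_offset are hypothetical helper
names, not functions from shard.c.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/* Number of shard-sized blocks needed to hold file_size bytes (rounded up).
 * Block 0 lives in the base file, so at most block_count - 1 shards end up
 * under /.shard. */
uint64_t block_count(uint64_t file_size, uint64_t shard_size)
{
    assert(shard_size > 0);
    return (file_size + shard_size - 1) / shard_size;
}

/* Non-zero if the file has more blocks than the LRU list can hold. */
int exceeds_lru_limit(uint64_t file_size, uint64_t shard_size,
                      uint64_t max_inodes)
{
    return block_count(file_size, shard_size) > max_inodes;
}

/* Block index that a byte offset falls into. */
uint64_t block_of_offset(uint64_t offset, uint64_t shard_size)
{
    assert(shard_size > 0);
    return offset / shard_size;
}
```

Under those assumptions, block_count(400 * GiB, 16 * MiB) is 25600, which is
above 16384, and a 300MB file gives 19 blocks, which is above 16. Both setups
therefore push the list into the else branch. The failing fio write at
offset=114423242752 falls in block 6820.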
>> >>> >> >
>> >>> >> > shard.c
>> >>> >> >
>> >>> >> >         if (priv->inode_count + 1 <= SHARD_MAX_INODES) {
>> >>> >> >                 /* If this inode was linked here for the first time
>> >>> >> >                  * (indicated by empty list), and if there is still
>> >>> >> >                  * space in the priv list, add this ctx to the tail
>> >>> >> >                  * of the list.
>> >>> >> >                  */
>> >>> >> >                 gf_uuid_copy (ctx->base_gfid, base_inode->gfid);
>> >>> >> >                 ctx->block_num = block_num;
>> >>> >> >                 list_add_tail (&ctx->ilist, &priv->ilist_head);
>> >>> >> >                 priv->inode_count++;
>> >>> >> >         } else {
>> >>> >> >                 /* If on the other hand there is no available slot
>> >>> >> >                  * for this inode in the list, delete the lru inode
>> >>> >> >                  * from the head of the list, unlink it. And in its
>> >>> >> >                  * place add this new inode into the list.
>> >>> >> >                  */
>> >>> >> >                 lru_inode_ctx = list_first_entry (&priv->ilist_head,
>> >>> >> >                                                   shard_inode_ctx_t,
>> >>> >> >                                                   ilist);
>> >>> >> >                 /* add in message for debug */
>> >>> >> >                 gf_msg (THIS->name, GF_LOG_WARNING, 0,
>> >>> >> >                         SHARD_MSG_INVALID_FOP,
>> >>> >> >                         "block number = %d",
>> >>> >> >                         lru_inode_ctx->block_num);
>> >>> >> >
>> >>> >> >                 GF_ASSERT (lru_inode_ctx->block_num > 0);
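
The two branches quoted above amount to a bounded LRU list: append while
there is room, otherwise evict the head and append the newcomer. A minimal,
self-contained sketch of just that behaviour follows. It is not the shard.c
implementation: struct ctx, struct lru, lru_add and the tiny MAX_INODES
value are hypothetical stand-ins chosen so the eviction path is easy to hit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INODES 2 /* stand-in for SHARD_MAX_INODES, tiny on purpose */

struct ctx {
    int block_num;
    struct ctx *prev;
    struct ctx *next;
};

struct lru {
    struct ctx head; /* sentinel: head.next is the least recently added */
    int count;
};

void lru_init(struct lru *l)
{
    l->head.prev = &l->head;
    l->head.next = &l->head;
    l->head.block_num = 0;
    l->count = 0;
}

void unlink_ctx(struct ctx *c)
{
    c->prev->next = c->next;
    c->next->prev = c->prev;
    c->prev = c;
    c->next = c;
}

void push_tail(struct lru *l, struct ctx *c)
{
    c->prev = l->head.prev;
    c->next = &l->head;
    c->prev->next = c;
    l->head.prev = c;
}

/* Add c with the given block number. Returns the evicted entry, or NULL if
 * there was still room in the list. */
struct ctx *lru_add(struct lru *l, struct ctx *c, int block_num)
{
    struct ctx *evicted = NULL;

    c->block_num = block_num;
    if (l->count + 1 <= MAX_INODES) {
        l->count++;
    } else {
        evicted = l->head.next;
        /* Mirrors the assertion from the thread: every entry on the list
         * is expected to carry a positive block number. */
        assert(evicted->block_num > 0);
        unlink_ctx(evicted);
    }
    push_tail(l, c);
    return evicted;
}
```

With MAX_INODES set to 2, adding blocks 1, 2 and 3 in that order evicts the
entry for block 1 on the third call. An entry that reached the head with a
block number of zero or below would trip the assert, which is the symptom
reported here.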
>> >>> >> >
>> >>> >> > Hopefully I can get some advice on this. Thanks.
>> >>> >> >
>> >>> >> > Cw
>> >>> >> >
>> >>> >> > On Tue, Dec 6, 2016 at 9:07 AM, qingwei wei <tchengwee@xxxxxxxxx>
>> >>> >> > wrote:
>> >>> >> >> Hi,
>> >>> >> >>
>> >>> >> >> This is a repost of my email to the gluster-users mailing list.
>> >>> >> >> I would appreciate it if anyone has any idea about the issue I have
>> >>> >> >> now. Thanks.
>> >>> >> >>
>> >>> >> >> I encountered this when doing a fio random write on the fuse-mounted
>> >>> >> >> gluster volume. After this assertion happens, the client log is
>> >>> >> >> filled with pending frames messages and fio just shows zero IO in
>> >>> >> >> the progress status. As I left this test to run overnight, the
>> >>> >> >> client log file filled up with those pending frames messages and
>> >>> >> >> hit 28GB in around 12 hours.
>> >>> >> >>
>> >>> >> >> The client log:
>> >>> >> >>
>> >>> >> >> [2016-12-04 15:48:35.274208] W [MSGID: 109072]
>> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got
>> >>> >> >> non-linkfile
>> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.7038,
>> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000
>> >>> >> >> [2016-12-04 15:48:35.277208] W [MSGID: 109072]
>> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got
>> >>> >> >> non-linkfile
>> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.8957,
>> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000
>> >>> >> >> [2016-12-04 15:48:35.277588] W [MSGID: 109072]
>> >>> >> >> [dht-linkfile.c:50:dht_linkfile_lookup_cbk] 0-testSF-dht: got
>> >>> >> >> non-linkfile
>> >>> >> >> testSF-replicate-0:/.shard/21da7b64-45e5-4c6a-9244-53d0284bf7ed.11912,
>> >>> >> >> gfid = 00000000-0000-0000-0000-000000000000
>> >>> >> >> [2016-12-04 15:48:35.312751] E
>> >>> >> >> [shard.c:460:__shard_update_shards_inode_list]
>> >>> >> >> (-->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)
>> >>> >> >> [0x7f86cc42efdd]
>> >>> >> >> -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xdf)
>> >>> >> >> [0x7f86cc42ef6f]
>> >>> >> >> -->/usr/lib64/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x22e)
>> >>> >> >> [0x7f86cc42a1ce] ) 0-: Assertion failed: lru_inode_ctx->block_num > 0
>> >>> >> >> pending frames:
>> >>> >> >> frame : type(0) op(0)
>> >>> >> >> frame : type(0) op(0)
>> >>> >> >> frame : type(0) op(0)
>> >>> >> >> frame : type(0) op(0)
>> >>> >> >> frame : type(0) op(0)
>> >>> >> >> frame : type(0) op(0)
>> >>> >> >>
>> >>> >> >> Gluster info (I am testing this on one server with each disk
>> >>> >> >> representing one brick; this gluster volume is then mounted
>> >>> >> >> locally via fuse):
>> >>> >> >>
>> >>> >> >> Volume Name: testSF
>> >>> >> >> Type: Distributed-Replicate
>> >>> >> >> Volume ID: 3f205363-5029-40d7-b1b5-216f9639b454
>> >>> >> >> Status: Started
>> >>> >> >> Number of Bricks: 2 x 3 = 6
>> >>> >> >> Transport-type: tcp
>> >>> >> >> Bricks:
>> >>> >> >> Brick1: 192.168.123.4:/mnt/sdb_mssd/testSF
>> >>> >> >> Brick2: 192.168.123.4:/mnt/sdc_mssd/testSF
>> >>> >> >> Brick3: 192.168.123.4:/mnt/sdd_mssd/testSF
>> >>> >> >> Brick4: 192.168.123.4:/mnt/sde_mssd/testSF
>> >>> >> >> Brick5: 192.168.123.4:/mnt/sdf_mssd/testSF
>> >>> >> >> Brick6: 192.168.123.4:/mnt/sdg_mssd/testSF
>> >>> >> >> Options Reconfigured:
>> >>> >> >> features.shard-block-size: 16MB
>> >>> >> >> features.shard: on
>> >>> >> >> performance.readdir-ahead: on
>> >>> >> >>
>> >>> >> >> Gluster version: 3.7.17
>> >>> >> >>
>> >>> >> >> The actual disk usage (about 90-91% full):
>> >>> >> >>
>> >>> >> >> /dev/sdb1 235G 202G 22G 91% /mnt/sdb_mssd
>> >>> >> >> /dev/sdc1 235G 202G 22G 91% /mnt/sdc_mssd
>> >>> >> >> /dev/sdd1 235G 202G 22G 91% /mnt/sdd_mssd
>> >>> >> >> /dev/sde1 235G 200G 23G 90% /mnt/sde_mssd
>> >>> >> >> /dev/sdf1 235G 200G 23G 90% /mnt/sdf_mssd
>> >>> >> >> /dev/sdg1 235G 200G 23G 90% /mnt/sdg_mssd
>> >>> >> >>
>> >>> >> >> Has anyone encountered this issue before?
>> >>> >> >>
>> >>> >> >> Cw
>> >>> >> _______________________________________________
>> >>> >> Gluster-devel mailing list
>> >>> >> Gluster-devel@xxxxxxxxxxx
>> >>> >> http://www.gluster.org/mailman/listinfo/gluster-devel
>> >>> >
>> >>> >
>> >>
>> >>
>
>