Hi,

Please find the comments inlined.

On Mon, Dec 1, 2008 at 8:54 PM, Manhong Dai <daimh at umich.edu> wrote:

> Hi,
>
>
> After a month's file operations, which included copying 20 million of
> small files and about 20 thousand of cluster jobs, I am overall
> satisfied except two stability glitches.
>
>
> 1. A small portion (about 1%?) of jobs got an error of "transport
> endpoint not connected", and output file is incomplete. This error
> happened on random computing nodes, and it doesn't affect subsequent
> jobs on the same node. An example of error message of glusterfsd is
> 2008-11-19 23:09:51 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (172.20.102.2:1022)
>
> Error of glusterfs is either (looks to be caused by brick)
> 2008-11-19 23:09:52 C [client-protocol.c:212:call_bail] muskie-brick: bailing transport
> 2008-11-19 23:09:52 E [client-protocol.c:4834:client_protocol_cleanup] muskie-brick: forced unwinding frame type(1) op(14) reply=@0x67e2150
> 2008-11-19 23:09:52 E [client-protocol.c:3254:client_write_cbk] muskie-brick: no proper reply from server, returning ENOTCONN
> 2008-11-19 23:09:56 E [write-behind.c:602:wb_writev] wb: delayed error : 107
>
> or (caused by namespace)
> 2008-11-28 20:47:53 C [client-protocol.c:212:call_bail] muskie-ns: bailing transport
> 2008-11-28 20:47:53 E [client-protocol.c:4834:client_protocol_cleanup] muskie-ns: forced unwinding frame type(1) op(40) reply=@0x1b447cc0
> 2008-11-28 20:47:53 E [client-protocol.c:4613:client_checksum_cbk] muskie-ns: no proper reply from server, returning ENOTCONN
> 2008-11-28 20:47:53 E [client-protocol.c:325:client_protocol_xfer] muskie-ns: transport_submit failed
>
>

What transport timeout are you using? If the transport-timeout is small
and the server is busy serving other requests, there is a good chance
that operations are bailing out and returning ENOTCONN errors. Are you
using io-threads on the server side?
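For reference, here is roughly where those two settings would sit in the volume files. This is only a sketch, assuming 1.3-style volfile syntax: the client volume name is taken from your log, but the host address, export path, remote subvolume name, timeout value, and thread count are hypothetical placeholders, not recommendations for your setup.

```
# Client side (hypothetical values): a longer timeout gives a busy
# server more time to reply before the client bails the transport.
volume muskie-brick
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.0.2.10        # placeholder address
  option remote-subvolume brick        # placeholder name
  option transport-timeout 120         # seconds, illustrative only
end-volume

# Server side (hypothetical values): io-threads layered over the
# storage volume so slow disk operations do not block other requests.
volume brick-posix
  type storage/posix
  option directory /data/export        # placeholder path
end-volume

volume brick
  type performance/io-threads
  option thread-count 4                # illustrative only
  subvolumes brick-posix
end-volume
```

Your actual files may well differ, which is why seeing them would help.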
Can you send the configuration files?

>
> 2. Right now the process 'glusterfs' takes 1785M virt mem, and 1500 RES
> mem, according to top. I hope this is not a memory leak, or at least
> there should be a way to reduce memory usage without remounting it.
>
>
>
> If somebody can shed some light on these issues, I appreciate it. Just
> let me know if you need more detailed information.
>
>
> Best,
> Manhong
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
>

--
Raghavendra G