On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
> On 08/05/2011 04:26 AM, Sage Weil wrote:
> > On Fri, 5 Aug 2011, Fyodor Ustinov wrote:
> > > On 08/04/2011 10:53 PM, Sage Weil wrote:
> > > > The current patches are on top of v3.0, but you should be able to rebase
> > > > the readahead stuff on top of anything reasonably recent.
> > > >
> > > > sage
> > > As usual.
> > > cluster - latest 0.32 from your ubuntu rep.
> > > client - latest git-pulled kernel.
> > >
> > > dd file from cluster to /dev/null and press ctrl-c. In syslog:
> > >
> > > [ 12.950114] libceph: mon0 10.5.51.230:6789 connection failed
> > > [ 19.971512] libceph: client4119 fsid af9be081-9777-e2cc-8988-ba02fff0f390
> > > [ 19.971845] libceph: mon0 10.5.51.230:6789 session established
> > > [ 92.891202] libceph: try_read bad con->in_tag = -108
> > > [ 92.891258] libceph: osd5 10.5.51.145:6801 protocol error, garbage tag
> > > [ 114.508350] libceph: try_read bad con->in_tag = 122
> > > [ 114.508406] libceph: osd1 10.5.51.141:6800 protocol error, garbage tag
> > > [ 119.077246] libceph: try_read bad con->in_tag = -39
> > > [ 119.077301] libceph: osd7 10.5.51.147:6801 protocol error, garbage tag
> > Hmm, this is something new. Can you confirm which commit you're running?
> Well. More detailed.
>
> 1. Cluster: 8 physical servers with 14 osd servers (fs - xfs) + 1 physical
> server with mon+mds. Ceph version - 0.32 from repository on all servers and
> clients.
> 2. Fresh ceph fs. (Really fresh - I made this fs from scratch)
> 3. One client via cfuse slowly fills the cluster by some data (7T). Really
> slowly (about 1G in minute).
>
> But we are talking about another client.
>
> Kernel for this client git pulled from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (it's latest
> kernel).

This is the problem. The readahead patches are in the master branch of
git://ceph.newdream.net/git/ceph-client.git. They're not upstream yet.
Sorry that wasn't clear!

> On client ceph mounted via fstab:
>
> 10.5.51.230:/dcvolia/bacula /bacula ceph _netdev,rw 0 0
>
> Now make show:
>
> root@amanda:/bacula/archive/zab.servers.dcv# cd /bacula/archive/zab.servers.dcv
> root@amanda:/bacula/archive/zab.servers.dcv# ls -alh
> total 100G
> drwxr-xr-x 1 bacula tape 100G 2011-07-31 00:05 .
> drwxr-xr-x 1 bacula tape 253G 2011-07-18 15:21 ..
> -rw-r----- 1 bacula tape 23G 2011-08-05 00:40 zab.servers.dcv-daily-20110719-000519
> -rw-r----- 1 bacula tape 28G 2011-07-25 00:39 zab.servers.dcv-daily-20110719-003333
> -rw-r----- 1 bacula tape 32G 2011-08-01 00:42 zab.servers.dcv-daily-20110726-000515
> -rw-r----- 1 bacula tape 6.2G 2011-07-18 12:29 zab.servers.dcv-monthly-20110718-111036
> -rw-r----- 1 bacula tape 6.1G 2011-07-24 01:22 zab.servers.dcv-weekly-20110724-000518
> -rw-r----- 1 bacula tape 6.1G 2011-07-31 01:22 zab.servers.dcv-weekly-20110731-000522
> root@amanda:/bacula/archive/zab.servers.dcv# dd if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
> ^C34+1 records in
> 34+0 records out
> 285212672 bytes (285 MB) copied, 5.04607 s, 56.5 MB/s
>
> [24983.180068] libceph: get_reply unknown tid 6215 from osd6

This message is normal. We should probably turn down the debug level, or
try to detect whether it is expected or not.

> root@amanda:/bacula/archive/zab.servers.dcv# dd if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
> ^C24+1 records in
> 24+0 records out
> 201326592 bytes (201 MB) copied, 2.4007 s, 83.9 MB/s
>
> [25035.656266] libceph: get_reply unknown tid 7025 from osd1
>
> root@amanda:/bacula/archive/zab.servers.dcv# dd if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
> ^C130+1 records in
> 130+0 records out
> 1090519040 bytes (1.1 GB) copied, 14.9645 s, 72.9 MB/s
>
> root@amanda:/bacula/archive/zab.servers.dcv#
>
> [25088.452033] libceph: try_read bad con->in_tag = 106
> [25088.452087] libceph: osd13 10.5.51.146:6800 protocol error, garbage tag

This is not.
I'll open a bug and try to track this one down. It looks new.

Thanks!
sage

>
> root@amanda:/bacula/archive/zab.servers.dcv# dd if=zab.servers.dcv-daily-20110719-000519 of=/dev/null bs=8M
> ^C104+1 records in
> 104+0 records out
> 872415232 bytes (872 MB) copied, 10.5863 s, 82.4 MB/s
>
> [25166.344264] libceph: try_read bad con->in_tag = 122
> [25166.344317] libceph: osd4 10.5.51.144:6800 protocol error, garbage tag
>
> and so on.
>
> > Have you seen this before?
> Never.
> > It may be in the batch of stuff on top of 3.0.
> >
> May be.
>
> BTW, dramatically increase read speed I do not see. :(
>
> WBR,
> Fyodor.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
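
To see which OSDs the "garbage tag" errors come from, the kernel log can be tallied per OSD. What follows is a minimal sketch that assumes the syslog format quoted in this thread. The function name `count_garbage_tags` and the regular expression are illustrative assumptions and are not part of libceph or any Ceph tool. It ignores the "get_reply unknown tid" lines, which were described above as normal.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpts in this thread:
# [25088.452087] libceph: osd13 10.5.51.146:6800 protocol error, garbage tag
GARBAGE_TAG = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+libceph:\s+(?P<osd>osd\d+)\s+"
    r"(?P<addr>\S+)\s+protocol error, garbage tag"
)


def count_garbage_tags(lines):
    """Return a Counter mapping OSD name to number of garbage-tag errors."""
    counts = Counter()
    for line in lines:
        match = GARBAGE_TAG.search(line)
        if match:
            counts[match.group("osd")] += 1
    return counts


# Sample input copied from the messages quoted in this thread.
SAMPLE = """\
[25035.656266] libceph: get_reply unknown tid 7025 from osd1
[25088.452033] libceph: try_read bad con->in_tag = 106
[25088.452087] libceph: osd13 10.5.51.146:6800 protocol error, garbage tag
[25166.344264] libceph: try_read bad con->in_tag = 122
[25166.344317] libceph: osd4 10.5.51.144:6800 protocol error, garbage tag
"""

if __name__ == "__main__":
    for osd, n in sorted(count_garbage_tags(SAMPLE.splitlines()).items()):
        print(osd, n)
```

In practice the lines could come from `dmesg` output or a syslog file instead of the embedded sample. If the errors turn out to be spread across many OSDs, as they are in the excerpts here, that would suggest a client-side cause and not a single faulty server.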