Re: ceph fuse closing stale session while still operable

On Mon, Jan 25, 2016 at 3:58 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I have now switched debugging to ms = 10.
>
> When starting the dd I can see in the OSD's logs:
>
> 2016-01-26 00:47:16.530046 7f086f404700  1 -- 10.0.0.1:6806/49658 >> :/0
> pipe(0x1f830000 sd=292 :6806 s=0 pgs=0 cs=0 l=0 c=0x1dc2e9e0).accept
> sd=292 10.0.0.91:56814/0
> 2016-01-26 00:47:16.530591 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=0 pgs=0 cs=0 l=0
> c=0x1dc2e9e0).accept peer addr is 10.0.0.91:0/1532
> 2016-01-26 00:47:16.530709 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=0 pgs=0 cs=0 l=1
> c=0x1dc2e9e0).accept of host_type 8, policy.lossy=1 policy.server=1
> policy.standby=0 policy.resetcheck=0
> 2016-01-26 00:47:16.530724 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=0 pgs=0 cs=0 l=1
> c=0x1dc2e9e0).accept my proto 24, their proto 24
> 2016-01-26 00:47:16.530864 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=0 pgs=0 cs=0 l=1
> c=0x1dc2e9e0).accept:  setting up session_security.
> 2016-01-26 00:47:16.530877 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=0 pgs=0 cs=0 l=1
> c=0x1dc2e9e0).accept new session
> 2016-01-26 00:47:16.530884 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=2 pgs=11 cs=1 l=1
> c=0x1dc2e9e0).accept success, connect_seq = 1, sending READY
> 2016-01-26 00:47:16.530889 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=2 pgs=11 cs=1 l=1
> c=0x1dc2e9e0).accept features 37154696925806591
> 2016-01-26 00:47:16.530912 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=2 pgs=11 cs=1 l=1
> c=0x1dc2e9e0).register_pipe
> 2016-01-26 00:47:16.530935 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=2 pgs=11 cs=1 l=1
> c=0x1dc2e9e0).discard_requeued_up_to 0
> 2016-01-26 00:47:16.530964 7f086f404700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=2 pgs=11 cs=1 l=1
> c=0x1dc2e9e0).accept starting writer, state open
> 2016-01-26 00:47:16.531245 7f086f303700 10 -- 10.0.0.1:6806/49658 >>
> 10.0.0.91:0/1532 pipe(0x1f830000 sd=292 :6806 s=2 pgs=11 cs=1 l=1
> c=0x1dc2e9e0).writer: state = open policy.server=1

Well, this is a normal session setup. If it's repeating, that might be
interesting, but I can't tell from this snippet. Similarly, the
every-15-minutes socket replacement you've got below is not
interesting; that's a normal timeout/reset on idle connections.
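
(I believe that 15-minute cadence just lines up with the messenger
idle timeout, "ms tcp read timeout", which defaults to 900 seconds. If
you want to confirm the value on your build, a quick sketch via the
admin socket, run on the host carrying the OSD (osd.1 here is only an
example):

  # show the currently active idle timeout for this OSD
  ceph daemon osd.1 config get ms_tcp_read_timeout

If your version doesn't have "config get", "ceph daemon osd.1 config
show" and grepping for ms_tcp works just as well.)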

You can look through the list archives for help on identifying why OSD
operations aren't working. Apparently you've got enough security cap
permissions, so it could be a network issue blocking messages of a
certain size. It could be some odd configuration issue I've not run
into. It could be some odd firewall behavior, especially if you're
really seeing that ceph-fuse works properly once you've already
established a mount on kernel rbd or something. There are lots of
things that could have gone wrong, but it's not ceph-fuse itself, and
given what you're showing me here, most of them aren't in any part of
Ceph. *shrug*
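
If you want to chase the "messages of a certain size" angle, here's a
rough sketch of what I'd try; the address and OSD id are only taken
from the snippets you posted, adjust them to your setup:

  # non-fragmenting, near-max-size ping from the client to the OSD host;
  # -s 1472 fills a 1500 MTU, use -s 8972 if you run jumbo frames
  ping -M do -s 1472 -c 3 10.0.0.1

  # on the host running osd.1: do the stuck 4MB writes ever show up?
  ceph daemon osd.1 dump_ops_in_flight

If small pings get through but the big ones don't, or the writes never
appear as in-flight ops on the OSD, that points at the network or
firewall rather than at ceph-fuse.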
-Greg


> And that's it, nothing more moves. And the dd is stuck.
>
> [root@cn201 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok
> objecter_requests
> {
>     "ops": [
>         {
>             "tid": 13,
>             "pg": "6.ec7cd164",
>             "osd": 1,
>             "object_id": "10000000bdf.00000008",
>             "object_locator": "@6",
>             "target_object_id": "10000000bdf.00000008",
>             "target_object_locator": "@6",
>             "paused": 0,
>             "used_replica": 0,
>             "precalc_pgid": 0,
>             "last_sent": "2016-01-26 00:47:28.783581",
>             "attempts": 1,
>             "snapid": "head",
>             "snap_context": "1=[]",
>             "mtime": "2016-01-26 00:47:28.769145",
>             "osd_ops": [
>                 "write 0~4194304"
>             ]
>         },
>         {
>             "tid": 23,
>             "pg": "6.1d0c5182",
>             "osd": 1,
>             "object_id": "10000000bdf.00000012",
>             "object_locator": "@6",
>             "target_object_id": "10000000bdf.00000012",
>             "target_object_locator": "@6",
>             "paused": 0,
>             "used_replica": 0,
>             "precalc_pgid": 0,
>             "last_sent": "2016-01-26 00:47:28.917678",
>             "attempts": 1,
>             "snapid": "head",
>             "snap_context": "1=[]",
>             "mtime": "2016-01-26 00:47:28.904582",
>             "osd_ops": [
>                 "write 0~4194304"
>             ]
>         },
>         {
>             "tid": 25,
>             "pg": "6.97271370",
>             "osd": 2,
>             "object_id": "10000000bdf.00000014",
>             "object_locator": "@6",
>             "target_object_id": "10000000bdf.00000014",
>             "target_object_locator": "@6",
>             "paused": 0,
>             "used_replica": 0,
>             "precalc_pgid": 0,
>             "last_sent": "2016-01-26 00:47:28.943763",
>             "attempts": 1,
>             "snapid": "head",
>             "snap_context": "1=[]",
>             "mtime": "2016-01-26 00:47:28.929857",
>             "osd_ops": [
>                 "write 0~4194304"
>             ]
>         },
>
> [....]
>
>
> On other OSDs I can see:
>
> 2016-01-26 00:11:36.224232 7f9c18596700  0 -- 10.0.0.1:6804/51935 >>
> 10.0.0.91:0/1536 pipe(0x1f955800 sd=276 :6804 s=0 pgs=0 cs=0 l=1
> c=0x20379080).accept replacing existing (lossy) channel (new one lossy=1)
> 2016-01-26 00:26:36.312611 7f9c1a2b3700  0 -- 10.0.0.1:6804/51935 >>
> 10.0.0.91:0/1536 pipe(0x1f792000 sd=246 :6804 s=0 pgs=0 cs=0 l=1
> c=0x155d02c0).accept replacing existing (lossy) channel (new one lossy=1)
> 2016-01-26 00:41:36.403063 7f9c1795c700  0 -- 10.0.0.1:6804/51935 >>
> 10.0.0.91:0/1536 pipe(0x20efa000 sd=254 :6804 s=0 pgs=0 cs=0 l=1
> c=0x21bc1760).accept replacing existing (lossy) channel (new one lossy=1)
>
>
> So for me, this looks like some kind of bug.
>
> Without the rbd/ceph kernel modules loaded, ceph-fuse will not work.
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:info@xxxxxxxxxxxxxxxxx
>
> Address:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402, Amtsgericht Hanau
> Managing director: Oliver Dzombic
>
> Tax no.: 35 236 3622 1
> UST ID: DE274086107
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



