Re: Potential OSD deadlock?

Dzianis Kahanovich <mahatma@xxxxxxxxxxxxxx> · Fri, 09 Oct 2015 14:50:10 +0300

One more issue, concerning Intel NICs: some of them (the I*GB series, not
e1000e) are multiqueue, and their default qdisc is "mq", not "pfifo_fast".
Half of my cluster has e1000e NICs and half has IGB (two per node, with
bonding+bridge, no jumbo frames, txqueuelen 2000). On my MQ NICs irqbalance
produces massive network drops (visible with a simple ping), so I now kill
irqbalance on every node.

Some people (myself included) also replace the default qdisc with something
else. I use prio + 3x pfifo (limit 2000), with all non-cluster src+dst traffic
filtered to class 3. On single-queue NICs (e1000e) there is 1 pfifo per NIC. On
MQ NICs it MAY be 1 pfifo per NIC, but I use 1 pfifo per mq class: 8 per NIC on
8 cores.
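
A minimal sketch of the single-queue layout described above (a prio root
qdisc with one pfifo of limit 2000 per band). It only builds the tc command
lines and does not run them. The device name eth0 and the handle numbering
are assumptions for illustration, and the filter that steers non-cluster
traffic to class 3 and the per-mq-class variant are not shown.

```python
def prio_pfifo_commands(dev, bands=3, limit=2000):
    """Return tc command lines: prio root qdisc plus one pfifo per band."""
    # "replace" swaps whatever root qdisc is currently attached to the device.
    cmds = [f"tc qdisc replace dev {dev} root handle 1: prio bands {bands}"]
    for band in range(1, bands + 1):
        # Classes 1:1 .. 1:N are created by prio; give each its own pfifo.
        cmds.append(
            f"tc qdisc add dev {dev} parent 1:{band} "
            f"handle 1{band}: pfifo limit {limit}"
        )
    return cmds


if __name__ == "__main__":
    # eth0 is a hypothetical device name.
    for line in prio_pfifo_commands("eth0"):
        print(line)
```

Printing the commands first makes it easy to review them before applying
anything to a bonded production interface.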

Combined with other factors (SMP, NUMA, balancers, task scheduler details),
these settings can be significant too.

P.S. One last detail: all e1000e NICs use e1000e.InterruptThrottleRate=1,1

Max A. Krasilnikov wrote:
Hello!

On Thu, Oct 08, 2015 at 11:44:09PM -0600, robert wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Sage,

After trying to bisect this issue (all tests moved the bisect towards
Infernalis) and eventually testing the Infernalis branch again, it
looks like the problem still exists, although it is handled a tad
better in Infernalis. I'm going to test against Firefly/Giant next
week and then try to dive into the code to see if I can expose
anything.

If I can do anything to provide you with information, please let me know.

I have fixed my troubles by setting the MTU back to 1500 from 9000 on the 2x10G
network between nodes (2x Cisco Nexus 5020, one link per switch, LACP, Linux
bonding driver: bonding mode=4 lacp_rate=1 xmit_hash_policy=1 miimon=100, Intel
82599ES adapter, non-Intel SFP+). With the MTU set to 9000 on the nodes and 9216
on the Nexus 5020 switches with jumbo frames enabled, I get a performance drop
and slow requests. With the MTU set to 1500 on the nodes and the Nexus left
untouched, all the problems go away.

I restarted all my Ceph services when changing the MTU, and switched between
9000 and 1500 several times to be sure. It is reproducible in my
environment.

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Wed, Oct 7, 2015 at 1:25 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

We forgot to upload the ceph.log yesterday. It is there now.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Oct 6, 2015 at 5:40 PM, Robert LeBlanc  wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I upped the debug on about everything and ran the test for about 40
minutes. I took OSD.19 on ceph1 down and then brought it back in.
There was at least one op on osd.19 that was blocked for over 1,000
seconds. Hopefully this will have something that will cast a light on
what is going on.

We are going to upgrade this cluster to Infernalis tomorrow and rerun
the test to verify the results from the dev cluster. This cluster
matches the hardware of our production cluster but is not yet in
production so we can safely wipe it to downgrade back to Hammer.

Logs are located at http://dev.v3trae.net/~jlavoy/ceph/logs/

Let me know what else we can do to help.

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Oct 6, 2015 at 2:36 PM, Robert LeBlanc  wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On my second test (a much longer one), it took nearly an hour, but a
few messages have popped up over a 20 window. Still far less than I
have been seeing.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Oct 6, 2015 at 2:00 PM, Robert LeBlanc  wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I'll capture another set of logs. Is there any other debugging you
want turned up? I've seen the same thing where I see the message
dispatched to the secondary OSD, but the message just doesn't show up
for 30+ seconds in the secondary OSD logs.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Oct 6, 2015 at 1:34 PM, Sage Weil  wrote:
On Tue, 6 Oct 2015, Robert LeBlanc wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I can't think of anything. In my dev cluster the only thing that has
changed is the Ceph versions (no reboot). What I like is even though
the disks are 100% utilized, it is performing as I expect now. Client
I/O is slightly degraded during the recovery, but no blocked I/O when
the OSD boots or during the recovery period. This is with
max_backfills set to 20, one backfill max in our production cluster is
painful on OSD boot/recovery. I was able to reproduce this issue on
our dev cluster very easily and very quickly with these settings. So
far two tests and an hour later, only the blocked I/O when the OSD is
marked out. We would love to see that go away too, but this is far
                                             (me too!)
better than what we have now. This dev cluster also has
osd_client_message_cap set to default (100).

We need to stay on the Hammer version of Ceph and I'm willing to take
the time to bisect this. If this is not a problem in Firefly/Giant,
would you prefer a bisect to find the introduction of the problem
(Firefly/Giant -> Hammer) or the introduction of the resolution
(Hammer -> Infernalis)? Do you have some hints to reduce hitting a
commit that prevents a clean build as that is my most limiting factor?

Nothing comes to mind.  I think the best way to find this is still to see
it happen in the logs with hammer.  The frustrating thing with that log
dump you sent is that although I see plenty of slow request warnings in
the osd logs, I don't see the requests arriving.  Maybe the logs weren't
turned up for long enough?

sage

Thanks,
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Oct 6, 2015 at 12:32 PM, Sage Weil  wrote:
On Tue, 6 Oct 2015, Robert LeBlanc wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

OK, an interesting point. Running ceph version 9.0.3-2036-g4f54a0d
(4f54a0dd7c4a5c8bdc788c8b7f58048b2a28b9be) looks a lot better. I got
messages when the OSD was marked out:

2015-10-06 11:52:46.961040 osd.13 192.168.55.12:6800/20870 81 :
cluster [WRN] 17 slow requests, 3 included below; oldest blocked for >
34.476006 secs
2015-10-06 11:52:46.961056 osd.13 192.168.55.12:6800/20870 82 :
cluster [WRN] slow request 32.913474 seconds old, received at
2015-10-06 11:52:14.047475: osd_op(client.600962.0:474
rbd_data.338102ae8944a.0000000000005270 [read 3302912~4096] 8.c74a4538
ack+read+known_if_redirected e58744) currently waiting for peered
2015-10-06 11:52:46.961066 osd.13 192.168.55.12:6800/20870 83 :
cluster [WRN] slow request 32.697545 seconds old, received at
2015-10-06 11:52:14.263403: osd_op(client.600960.0:583
rbd_data.3380f74b0dc51.000000000001ee75 [read 1016832~4096] 8.778d1be3
ack+read+known_if_redirected e58744) currently waiting for peered
2015-10-06 11:52:46.961074 osd.13 192.168.55.12:6800/20870 84 :
cluster [WRN] slow request 32.668006 seconds old, received at
2015-10-06 11:52:14.292942: osd_op(client.600955.0:571
rbd_data.3380f74b0dc51.0000000000019b09 [read 1034240~4096] 8.e87a6f58
ack+read+known_if_redirected e58744) currently waiting for peered
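
When there are hundreds of these warnings, it can help to tally them by
reporting OSD and by the "currently ..." state. Below is a minimal sketch
that parses slow request entries in the format quoted in this thread. The
regular expression is an assumption derived only from these samples, and the
text is whitespace-normalized first because the mail client wrapped the log
lines.

```python
import re
from collections import Counter

# Two entries copied from this thread, still wrapped as they appear in the mail.
SAMPLE = """\
2015-10-06 11:52:46.961056 osd.13 192.168.55.12:6800/20870 82 :
cluster [WRN] slow request 32.913474 seconds old, received at
2015-10-06 11:52:14.047475: osd_op(client.600962.0:474
rbd_data.338102ae8944a.0000000000005270 [read 3302912~4096] 8.c74a4538
ack+read+known_if_redirected e58744) currently waiting for peered
2015-10-03 14:38:23.193090 osd.134 10.208.16.25:6800/1425 440 :
cluster [WRN] slow request 30.041999 seconds old, received at
2015-10-03 14:37:53.151014: osd_op(client.1328605.0:7082862
rbd_data.13fdcb2ae8944a.000000000001264f [read 975360~4096]
11.6d19c36f ack+read+known_if_redirected e10249) currently no flag
points reached
"""

SLOW_RE = re.compile(
    r"(osd\.\d+) \S+ \d+ : cluster \[WRN\] slow request ([\d.]+) seconds old, "
    r"received at \S+ \S+ osd_op\(.*?\) currently (.*?)"
    r"(?= \d{4}-\d{2}-\d{2} \d{2}:|$)"
)


def parse_slow_requests(text):
    """Return (osd, age_seconds, state) for every slow request entry."""
    flat = " ".join(text.split())  # undo the mail client's line wrapping
    return [(osd, float(age), state) for osd, age, state in SLOW_RE.findall(flat)]


if __name__ == "__main__":
    entries = parse_slow_requests(SAMPLE)
    print(Counter(state for _, _, state in entries))
    print(Counter(osd for osd, _, _ in entries))
```

Grouping by state separates ops stuck "waiting for peered" from ops that are
"waiting for subops" or have reached no flag point at all.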

But I'm not seeing the blocked messages when the OSD came back in. The
OSD spindles have been running at 100% during this test. I have seen
slowed I/O from the clients as expected from the extra load, but so
far no blocked messages. I'm going to run some more tests.

Good to hear.

FWIW I looked through the logs and all of the slow request no flag point
messages came from osd.163... and the logs don't show when they arrived.
My guess is this OSD has a slower disk than the others, or something else
funny is going on?

I spot checked another OSD at random (60) where I saw a slow request.  It
was stuck peering for 10s of seconds... waiting on a pg log message from
osd.163.

sage

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Oct 6, 2015 at 6:37 AM, Sage Weil  wrote:
On Mon, 5 Oct 2015, Robert LeBlanc wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

With some off-list help, we have adjusted
osd_client_message_cap=10000. This seems to have helped a bit and we
have seen some OSDs have a value up to 4,000 for client messages. But
it does not solve the problem with the blocked I/O.

One thing that I have noticed is that almost exactly 30 seconds elapse
between an OSD booting and the first blocked I/O message. I don't know
if the OSD doesn't have time to get its brain right about a PG before
it starts servicing it or what exactly.

I'm downloading the logs from yesterday now; sorry it's taking so long.

On another note, I tried upgrading our CentOS dev cluster from Hammer
to master and things didn't go so well. The OSDs would not start
because /var/lib/ceph was not owned by ceph. I chowned the directory
and all OSDs and the OSD then started, but never became active in the
cluster. It just sat there after reading all the PGs. There were
sockets open to the monitor, but no OSD to OSD sockets. I tried
downgrading to the Infernalis branch and still no luck getting the
OSDs to come up. The OSD processes were idle after the initial boot.
All packages were installed from gitbuilder.

Did you chown -R ?

         https://github.com/ceph/ceph/blob/infernalis/doc/release-notes.rst#upgrading-from-hammer

My guess is you only chowned the root dir, and the OSD didn't throw
an error when it encountered the other files?  If you can generate a debug
osd = 20 log, that would be helpful.. thanks!

sage
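
One way to check whether a recursive chown actually covered everything is to
walk the tree and list whatever is still owned by someone else. This is a
minimal sketch using only the Python standard library; the function name is
made up for illustration, and looking up a "ceph" user on the node is an
assumption based on the ownership error described above.

```python
import os


def paths_not_owned_by(root, uid):
    """Yield every path under root (root included) whose owner is not uid."""
    if os.lstat(root).st_uid != uid:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are checked themselves, not their targets
            if os.lstat(path).st_uid != uid:
                yield path


# Possible use on an OSD node (assumes a local "ceph" user exists):
#   import pwd
#   ceph_uid = pwd.getpwnam("ceph").pw_uid
#   for path in paths_not_owned_by("/var/lib/ceph", ceph_uid):
#       print(path)
```

An empty result suggests ownership is consistent. Note that os.walk does not
follow symlinked directories, so anything reached only through a symlink
would need a separate check.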

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Sun, Oct 4, 2015 at 3:04 PM, Robert LeBlanc  wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I have eight nodes running the fio job rbd_test_real to different RBD
volumes. I've included the CRUSH map in the tarball.

I stopped one OSD process and marked it out. I let it recover for a
few minutes and then I started the process again and marked it in. I
started getting blocked I/O messages during the recovery.

The logs are located at http://162.144.87.113/files/ushou1.tar.xz

Thanks,

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Sun, Oct 4, 2015 at 7:48 AM, Sage Weil  wrote:
On Sat, 3 Oct 2015, Robert LeBlanc wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

We are still struggling with this and have tried a lot of different
things. Unfortunately, Inktank (now Red Hat) no longer provides
consulting services for non-Red Hat systems. If there are some
certified Ceph consultants in the US that we can do both remote and
on-site engagements, please let us know.

This certainly seems to be network related, but somewhere in the
kernel. We have tried increasing the network and TCP buffers, number
of TCP sockets, reduced the FIN_WAIT2 state. There is about 25% idle
on the boxes, the disks are busy, but not constantly at 100% (they
cycle from <10% up to 100%, but not 100% for more than a few seconds
at a time). There seems to be no reasonable explanation why I/O is
blocked pretty frequently longer than 30 seconds. We have verified
Jumbo frames by pinging from/to each node with 9000 byte packets. The
network admins have verified that packets are not being dropped in the
switches for these nodes. We have tried different kernels including
the recent Google patch to cubic. This is showing up on three clusters
(two Ethernet and one IPoIB). I booted one cluster into Debian Jessie
(from CentOS 7.1) with similar results.
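
A note on checking jumbo frames with ping: the size given to ping is the ICMP
payload, so a probe that must fit a 9000 byte MTU unfragmented carries
9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes, and fragmentation has
to be prohibited or a larger ping can still succeed over a smaller path MTU.
The sketch below only works out the numbers and builds an iputils ping
command line. The host name is a placeholder, and whether the tests described
above used the don't-fragment option is not stated in this thread.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(host, mtu, count=3):
    """Build an iputils ping command that prohibits fragmentation (-M do)."""
    size = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(size), host]


if __name__ == "__main__":
    # "osd-node" is a hypothetical host name.
    print(" ".join(df_ping_command("osd-node", 9000)))
```

If the 8972 byte probe fails between two nodes while a 1472 byte probe
(MTU 1500) works, some hop on that path is not passing jumbo frames.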

The messages seem slightly different:
2015-10-03 14:38:23.193082 osd.134 10.208.16.25:6800/1425 439 :
cluster [WRN] 14 slow requests, 1 included below; oldest blocked for >
100.087155 secs
2015-10-03 14:38:23.193090 osd.134 10.208.16.25:6800/1425 440 :
cluster [WRN] slow request 30.041999 seconds old, received at
2015-10-03 14:37:53.151014: osd_op(client.1328605.0:7082862
rbd_data.13fdcb2ae8944a.000000000001264f [read 975360~4096]
11.6d19c36f ack+read+known_if_redirected e10249) currently no flag
points reached

I don't know what "no flag points reached" means.

Just that the op hasn't been marked as reaching any interesting points
(op->mark_*() calls).

Is it possible to gather a log with debug ms = 20 and debug osd = 20?
It's extremely verbose but it'll let us see where the op is getting
blocked.  If you see the "slow request" message it means the op is
received by ceph (that's when the clock starts), so I suspect it's not
something we can blame on the network stack.

sage

The problem is most pronounced when we have to reboot an OSD node (1
of 13): we will have hundreds of I/Os blocked, sometimes for up to 300
seconds. It takes a good 15 minutes for things to settle down. The
production cluster is very busy doing normally 8,000 I/O and peaking
at 15,000. This is all 4TB spindles with SSD journals and the disks
are between 25-50% full. We are currently splitting PGs to distribute
the load better across the disks, but we are having to do this 10 PGs
at a time as we get blocked I/O. We have max_backfills and
max_recovery set to 1, client op priority is set higher than recovery
priority. We tried increasing the number of op threads but this didn't
seem to help. It seems as soon as PGs are finished being checked, they
become active and could be the cause for slow I/O while the other PGs
are being checked.

What I don't understand is that the messages are delayed. As soon as
the message is received by Ceph OSD process, it is very quickly
committed to the journal and a response is sent back to the primary
OSD which is received very quickly as well. I've adjusted
min_free_kbytes and it seems to keep the OSDs from crashing, but
doesn't solve the main problem. We don't have swap and there is 64 GB
of RAM per node for 10 OSDs.

Is there something that could cause the kernel to get a packet but not
be able to dispatch it to Ceph, which could explain why we are seeing
these blocked I/Os for 30+ seconds? Are there any pointers to tracing
Ceph messages from the network buffer through the kernel to the Ceph
process?

We can really use some pointers, no matter how outrageous. We've had
over 6 people looking into this for weeks now and just can't think of
anything else.

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Fri, Sep 25, 2015 at 2:40 PM, Robert LeBlanc  wrote:
We dropped the replication on our cluster from 4 to 3 and it looks
like all the blocked I/O has stopped (no entries in the log for the
last 12 hours). This makes me believe that there is some issue with
the number of sockets or some other TCP issue. We have not messed with
Ephemeral ports and TIME_WAIT at this point. There are 130 OSDs, 8 KVM
hosts hosting about 150 VMs. Open files is set at 32K for the OSD
processes and 16K system wide.

Does this seem like the right spot to be looking? What are some
configuration items we should be looking at?

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Wed, Sep 23, 2015 at 1:30 PM, Robert LeBlanc  wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

We were able to only get ~17Gb out of the XL710 (heavily tweaked)
until we went to the 4.x kernel where we got ~36Gb (no tweaking). It
seems that there were some major reworks in the network handling in
the kernel to efficiently handle that network rate. If I remember
right we also saw a drop in CPU utilization. I'm starting to think
that we did see packet loss while congesting our ISLs in our initial
testing, but we could not tell where the dropping was happening. We
saw some on the switches, but it didn't seem to be bad if we weren't
trying to congest things. We probably already saw this issue, just
didn't know it.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Wed, Sep 23, 2015 at 1:10 PM, Mark Nelson  wrote:
FWIW, we've got some 40GbE Intel cards in the community performance cluster
on a Mellanox 40GbE switch that appear (knock on wood) to be running fine
with 3.10.0-229.7.2.el7.x86_64.  We did get feedback from Intel that older
drivers might cause problems though.

Here's ifconfig from one of the nodes:

ens513f1: flags=4163  mtu 1500
         inet 10.0.10.101  netmask 255.255.255.0  broadcast 10.0.10.255
         inet6 fe80::6a05:caff:fe2b:7ea1  prefixlen 64  scopeid 0x20
         ether 68:05:ca:2b:7e:a1  txqueuelen 1000  (Ethernet)
         RX packets 169232242875  bytes 229346261232279 (208.5 TiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 153491686361  bytes 203976410836881 (185.5 TiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Mark

On 09/23/2015 01:48 PM, Robert LeBlanc wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

OK, here is the update on the saga...

I traced some more of the blocked I/Os and it seems that communication
between two hosts was worse than between the others. I did a two-way
ping flood between the two hosts using max packet sizes (1500). After
1.5M packets, no lost pings. I then had the ping flood running while I
put Ceph load on the cluster, and the dropped pings started increasing;
after stopping the Ceph workload, the pings stopped dropping.
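
To compare runs like these (idle versus under Ceph load), the summary line
that ping prints at the end can be turned into numbers. This is a minimal
sketch that assumes the iputils summary format; the sample line in it is
made up for illustration and is not output from these tests.

```python
import re

PING_SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received,"
    r".*?([\d.]+)% packet loss"
)


def parse_ping_summary(output):
    """Return sent/received/lost counts and loss percentage, or None."""
    match = PING_SUMMARY_RE.search(output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    return {
        "sent": sent,
        "received": received,
        "lost": sent - received,
        "loss_pct": float(match.group(3)),
    }


if __name__ == "__main__":
    # Illustrative line only, not taken from the tests in this thread.
    line = "10 packets transmitted, 8 received, 20% packet loss, time 9012ms"
    print(parse_ping_summary(line))
```

Recording the lost count per host pair, with and without load, makes it
easier to see whether the drops follow particular hosts or the load itself.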

I then ran iperf between all the nodes with the same results, so that
ruled out Ceph to a large degree. I then booted into the
3.10.0-229.14.1.el7.x86_64 kernel, and in an hour of testing so far
there haven't been any dropped pings or blocked I/O. Our 40 Gb NICs
really need the network enhancements in the 4.x series to work well.

Does this sound familiar to anyone? I'll probably start bisecting the
kernel to see where this issue is introduced. Both of the clusters
with this issue are running 4.x; other than that, they have pretty
different hardware and network configs.

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Sep 22, 2015 at 4:15 PM, Robert LeBlanc wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

This is IPoIB and we have the MTU set to 64K. There were some issues
pinging hosts with "No buffer space available" (hosts are currently
configured for 4GB to test SSD caching rather than page cache). I
found that an MTU under 32K worked reliably for ping, but still had the
blocked I/O.

I reduced the MTU to 1500 and checked pings (OK), but I'm still seeing
the blocked I/O.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Sep 22, 2015 at 3:52 PM, Sage Weil  wrote:

On Tue, 22 Sep 2015, Samuel Just wrote:

I looked at the logs, it looks like there was a 53 second delay
between when osd.17 started sending the osd_repop message and when
osd.13 started reading it, which is pretty weird.  Sage, didn't we
once see a kernel issue which caused some messages to be mysteriously
delayed for many 10s of seconds?

Every time we have seen this behavior and diagnosed it in the wild it
has been a network misconfiguration.  Usually related to jumbo frames.

sage

What kernel are you running?
-Sam

On Tue, Sep 22, 2015 at 2:22 PM, Robert LeBlanc  wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

OK, looping in ceph-devel to see if I can get some more eyes. I've
extracted what I think are important entries from the logs for the
first blocked request. NTP is running on all the servers so the logs
should be close in terms of time. Logs for 12:50 to 13:00 are
available at http://162.144.87.113/files/ceph_block_io.logs.tar.xz

2015-09-22 12:55:06.500374 - osd.17 gets I/O from client
2015-09-22 12:55:06.557160 - osd.17 submits I/O to osd.13
2015-09-22 12:55:06.557305 - osd.17 submits I/O to osd.16
2015-09-22 12:55:06.573711 - osd.16 gets I/O from osd.17
2015-09-22 12:55:06.595716 - osd.17 gets ondisk result=0 from osd.16
2015-09-22 12:55:06.640631 - osd.16 reports to osd.17 ondisk result=0
2015-09-22 12:55:36.926691 - osd.17 reports slow I/O > 30.439150 sec
2015-09-22 12:55:59.790591 - osd.13 gets I/O from osd.17
2015-09-22 12:55:59.812405 - osd.17 gets ondisk result=0 from osd.13
2015-09-22 12:56:02.941602 - osd.13 reports to osd.17 ondisk result=0

In the logs I can see that osd.17 dispatches the I/O to osd.13 and
osd.16 almost simultaneously. osd.16 seems to get the I/O right away,
but for some reason osd.13 doesn't get the message until 53 seconds
later. osd.17 seems happy to just wait and doesn't resend the data
(well, I'm not 100% sure how to tell which entries are the actual data
transfer).
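
The gap can be worked out directly from the timestamps in the list above.
This is a minimal sketch with four of those entries copied in; the dictionary
keys are just labels chosen for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the extracted log entries above.
EVENTS = {
    "osd.17 submits to osd.13": "2015-09-22 12:55:06.557160",
    "osd.17 submits to osd.16": "2015-09-22 12:55:06.557305",
    "osd.16 receives": "2015-09-22 12:55:06.573711",
    "osd.13 receives": "2015-09-22 12:55:59.790591",
}


def seconds_between(start, end):
    """Elapsed seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    to_16 = seconds_between(EVENTS["osd.17 submits to osd.16"],
                            EVENTS["osd.16 receives"])
    to_13 = seconds_between(EVENTS["osd.17 submits to osd.13"],
                            EVENTS["osd.13 receives"])
    print(f"osd.17 -> osd.16: {to_16:.6f} s")  # about 0.016 s
    print(f"osd.17 -> osd.13: {to_13:.6f} s")  # about 53.2 s
```

The comparison relies on the clocks of the nodes being in sync, which is
why NTP matters for reading these logs side by side.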

It looks like osd.17 is receiving responses to start the communication
with osd.13, but the op is not acknowledged until almost a minute
later. To me it seems that the message is getting received but not
passed to another thread right away or something. This test was done
with an idle cluster, a single fio client (rbd engine) with a single
thread.

The OSD servers are almost 100% idle during these blocked I/O
requests. I think I'm at the end of my troubleshooting, so I can use
some help.

Single Test started about
2015-09-22 12:52:36

2015-09-22 12:55:36.926680 osd.17 192.168.55.14:6800/16726 56 :
cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
30.439150 secs
2015-09-22 12:55:36.926699 osd.17 192.168.55.14:6800/16726 57 :
cluster [WRN] slow request 30.439150 seconds old, received at
2015-09-22 12:55:06.487451:
   osd_op(client.250874.0:1388 rbd_data.3380e2ae8944a.0000000000000545
[set-alloc-hint object_size 4194304 write_size 4194304,write
0~4194304] 8.bbf3e8ff ack+ondisk+write+known_if_redirected e56785)
   currently waiting for subops from 13,16
2015-09-22 12:55:36.697904 osd.16 192.168.55.13:6800/29410 7 : cluster
[WRN] 2 slow requests, 2 included below; oldest blocked for >
30.379680 secs
2015-09-22 12:55:36.697918 osd.16 192.168.55.13:6800/29410 8 : cluster
[WRN] slow request 30.291520 seconds old, received at 2015-09-22
12:55:06.406303:
   osd_op(client.250874.0:1384 rbd_data.3380e2ae8944a.0000000000000541
[set-alloc-hint object_size 4194304 write_size 4194304,write
0~4194304] 8.5fb2123f ack+ondisk+write+known_if_redirected e56785)
   currently waiting for subops from 13,17
2015-09-22 12:55:36.697927 osd.16 192.168.55.13:6800/29410 9 : cluster
[WRN] slow request 30.379680 seconds old, received at 2015-09-22
12:55:06.318144:
   osd_op(client.250874.0:1382 rbd_data.3380e2ae8944a.000000000000053f
[set-alloc-hint object_size 4194304 write_size 4194304,write
0~4194304] 8.312e69ca ack+ondisk+write+known_if_redirected e56785)
   currently waiting for subops from 13,14
2015-09-22 12:58:03.998275 osd.13 192.168.55.12:6804/4574 130 :
cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
30.954212 secs
2015-09-22 12:58:03.998286 osd.13 192.168.55.12:6804/4574 131 :
cluster [WRN] slow request 30.954212 seconds old, received at
2015-09-22 12:57:33.044003:
   osd_op(client.250874.0:1873 rbd_data.3380e2ae8944a.000000000000070d
[set-alloc-hint object_size 4194304 write_size 4194304,write
0~4194304] 8.e69870d4 ack+ondisk+write+known_if_redirected e56785)
   currently waiting for subops from 16,17
2015-09-22 12:58:03.759826 osd.16 192.168.55.13:6800/29410 10 :
cluster [WRN] 1 slow requests, 1 included below; oldest blocked for >
30.704367 secs
2015-09-22 12:58:03.759840 osd.16 192.168.55.13:6800/29410 11 :
cluster [WRN] slow request 30.704367 seconds old, received at
2015-09-22 12:57:33.055404:
   osd_op(client.250874.0:1874 rbd_data.3380e2ae8944a.000000000000070e
[set-alloc-hint object_size 4194304 write_size 4194304,write
0~4194304] 8.f7635819 ack+ondisk+write+known_if_redirected e56785)
   currently waiting for subops from 13,17

Server   IP addr         OSD
nodev  - 192.168.55.11 - 12
nodew  - 192.168.55.12 - 13
nodex  - 192.168.55.13 - 16
nodey  - 192.168.55.14 - 17
nodez  - 192.168.55.15 - 14
nodezz - 192.168.55.16 - 15

fio job:
[rbd-test]
readwrite=write
blocksize=4M
##runtime=60
name=rbd-test
##readwrite=randwrite
##bssplit=4k/85:32k/11:512/3:1m/1,4k/89:32k/10:512k/1
##rwmixread=72
##norandommap
##size=1T
##blocksize=4k
ioengine=rbd
rbdname=test2
pool=rbd
clientname=admin
iodepth=8
##numjobs=4
##thread
##group_reporting
##time_based
##direct=1
##ramp_time=60

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, Sep 22, 2015 at 8:31 AM, Gregory Farnum  wrote:

On Tue, Sep 22, 2015 at 7:24 AM, Robert LeBlanc  wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Is there some way to tell in the logs that this is happening?

You can search for the (mangled) name _split_collection

I'm not
seeing much I/O, CPU usage during these times. Is there some way to
prevent the splitting? Is there a negative side effect to doing so?

Bump up the split and merge thresholds. You can search the list for
this, it was discussed not too long ago.
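
For a feel of what bumping the thresholds does: as far as I know, the FileStore documentation of this era gives the per-directory split point as filestore_split_multiple * abs(filestore_merge_threshold) * 16 files, with defaults of 2 and 10. The sketch below just evaluates that formula; the raised values (8 and 40) are hypothetical examples, not a recommendation from this thread, and the formula should be checked against the documentation for your release.

```python
def filestore_split_threshold(split_multiple=2, merge_threshold=10):
    """Files per subdirectory before FileStore splits it.

    Formula assumed from the FileStore config reference:
    split_multiple * abs(merge_threshold) * 16
    """
    return split_multiple * abs(merge_threshold) * 16


if __name__ == "__main__":
    # Documented defaults (split multiple 2, merge threshold 10).
    print("default:", filestore_split_threshold())
    # Hypothetical raised values, for illustration only.
    print("raised: ", filestore_split_threshold(split_multiple=8, merge_threshold=40))
```

Larger values postpone splitting at the cost of bigger directories, so it is probably worth testing on a non-production cluster first.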

We've had I/O block for over 900 seconds and as soon as the sessions
are aborted, they are reestablished and complete immediately.

The fio test is just a sequential write; starting it over (rewriting
from the beginning) still causes the issue. I suspect that it does not
have to create new files and therefore should not be splitting
collections. This is on my test cluster with no other load.

Hmm, that does make it seem less likely if you're really not creating
new objects, if you're actually running fio in such a way that it's
not allocating new FS blocks (this is probably hard to set up?).

I'll be doing a lot of testing today. Which log options and depths
would be the most helpful for tracking this issue down?

If you want to go log diving, "debug osd = 20", "debug filestore = 20",
and "debug ms = 1" are what the OSD guys like to see. That should spit
out everything you need to track exactly what each Op is doing.
-Greg
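
One way to apply those levels without restarting is `ceph tell osd.N injectargs`. The sketch below only builds and prints the command lines and does not run anything. The choice of OSDs 13 and 17 is an assumption taken from the slow-request line earlier in the thread, and the helper name is made up.

```python
import shlex

# Debug levels suggested above.
DEBUG_LEVELS = {"osd": 20, "filestore": 20, "ms": 1}


def injectargs_command(osd_id, levels=None):
    """Build the argv list for 'ceph tell osd.<id> injectargs ...'."""
    levels = DEBUG_LEVELS if levels is None else levels
    args = " ".join(f"--debug-{name} {level}" for name, level in levels.items())
    return ["ceph", "tell", f"osd.{osd_id}", "injectargs", args]


if __name__ == "__main__":
    # OSD ids are an assumption: the two OSDs named in the slow request.
    for osd in (13, 17):
        print(shlex.join(injectargs_command(osd)))
```

Level 20 logging is very verbose, so it is probably wise to lower the levels again once the slow request has been captured.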

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.1.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWAdMSCRDmVDuy+mK58QAAoEgP/AqpH7i1BLpoz6fTlfWG
a6swvF8xvsyR15PDiPINYT0N7MgoikikGrMmhWpJ6utEr1XPW0MPFgzvNIsf
a1eMtNzyww4rAo6JCq6BtjmUsSKmOrBNhRNr6It9v4Nv+biqZHkiY8x/rRtV
s9z0cv3Q9Wqa6y/zKZg3H1XtbtUAx0r/DUwzSsP3omupZgNyaKkCgdkil9Vc
iyzBxFZU4+qXNT2FBG4dYDjxSHQv4psjvKR3AWXSN4yEn286KyMDjFrsDY5B
izS3h603QPoErqsUQngDE8COcaTAHHrV7gNJTikmGoNW6oQBjFq/z/zindTz
caXshVQQ+OTLo/qzJM8QPswh0TGU74SVbDkTq+eTOb5pBhQbp+42Pkkqh7jj
efyyYgDzpB1WrWRbUlWMNqmnjq7DT3lnAtuHyKbkwVs8x3JMPEiCl6PBvJbx
GnNSCqgDJrpb4fHQ2iqfQeh8Ai6AL1C1Ai19RZPrAUhpDW0/DbUvuoKSR8m7
glYYuH3hpy+oPYRhFcHm2fpNJ3u9npyk2Dai9RpzQ+mWmp3xi7becYmL482H
+WyvLeY+8AiJQDpA0CdD8KeSlOC9bw5TPmihAIn9dVTJ1O2RlapCLqL3YAJg
pGyDs8ercTEJLmvEyElj5XWh5DarsGscd2LELNS/UpyuYurbPcyPKUQ0uPjp
gcZm
=CjwB
-----END PGP SIGNATURE-----

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/