Hello! On Thu, Oct 08, 2015 at 11:44:09PM -0600, robert wrote:
> Sage,
> After trying to bisect this issue (all tests moved the bisect towards Infernalis) and eventually testing the Infernalis branch again, it looks like the problem still exists, although it is handled a tad better in Infernalis. I'm going to test against Firefly/Giant next week and then try to dive into the code to see if I can expose anything.
> If I can do anything to provide you with information, please let me know.

I have fixed my troubles by setting the MTU back to 1500 from 9000 on the 2x10G network between nodes (2x Cisco Nexus 5020, one link per switch, LACP, Linux bonding driver: bonding mode=4 lacp_rate=1 xmit_hash_policy=1 miimon=100, Intel 82599ES adapter, non-Intel SFP+). When setting it to 9000 on the nodes and 9216 on the Nexus 5020 switches with jumbo frames enabled, I get a performance drop and slow requests. When setting 1500 on the nodes and not touching the Nexus, all problems are fixed. I restarted all my Ceph services when changing the MTU, and switched between 9000 and 1500 several times to be sure. It is reproducible in my environment.

> Thanks,
> On Wed, Oct 7, 2015 at 1:25 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>> We forgot to upload the ceph.log yesterday. It is there now.
>>
>> On Tue, Oct 6, 2015 at 5:40 PM, Robert LeBlanc wrote:
>>> I upped the debug on about everything and ran the test for about 40 minutes. I took osd.19 on ceph1 down and then brought it back in. There was at least one op on osd.19 that was blocked for over 1,000 seconds. Hopefully this will have something that will cast some light on what is going on.
>>>
>>> We are going to upgrade this cluster to Infernalis tomorrow and rerun the test to verify the results from the dev cluster. This cluster matches the hardware of our production cluster but is not yet in production, so we can safely wipe it to downgrade back to Hammer.
>>>
>>> Logs are located at http://dev.v3trae.net/~jlavoy/ceph/logs/
>>>
>>> Let me know what else we can do to help.
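As a quick sanity check for anyone hitting the same symptoms: whether 9000-byte frames actually survive end-to-end between two storage nodes can be verified with a don't-fragment ping before blaming Ceph. This is only a sketch; the bond0 interface name and the 192.168.55.x address from the test cluster earlier in the thread are placeholders for your own cluster network.

  # 8972 bytes of ICMP payload + 8 bytes ICMP + 20 bytes IP = 9000 on the wire;
  # -M do sets the DF bit so the kernel cannot quietly fragment the packet.
  ping -M do -s 8972 -c 10 192.168.55.12

  # Inspect the current MTU and the LACP/bonding state on the sending node.
  ip link show bond0
  cat /proc/net/bonding/bond0

  # If large frames are silently dropped somewhere in the path, falling back
  # to a standard MTU (as described above) avoids the problem entirely.
  ip link set dev bond0 mtu 1500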
>>> Thanks,
>>> On Tue, Oct 6, 2015 at 2:36 PM, Robert LeBlanc wrote:
>>>> On my second test (a much longer one), it took nearly an hour, but a few messages have popped up over a 20-minute window. Still far less than I have been seeing.
>>>>
>>>> On Tue, Oct 6, 2015 at 2:00 PM, Robert LeBlanc wrote:
>>>>> I'll capture another set of logs. Is there any other debugging you want turned up? I've seen the same thing where I see the message dispatched to the secondary OSD, but the message just doesn't show up for 30+ seconds in the secondary OSD logs.
>>>>>
>>>>> On Tue, Oct 6, 2015 at 1:34 PM, Sage Weil wrote:
>>>>>> On Tue, 6 Oct 2015, Robert LeBlanc wrote:
>>>>>>> I can't think of anything. In my dev cluster the only thing that has changed is the Ceph versions (no reboot). What I like is that even though the disks are 100% utilized, it is performing as I expect now. Client I/O is slightly degraded during the recovery, but there is no blocked I/O when the OSD boots or during the recovery period. This is with max_backfills set to 20; the one-backfill max in our production cluster is painful on OSD boot/recovery. I was able to reproduce this issue on our dev cluster very easily and very quickly with these settings. So far, two tests and an hour later, only the blocked I/O when the OSD is marked out. We would love to see that go away too, but this is far
>>>>>> (me too!)
>>>>>>> better than what we have now. This dev cluster also has osd_client_message_cap set to the default (100).
>>>>>>>
>>>>>>> We need to stay on the Hammer version of Ceph and I'm willing to take the time to bisect this. If this is not a problem in Firefly/Giant, would you prefer a bisect to find the introduction of the problem (Firefly/Giant -> Hammer) or the introduction of the resolution (Hammer -> Infernalis)? Do you have any hints for avoiding commits that don't build cleanly, as that is my most limiting factor?
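For reference, a rough shape of that bisect, assuming a build-from-source workflow with the hammer-era autotools build; the tag names are only examples of a known-bad and known-good point, and `git bisect skip` is the usual way past commits that will not build:

  git clone https://github.com/ceph/ceph.git && cd ceph
  git submodule update --init --recursive
  git bisect start
  git bisect bad v0.94.3      # hammer build that shows the blocked I/O
  git bisect good v0.87.2     # giant build assumed to be clean
  # at each step, skip commits that cannot be built instead of burning time on them:
  ./autogen.sh && ./configure && make -j"$(nproc)" || git bisect skip
  # otherwise install, rerun the OSD out/in test, then mark the result:
  git bisect good    # or: git bisect bad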
>>>>>> >>>>>> Nothing comes to mind. I think the best way to find this is still to see >>>>>> it happen in the logs with hammer. The frustrating thing with that log >>>>>> dump you sent is that although I see plenty of slow request warnings in >>>>>> the osd logs, I don't see the requests arriving. Maybe the logs weren't >>>>>> turned up for long enough? >>>>>> >>>>>> sage >>>>>> >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> - ---------------- >>>>>>> Robert LeBlanc >>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >>>>>>> >>>>>>> >>>>>>> On Tue, Oct 6, 2015 at 12:32 PM, Sage Weil wrote: >>>>>> >> On Tue, 6 Oct 2015, Robert LeBlanc wrote: >>>>>> >>> -----BEGIN PGP SIGNED MESSAGE----- >>>>>> >>> Hash: SHA256 >>>>>> >>> >>>>>> >>> OK, an interesting point. Running ceph version 9.0.3-2036-g4f54a0d >>>>>> >>> (4f54a0dd7c4a5c8bdc788c8b7f58048b2a28b9be) looks a lot better. I got >>>>>> >>> messages when the OSD was marked out: >>>>>> >>> >>>>>> >>> 2015-10-06 11:52:46.961040 osd.13 192.168.55.12:6800/20870 81 : >>>>>> >>> cluster [WRN] 17 slow requests, 3 included below; oldest blocked for > >>>>>> >>> 34.476006 secs >>>>>> >>> 2015-10-06 11:52:46.961056 osd.13 192.168.55.12:6800/20870 82 : >>>>>> >>> cluster [WRN] slow request 32.913474 seconds old, received at >>>>>> >>> 2015-10-06 11:52:14.047475: osd_op(client.600962.0:474 >>>>>> >>> rbd_data.338102ae8944a.0000000000005270 [read 3302912~4096] 8.c74a4538 >>>>>> >>> ack+read+known_if_redirected e58744) currently waiting for peered >>>>>> >>> 2015-10-06 11:52:46.961066 osd.13 192.168.55.12:6800/20870 83 : >>>>>> >>> cluster [WRN] slow request 32.697545 seconds old, received at >>>>>> >>> 2015-10-06 11:52:14.263403: osd_op(client.600960.0:583 >>>>>> >>> rbd_data.3380f74b0dc51.000000000001ee75 [read 1016832~4096] 8.778d1be3 >>>>>> >>> ack+read+known_if_redirected e58744) currently waiting for peered >>>>>> >>> 2015-10-06 11:52:46.961074 osd.13 192.168.55.12:6800/20870 84 : >>>>>> >>> cluster [WRN] slow request 32.668006 seconds old, received at >>>>>> >>> 2015-10-06 11:52:14.292942: osd_op(client.600955.0:571 >>>>>> >>> rbd_data.3380f74b0dc51.0000000000019b09 [read 1034240~4096] 8.e87a6f58 >>>>>> >>> ack+read+known_if_redirected e58744) currently waiting for peered >>>>>> >>> >>>>>> >>> But I'm not seeing the blocked messages when the OSD came back in. The >>>>>> >>> OSD spindles have been running at 100% during this test. I have seen >>>>>> >>> slowed I/O from the clients as expected from the extra load, but so >>>>>> >>> far no blocked messages. I'm going to run some more tests. >>>>>> >> >>>>>> >> Good to hear. >>>>>> >> >>>>>> >> FWIW I looked through the logs and all of the slow request no flag point >>>>>> >> messages came from osd.163... and the logs don't show when they arrived. >>>>>> >> My guess is this OSD has a slower disk than the others, or something else >>>>>> >> funny is going on? >>>>>> >> >>>>>> >> I spot checked another OSD at random (60) where I saw a slow request. It >>>>>> >> was stuck peering for 10s of seconds... waiting on a pg log message from >>>>>> >> osd.163. 
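One more data point that is cheap to grab while a request is stuck: the OSD admin socket shows the flag points each in-flight op has (or has not) reached, which is where the "currently waiting for peered" / "no flag points reached" text comes from. A sketch, using osd.19 from the earlier test purely as the example id:

  # On the host carrying the suspect OSD, while the slow request warning is active:
  ceph daemon osd.19 dump_ops_in_flight     # ops currently stuck, with their event history
  ceph daemon osd.19 dump_historic_ops      # recently completed slow ops and their timings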
>>>>>> >> sage
>>>>>> >>>
>>>>>> >>> On Tue, Oct 6, 2015 at 6:37 AM, Sage Weil wrote:
>>>>>> >> >> On Mon, 5 Oct 2015, Robert LeBlanc wrote:
>>>>>> >> >>> With some off-list help, we have adjusted osd_client_message_cap=10000. This seems to have helped a bit and we have seen some OSDs have a value up to 4,000 for client messages. But it does not solve the problem with the blocked I/O.
>>>>>> >> >>>
>>>>>> >> >>> One thing that I have noticed is that almost exactly 30 seconds elapse between when an OSD boots and the first blocked I/O message. I don't know if the OSD doesn't have time to get its brain right about a PG before it starts servicing it, or what exactly.
>>>>>> >> >>
>>>>>> >> >> I'm downloading the logs from yesterday now; sorry it's taking so long.
>>>>>> >> >>
>>>>>> >> >>> On another note, I tried upgrading our CentOS dev cluster from Hammer to master and things didn't go so well. The OSDs would not start because /var/lib/ceph was not owned by ceph. I chowned the directory for all OSDs and the OSDs then started, but they never became active in the cluster. They just sat there after reading all the PGs. There were sockets open to the monitor, but no OSD-to-OSD sockets. I tried downgrading to the Infernalis branch and still had no luck getting the OSDs to come up. The OSD processes were idle after the initial boot. All packages were installed from gitbuilder.
>>>>>> >> >>
>>>>>> >> >> Did you chown -R ?
>>>>>> >> >>
>>>>>> >> >> https://github.com/ceph/ceph/blob/infernalis/doc/release-notes.rst#upgrading-from-hammer
>>>>>> >> >>
>>>>>> >> >> My guess is you only chowned the root dir, and the OSD didn't throw an error when it encountered the other files? If you can generate a debug osd = 20 log, that would be helpful.. thanks!
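A minimal sketch of both suggestions, assuming systemd units named ceph-osd@<id> and using osd.13 purely as an example id; the recursive chown is the step the release notes call out:

  # Infernalis and later run the daemons as the "ceph" user, so ownership
  # has to be fixed recursively, not just on the top-level directory:
  systemctl stop ceph-osd@13
  chown -R ceph:ceph /var/lib/ceph /var/log/ceph
  systemctl start ceph-osd@13

  # Raise OSD logging at runtime to capture the requested debug osd = 20 log:
  ceph tell osd.13 injectargs '--debug-osd 20 --debug-ms 1'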
>>>>>> >> >> sage
>>>>>> >> >>>
>>>>>> >> >>> Thanks,
>>>>>> >> >>> On Sun, Oct 4, 2015 at 3:04 PM, Robert LeBlanc wrote:
>>>>>> >> >> >> I have eight nodes running the fio job rbd_test_real to different RBD volumes. I've included the CRUSH map in the tarball.
>>>>>> >> >> >>
>>>>>> >> >> >> I stopped one OSD process and marked it out. I let it recover for a few minutes and then I started the process again and marked it in. I started getting blocked I/O messages during the recovery.
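For anyone trying to reproduce this, the failure/recovery cycle described above boils down to something like the following (osd.19 is just an example id; the exact stop/start mechanism depends on how the OSD was launched):

  ceph osd out 19                  # mark it out and let recovery start
  systemctl stop ceph-osd@19       # or stop the OSD process however it was started
  ceph -w                          # watch for "slow requests" warnings during backfill
  # after a few minutes, bring it back:
  systemctl start ceph-osd@19
  ceph osd in 19
  ceph health detail               # lists any requests still blocked > 30s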
>>>>>> >> >> >> The logs are located at http://162.144.87.113/files/ushou1.tar.xz
>>>>>> >> >> >>
>>>>>> >> >> >> Thanks,
>>>>>> >> >> >> On Sun, Oct 4, 2015 at 7:48 AM, Sage Weil wrote:
>>>>>> >> >> >>> On Sat, 3 Oct 2015, Robert LeBlanc wrote:
>>>>>> >> >> >>>> We are still struggling with this and have tried a lot of different things. Unfortunately, Inktank (now Red Hat) no longer provides consulting services for non-Red Hat systems. If there are any certified Ceph consultants in the US who can do both remote and on-site engagements, please let us know.
>>>>>> >> >> >>>>
>>>>>> >> >> >>>> This certainly seems to be network related, but somewhere in the kernel. We have tried increasing the network and TCP buffers and the number of TCP sockets, and reduced the FIN_WAIT2 timeout. There is about 25% idle on the boxes; the disks are busy, but not constantly at 100% (they cycle from <10% up to 100%, but not 100% for more than a few seconds at a time). There seems to be no reasonable explanation why I/O is blocked pretty frequently for longer than 30 seconds. We have verified jumbo frames by pinging from/to each node with 9000-byte packets. The network admins have verified that packets are not being dropped in the switches for these nodes. We have tried different kernels, including the recent Google patch to cubic. This is showing up on three clusters (two Ethernet and one IPoIB). I booted one cluster into Debian Jessie (from CentOS 7.1) with similar results.
>>>>>> >> >> >>>> >>>>>> >> >> >>>> The messages seem slightly different: >>>>>> >> >> >>>> 2015-10-03 14:38:23.193082 osd.134 10.208.16.25:6800/1425 439 : >>>>>> >> >> >>>> cluster [WRN] 14 slow requests, 1 included below; oldest blocked for > >>>>>> >> >> >>>> 100.087155 secs >>>>>> >> >> >>>> 2015-10-03 14:38:23.193090 osd.134 10.208.16.25:6800/1425 440 : >>>>>> >> >> >>>> cluster [WRN] slow request 30.041999 seconds old, received at >>>>>> >> >> >>>> 2015-10-03 14:37:53.151014: osd_op(client.1328605.0:7082862 >>>>>> >> >> >>>> rbd_data.13fdcb2ae8944a.000000000001264f [read 975360~4096] >>>>>> >> >> >>>> 11.6d19c36f ack+read+known_if_redirected e10249) currently no flag >>>>>> >> >> >>>> points reached >>>>>> >> >> >>>> >>>>>> >> >> >>>> I don't know what "no flag points reached" means. >>>>>> >> >> >>> >>>>>> >> >> >>> Just that the op hasn't been marked as reaching any interesting points >>>>>> >> >> >>> (op->mark_*() calls). >>>>>> >> >> >>> >>>>>> >> >> >>> Is it possible to gather a lot with debug ms = 20 and debug osd = 20? >>>>>> >> >> >>> It's extremely verbose but it'll let us see where the op is getting >>>>>> >> >> >>> blocked. If you see the "slow request" message it means the op in >>>>>> >> >> >>> received by ceph (that's when the clock starts), so I suspect it's not >>>>>> >> >> >>> something we can blame on the network stack. >>>>>> >> >> >>> >>>>>> >> >> >>> sage >>>>>> >> >> >>> >>>>>> >> >> >>> >>>>>> >> >> >>>> >>>>>> >> >> >>>> The problem is most pronounced when we have to reboot an OSD node (1 >>>>>> >> >> >>>> of 13), we will have hundreds of I/O blocked for some times up to 300 >>>>>> >> >> >>>> seconds. It takes a good 15 minutes for things to settle down. The >>>>>> >> >> >>>> production cluster is very busy doing normally 8,000 I/O and peaking >>>>>> >> >> >>>> at 15,000. This is all 4TB spindles with SSD journals and the disks >>>>>> >> >> >>>> are between 25-50% full. We are currently splitting PGs to distribute >>>>>> >> >> >>>> the load better across the disks, but we are having to do this 10 PGs >>>>>> >> >> >>>> at a time as we get blocked I/O. We have max_backfills and >>>>>> >> >> >>>> max_recovery set to 1, client op priority is set higher than recovery >>>>>> >> >> >>>> priority. We tried increasing the number of op threads but this didn't >>>>>> >> >> >>>> seem to help. It seems as soon as PGs are finished being checked, they >>>>>> >> >> >>>> become active and could be the cause for slow I/O while the other PGs >>>>>> >> >> >>>> are being checked. >>>>>> >> >> >>>> >>>>>> >> >> >>>> What I don't understand is that the messages are delayed. As soon as >>>>>> >> >> >>>> the message is received by Ceph OSD process, it is very quickly >>>>>> >> >> >>>> committed to the journal and a response is sent back to the primary >>>>>> >> >> >>>> OSD which is received very quickly as well. I've adjust >>>>>> >> >> >>>> min_free_kbytes and it seems to keep the OSDs from crashing, but >>>>>> >> >> >>>> doesn't solve the main problem. We don't have swap and there is 64 GB >>>>>> >> >> >>>> of RAM per nodes for 10 OSDs. >>>>>> >> >> >>>> >>>>>> >> >> >>>> Is there something that could cause the kernel to get a packet but not >>>>>> >> >> >>>> be able to dispatch it to Ceph such that it could be explaining why we >>>>>> >> >> >>>> are seeing these blocked I/O for 30+ seconds. Is there some pointers >>>>>> >> >> >>>> to tracing Ceph messages from the network buffer through the kernel to >>>>>> >> >> >>>> the Ceph process? 
>>>>>> >> >> >>>> We can really use some pointers, no matter how outrageous. We have over 6 people looking into this for weeks now and just can't think of anything else.
>>>>>> >> >> >>>>
>>>>>> >> >> >>>> Thanks,
>>>>>> >> >> >>>> On Fri, Sep 25, 2015 at 2:40 PM, Robert LeBlanc wrote:
>>>>>> >> >> >>> >> We dropped the replication on our cluster from 4 to 3 and it looks like all the blocked I/O has stopped (no entries in the log for the last 12 hours). This makes me believe that there is some issue with the number of sockets or some other TCP issue. We have not messed with ephemeral ports and TIME_WAIT at this point. There are 130 OSDs and 8 KVM hosts hosting about 150 VMs. The open-files limit is set at 32K for the OSD processes and 16K system wide.
>>>>>> >> >> >>> >>
>>>>>> >> >> >>> >> Does this seem like the right spot to be looking? What are some configuration items we should be looking at?
>>>>>> >> >> >>> >>
>>>>>> >> >> >>> >> Thanks,
>>>>>> >> >> >>> >> On Wed, Sep 23, 2015 at 1:30 PM, Robert LeBlanc wrote:
>>>>>> >> >> >>> >>> We were only able to get ~17Gb out of the XL710 (heavily tweaked) until we went to the 4.x kernel, where we got ~36Gb (no tweaking). It seems that there were some major reworks in the network handling in the kernel to efficiently handle that network rate. If I remember right, we also saw a drop in CPU utilization. I'm starting to think that we did see packet loss while congesting our ISLs in our initial testing, but we could not tell where the dropping was happening.
We >>>>>> >> >> >>> >>> saw some on the switches, but it didn't seem to be bad if we weren't >>>>>> >> >> >>> >>> trying to congest things. We probably already saw this issue, just >>>>>> >> >> >>> >>> didn't know it. >>>>>> >> >> >>> >>> - ---------------- >>>>>> >> >> >>> >>> Robert LeBlanc >>>>>> >> >> >>> >>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >>>>>> >> >> >>> >>> >>>>>> >> >> >>> >>> >>>>>> >> >> >>> >>> On Wed, Sep 23, 2015 at 1:10 PM, Mark Nelson wrote: >>>>>> >> >> >>> >>>> FWIW, we've got some 40GbE Intel cards in the community performance cluster >>>>>> >> >> >>> >>>> on a Mellanox 40GbE switch that appear (knock on wood) to be running fine >>>>>> >> >> >>> >>>> with 3.10.0-229.7.2.el7.x86_64. We did get feedback from Intel that older >>>>>> >> >> >>> >>>> drivers might cause problems though. >>>>>> >> >> >>> >>>> >>>>>> >> >> >>> >>>> Here's ifconfig from one of the nodes: >>>>>> >> >> >>> >>>> >>>>>> >> >> >>> >>>> ens513f1: flags=4163 mtu 1500 >>>>>> >> >> >>> >>>> inet 10.0.10.101 netmask 255.255.255.0 broadcast 10.0.10.255 >>>>>> >> >> >>> >>>> inet6 fe80::6a05:caff:fe2b:7ea1 prefixlen 64 scopeid 0x20 >>>>>> >> >> >>> >>>> ether 68:05:ca:2b:7e:a1 txqueuelen 1000 (Ethernet) >>>>>> >> >> >>> >>>> RX packets 169232242875 bytes 229346261232279 (208.5 TiB) >>>>>> >> >> >>> >>>> RX errors 0 dropped 0 overruns 0 frame 0 >>>>>> >> >> >>> >>>> TX packets 153491686361 bytes 203976410836881 (185.5 TiB) >>>>>> >> >> >>> >>>> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 >>>>>> >> >> >>> >>>> >>>>>> >> >> >>> >>>> Mark >>>>>> >> >> >>> >>>> >>>>>> >> >> >>> >>>> >>>>>> >> >> >>> >>>> On 09/23/2015 01:48 PM, Robert LeBlanc wrote: >>>>>> >> >> >>> >>>>> >>>>>> >> >> >>> >>>>> -----BEGIN PGP SIGNED MESSAGE----- >>>>>> >> >> >>> >>>>> Hash: SHA256 >>>>>> >> >> >>> >>>>> >>>>>> >> >> >>> >>>>> OK, here is the update on the saga... >>>>>> >> >> >>> >>>>> >>>>>> >> >> >>> >>>>> I traced some more of blocked I/Os and it seems that communication >>>>>> >> >> >>> >>>>> between two hosts seemed worse than others. I did a two way ping flood >>>>>> >> >> >>> >>>>> between the two hosts using max packet sizes (1500). After 1.5M >>>>>> >> >> >>> >>>>> packets, no lost pings. Then then had the ping flood running while I >>>>>> >> >> >>> >>>>> put Ceph load on the cluster and the dropped pings started increasing >>>>>> >> >> >>> >>>>> after stopping the Ceph workload the pings stopped dropping. >>>>>> >> >> >>> >>>>> >>>>>> >> >> >>> >>>>> I then ran iperf between all the nodes with the same results, so that >>>>>> >> >> >>> >>>>> ruled out Ceph to a large degree. I then booted in the the >>>>>> >> >> >>> >>>>> 3.10.0-229.14.1.el7.x86_64 kernel and with an hour test so far there >>>>>> >> >> >>> >>>>> hasn't been any dropped pings or blocked I/O. Our 40 Gb NICs really >>>>>> >> >> >>> >>>>> need the network enhancements in the 4.x series to work well. >>>>>> >> >> >>> >>>>> >>>>>> >> >> >>> >>>>> Does this sound familiar to anyone? I'll probably start bisecting the >>>>>> >> >> >>> >>>>> kernel to see where this issue in introduced. Both of the clusters >>>>>> >> >> >>> >>>>> with this issue are running 4.x, other than that, they are pretty >>>>>> >> >> >>> >>>>> differing hardware and network configs. 
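The load-dependent drop test described above is easy to repeat elsewhere; a rough sketch (the address is the example cluster address from this thread, and rados bench stands in for the fio workload actually used):

  # Two-way flood ping at the full 1500-byte frame size (1472 + 28 bytes of headers);
  # -f needs root. Run it in both directions at once.
  ping -f -s 1472 192.168.55.13

  # While the ping flood is running, put Ceph write load on the cluster:
  rados -p rbd bench 300 write -t 16

  # And compare raw TCP throughput between the same pair of hosts:
  iperf -s                              # on one node
  iperf -c 192.168.55.13 -t 60 -P 4     # on the other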
>>>>>> >> >> >>> >>>>> Thanks,
>>>>>> >> >> >>> >>>>> On Tue, Sep 22, 2015 at 4:15 PM, Robert LeBlanc wrote:
>>>>>> >> >> >>> >>>>>> This is IPoIB and we have the MTU set to 64K. There were some issues pinging hosts with "No buffer space available" (hosts are currently configured for 4GB to test SSD caching rather than page cache). I found that an MTU under 32K worked reliably for ping, but we still had the blocked I/O.
>>>>>> >> >> >>> >>>>>>
>>>>>> >> >> >>> >>>>>> I reduced the MTU to 1500 and checked pings (OK), but I'm still seeing the blocked I/O.
>>>>>> >> >> >>> >>>>>>
>>>>>> >> >> >>> >>>>>> On Tue, Sep 22, 2015 at 3:52 PM, Sage Weil wrote:
>>>>>> >> >> >>> >>>>>>> On Tue, 22 Sep 2015, Samuel Just wrote:
>>>>>> >> >> >>> >>>>>>>> I looked at the logs; it looks like there was a 53 second delay between when osd.17 started sending the osd_repop message and when osd.13 started reading it, which is pretty weird. Sage, didn't we once see a kernel issue which caused some messages to be mysteriously delayed for many 10s of seconds?
>>>>>> >> >> >>> >>>>>>>
>>>>>> >> >> >>> >>>>>>> Every time we have seen this behavior and diagnosed it in the wild it has been a network misconfiguration. Usually related to jumbo frames.
>>>>>> >> >> >>> >>>>>>> >>>>>> >> >> >>> >>>>>>> sage >>>>>> >> >> >>> >>>>>>> >>>>>> >> >> >>> >>>>>>> >>>>>> >> >> >>> >>>>>>>> >>>>>> >> >> >>> >>>>>>>> What kernel are you running? >>>>>> >> >> >>> >>>>>>>> -Sam >>>>>> >> >> >>> >>>>>>>> >>>>>> >> >> >>> >>>>>>>> On Tue, Sep 22, 2015 at 2:22 PM, Robert LeBlanc wrote: >>>>>> >> >> >>> >>>>>>>>> >>>>>> >> >> >>> >>>>>>>>> -----BEGIN PGP SIGNED MESSAGE----- >>>>>> >> >> >>> >>>>>>>>> Hash: SHA256 >>>>>> >> >> >>> >>>>>>>>> >>>>>> >> >> >>> >>>>>>>>> OK, looping in ceph-devel to see if I can get some more eyes. I've >>>>>> >> >> >>> >>>>>>>>> extracted what I think are important entries from the logs for the >>>>>> >> >> >>> >>>>>>>>> first blocked request. NTP is running all the servers so the logs >>>>>> >> >> >>> >>>>>>>>> should be close in terms of time. Logs for 12:50 to 13:00 are >>>>>> >> >> >>> >>>>>>>>> available at http://162.144.87.113/files/ceph_block_io.logs.tar.xz >>>>>> >> >> >>> >>>>>>>>> >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:06.500374 - osd.17 gets I/O from client >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:06.557160 - osd.17 submits I/O to osd.13 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:06.557305 - osd.17 submits I/O to osd.16 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:06.573711 - osd.16 gets I/O from osd.17 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:06.595716 - osd.17 gets ondisk result=0 from osd.16 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:06.640631 - osd.16 reports to osd.17 ondisk result=0 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:36.926691 - osd.17 reports slow I/O > 30.439150 sec >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:59.790591 - osd.13 gets I/O from osd.17 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:59.812405 - osd.17 gets ondisk result=0 from osd.13 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:56:02.941602 - osd.13 reports to osd.17 ondisk result=0 >>>>>> >> >> >>> >>>>>>>>> >>>>>> >> >> >>> >>>>>>>>> In the logs I can see that osd.17 dispatches the I/O to osd.13 and >>>>>> >> >> >>> >>>>>>>>> osd.16 almost silmutaniously. osd.16 seems to get the I/O right away, >>>>>> >> >> >>> >>>>>>>>> but for some reason osd.13 doesn't get the message until 53 seconds >>>>>> >> >> >>> >>>>>>>>> later. osd.17 seems happy to just wait and doesn't resend the data >>>>>> >> >> >>> >>>>>>>>> (well, I'm not 100% sure how to tell which entries are the actual data >>>>>> >> >> >>> >>>>>>>>> transfer). >>>>>> >> >> >>> >>>>>>>>> >>>>>> >> >> >>> >>>>>>>>> It looks like osd.17 is receiving responses to start the communication >>>>>> >> >> >>> >>>>>>>>> with osd.13, but the op is not acknowledged until almost a minute >>>>>> >> >> >>> >>>>>>>>> later. To me it seems that the message is getting received but not >>>>>> >> >> >>> >>>>>>>>> passed to another thread right away or something. This test was done >>>>>> >> >> >>> >>>>>>>>> with an idle cluster, a single fio client (rbd engine) with a single >>>>>> >> >> >>> >>>>>>>>> thread. >>>>>> >> >> >>> >>>>>>>>> >>>>>> >> >> >>> >>>>>>>>> The OSD servers are almost 100% idle during these blocked I/O >>>>>> >> >> >>> >>>>>>>>> requests. I think I'm at the end of my troubleshooting, so I can use >>>>>> >> >> >>> >>>>>>>>> some help. 
>>>>>> >> >> >>> >>>>>>>>> >>>>>> >> >> >>> >>>>>>>>> Single Test started about >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:52:36 >>>>>> >> >> >>> >>>>>>>>> >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:36.926680 osd.17 192.168.55.14:6800/16726 56 : >>>>>> >> >> >>> >>>>>>>>> cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > >>>>>> >> >> >>> >>>>>>>>> 30.439150 secs >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:36.926699 osd.17 192.168.55.14:6800/16726 57 : >>>>>> >> >> >>> >>>>>>>>> cluster [WRN] slow request 30.439150 seconds old, received at >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:06.487451: >>>>>> >> >> >>> >>>>>>>>> osd_op(client.250874.0:1388 rbd_data.3380e2ae8944a.0000000000000545 >>>>>> >> >> >>> >>>>>>>>> [set-alloc-hint object_size 4194304 write_size 4194304,write >>>>>> >> >> >>> >>>>>>>>> 0~4194304] 8.bbf3e8ff ack+ondisk+write+known_if_redirected e56785) >>>>>> >> >> >>> >>>>>>>>> currently waiting for subops from 13,16 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:36.697904 osd.16 192.168.55.13:6800/29410 7 : cluster >>>>>> >> >> >>> >>>>>>>>> [WRN] 2 slow requests, 2 included below; oldest blocked for > >>>>>> >> >> >>> >>>>>>>>> 30.379680 secs >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:36.697918 osd.16 192.168.55.13:6800/29410 8 : cluster >>>>>> >> >> >>> >>>>>>>>> [WRN] slow request 30.291520 seconds old, received at 2015-09-22 >>>>>> >> >> >>> >>>>>>>>> 12:55:06.406303: >>>>>> >> >> >>> >>>>>>>>> osd_op(client.250874.0:1384 rbd_data.3380e2ae8944a.0000000000000541 >>>>>> >> >> >>> >>>>>>>>> [set-alloc-hint object_size 4194304 write_size 4194304,write >>>>>> >> >> >>> >>>>>>>>> 0~4194304] 8.5fb2123f ack+ondisk+write+known_if_redirected e56785) >>>>>> >> >> >>> >>>>>>>>> currently waiting for subops from 13,17 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:55:36.697927 osd.16 192.168.55.13:6800/29410 9 : cluster >>>>>> >> >> >>> >>>>>>>>> [WRN] slow request 30.379680 seconds old, received at 2015-09-22 >>>>>> >> >> >>> >>>>>>>>> 12:55:06.318144: >>>>>> >> >> >>> >>>>>>>>> osd_op(client.250874.0:1382 rbd_data.3380e2ae8944a.000000000000053f >>>>>> >> >> >>> >>>>>>>>> [set-alloc-hint object_size 4194304 write_size 4194304,write >>>>>> >> >> >>> >>>>>>>>> 0~4194304] 8.312e69ca ack+ondisk+write+known_if_redirected e56785) >>>>>> >> >> >>> >>>>>>>>> currently waiting for subops from 13,14 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:58:03.998275 osd.13 192.168.55.12:6804/4574 130 : >>>>>> >> >> >>> >>>>>>>>> cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > >>>>>> >> >> >>> >>>>>>>>> 30.954212 secs >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:58:03.998286 osd.13 192.168.55.12:6804/4574 131 : >>>>>> >> >> >>> >>>>>>>>> cluster [WRN] slow request 30.954212 seconds old, received at >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:57:33.044003: >>>>>> >> >> >>> >>>>>>>>> osd_op(client.250874.0:1873 rbd_data.3380e2ae8944a.000000000000070d >>>>>> >> >> >>> >>>>>>>>> [set-alloc-hint object_size 4194304 write_size 4194304,write >>>>>> >> >> >>> >>>>>>>>> 0~4194304] 8.e69870d4 ack+ondisk+write+known_if_redirected e56785) >>>>>> >> >> >>> >>>>>>>>> currently waiting for subops from 16,17 >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:58:03.759826 osd.16 192.168.55.13:6800/29410 10 : >>>>>> >> >> >>> >>>>>>>>> cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > >>>>>> >> >> >>> >>>>>>>>> 30.704367 secs >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:58:03.759840 osd.16 192.168.55.13:6800/29410 11 : >>>>>> >> >> >>> >>>>>>>>> cluster [WRN] slow request 30.704367 seconds old, 
received at >>>>>> >> >> >>> >>>>>>>>> 2015-09-22 12:57:33.055404: osd_op(client.250874.0:1874 rbd_data.3380e2ae8944a.000000000000070e [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] 8.f7635819 ack+ondisk+write+known_if_redirected e56785) currently waiting for subops from 13,17
>>>>>> >> >> >>> >>>>>>>>>
>>>>>> >> >> >>> >>>>>>>>> Server - IP addr - OSD
>>>>>> >> >> >>> >>>>>>>>> nodev  - 192.168.55.11 - 12
>>>>>> >> >> >>> >>>>>>>>> nodew  - 192.168.55.12 - 13
>>>>>> >> >> >>> >>>>>>>>> nodex  - 192.168.55.13 - 16
>>>>>> >> >> >>> >>>>>>>>> nodey  - 192.168.55.14 - 17
>>>>>> >> >> >>> >>>>>>>>> nodez  - 192.168.55.15 - 14
>>>>>> >> >> >>> >>>>>>>>> nodezz - 192.168.55.16 - 15
>>>>>> >> >> >>> >>>>>>>>>
>>>>>> >> >> >>> >>>>>>>>> fio job:
>>>>>> >> >> >>> >>>>>>>>> [rbd-test]
>>>>>> >> >> >>> >>>>>>>>> readwrite=write
>>>>>> >> >> >>> >>>>>>>>> blocksize=4M
>>>>>> >> >> >>> >>>>>>>> ##runtime=60
>>>>>> >> >> >>> >>>>>>>>> name=rbd-test
>>>>>> >> >> >>> >>>>>>>> ##readwrite=randwrite
>>>>>> >> >> >>> >>>>>>>> ##bssplit=4k/85:32k/11:512/3:1m/1,4k/89:32k/10:512k/1
>>>>>> >> >> >>> >>>>>>>> ##rwmixread=72
>>>>>> >> >> >>> >>>>>>>> ##norandommap
>>>>>> >> >> >>> >>>>>>>> ##size=1T
>>>>>> >> >> >>> >>>>>>>> ##blocksize=4k
>>>>>> >> >> >>> >>>>>>>>> ioengine=rbd
>>>>>> >> >> >>> >>>>>>>>> rbdname=test2
>>>>>> >> >> >>> >>>>>>>>> pool=rbd
>>>>>> >> >> >>> >>>>>>>>> clientname=admin
>>>>>> >> >> >>> >>>>>>>>> iodepth=8
>>>>>> >> >> >>> >>>>>>>> ##numjobs=4
>>>>>> >> >> >>> >>>>>>>> ##thread
>>>>>> >> >> >>> >>>>>>>> ##group_reporting
>>>>>> >> >> >>> >>>>>>>> ##time_based
>>>>>> >> >> >>> >>>>>>>> ##direct=1
>>>>>> >> >> >>> >>>>>>>> ##ramp_time=60
>>>>>> >> >> >>> >>>>>>>>>
>>>>>> >> >> >>> >>>>>>>>> Thanks,
>>>>>> >> >> >>> >>>>>>>>> On Tue, Sep 22, 2015 at 8:31 AM,
Gregory Farnum wrote: >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> On Tue, Sep 22, 2015 at 7:24 AM, Robert LeBlanc wrote: >>>>>> >> >> >>> >>>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>>> -----BEGIN PGP SIGNED MESSAGE----- >>>>>> >> >> >>> >>>>>>>>>>> Hash: SHA256 >>>>>> >> >> >>> >>>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>>> Is there some way to tell in the logs that this is happening? >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> You can search for the (mangled) name _split_collection >>>>>> >> >> >>> >>>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>>> I'm not >>>>>> >> >> >>> >>>>>>>>>>> seeing much I/O, CPU usage during these times. Is there some way to >>>>>> >> >> >>> >>>>>>>>>>> prevent the splitting? Is there a negative side effect to doing so? >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> Bump up the split and merge thresholds. You can search the list for >>>>>> >> >> >>> >>>>>>>>>> this, it was discussed not too long ago. >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>>> We've had I/O block for over 900 seconds and as soon as the sessions >>>>>> >> >> >>> >>>>>>>>>>> are aborted, they are reestablished and complete immediately. >>>>>> >> >> >>> >>>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>>> The fio test is just a seq write, starting it over (rewriting from >>>>>> >> >> >>> >>>>>>>>>>> the >>>>>> >> >> >>> >>>>>>>>>>> beginning) is still causing the issue. I was suspect that it is not >>>>>> >> >> >>> >>>>>>>>>>> having to create new file and therefore split collections. This is >>>>>> >> >> >>> >>>>>>>>>>> on >>>>>> >> >> >>> >>>>>>>>>>> my test cluster with no other load. >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> Hmm, that does make it seem less likely if you're really not creating >>>>>> >> >> >>> >>>>>>>>>> new objects, if you're actually running fio in such a way that it's >>>>>> >> >> >>> >>>>>>>>>> not allocating new FS blocks (this is probably hard to set up?). >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>>> I'll be doing a lot of testing today. Which log options and depths >>>>>> >> >> >>> >>>>>>>>>>> would be the most helpful for tracking this issue down? >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> >>>>>> >> >> >>> >>>>>>>>>> If you want to go log diving "debug osd = 20", "debug filestore = >>>>>> >> >> >>> >>>>>>>>>> 20", >>>>>> >> >> >>> >>>>>>>>>> "debug ms = 1" are what the OSD guys like to see. That should spit >>>>>> >> >> >>> >>>>>>>>>> out >>>>>> >> >> >>> >>>>>>>>>> everything you need to track exactly what each Op is doing. 
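For completeness, the knobs Greg is referring to look roughly like this in ceph.conf; the debug values are the ones requested above, and the split/merge numbers are only illustrative (larger values just delay directory splitting):

  [osd]
    debug osd = 20
    debug filestore = 20
    debug ms = 1
    # Raise these so filestore collections split/merge far less often than with
    # the defaults (filestore split multiple = 2, filestore merge threshold = 10).
    filestore split multiple = 8
    filestore merge threshold = 40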
>>>>>> >> >> >>> >>>>>>>>>> -Greg
--
WBR, Max A. Krasilnikov
ColoCall Data Center