Is there some way to tell from the logs that this is happening? I'm
not seeing much I/O or CPU usage during these times. Is there some way
to prevent the splitting, and is there a negative side effect to doing
so? We've had I/O block for over 900 seconds, and as soon as the
sessions are aborted, they are re-established and complete
immediately.

The fio test is just a sequential write, and starting it over
(rewriting from the beginning) still causes the issue. My suspicion
was that it would not have to create new files and therefore would not
split collections. This is on my test cluster with no other load.

I'll be doing a lot of testing today. Which log options and debug
levels would be the most helpful for tracking this issue down?

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1


On Tue, Sep 22, 2015 at 8:09 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Mon, Sep 21, 2015 at 11:43 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>> I'm starting to wonder if this has to do with some OSDs getting full
>> or with the 0.94.3 code. Earlier this afternoon, I cleared out my
>> test cluster so there were no pools. I created a new rbd pool and
>> started filling it with six 1 TB fio jobs at replication 3, with six
>> spindles across six servers. It was running 0.94.2 at the time.
>> After several hours of writes, we had the newly patched 0.94.3
>> binaries ready for testing, so I rolled the update onto the test
>> cluster while the fio jobs were running. There were a few blocked
>> I/Os as the services were restarted (nothing I'm concerned about).
>> Now that the OSDs are about 60% full, the blocked I/O is becoming
>> very frequent even with the backports. The write bandwidth was
>> consistently at 200 MB/s until this point; now it is fluctuating
>> between 200 MB/s and 75 MB/s, mostly around 100 MB/s. Our production
>> cluster uses XFS on the OSDs; this test cluster uses EXT4.
>>
>> I'll see if I can go back to 0.94.2 and fill the cluster up again...
>> Going back to 0.94.2 and 0.94.0 still shows the issue (although I
>> didn't refill the cluster; I didn't delete what was already there).
>> I'm building the latest hammer-backports now to see if they resolve
>> the issue.
>
> You're probably running into FileStore collection splitting, and
> that's what is slowing things down in that testing.
> -Greg
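
As a rough illustration of the collection splitting Greg describes,
something along these lines should make it visible in the logs and
control when it happens. The debug levels and tuning values here are
assumptions, not tested recommendations; check the FileStore config
reference for your release:

    # Raise FileStore/OSD debug logging at runtime so split activity
    # shows up in the OSD logs (level 10 is an assumption; it may need
    # to go higher to catch the directory split messages):
    ceph tell osd.* injectargs '--debug-filestore 10 --debug-osd 10'

    # FileStore splits a PG collection subdirectory once it holds roughly
    #   filestore_split_multiple * abs(filestore_merge_threshold) * 16
    # objects (320 with the defaults of 2 and 10). Raising both in
    # ceph.conf delays splitting at the cost of larger directories; the
    # values below are purely illustrative:
    [osd]
    filestore split multiple = 8
    filestore merge threshold = 40

Note that raising the thresholds only postpones the work, so each
split that does eventually happen has more files to move; the other
common mitigation is pre-splitting the directories at pool creation
time (the expected_num_objects argument to pool create, if your
release supports it).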