On Sat, Oct 14, 2017 at 9:33 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
> First, there is no need to deep scrub your PGs every 2 days.

They aren’t being deep scrubbed every two days, nor is there any attempt
(or desire) to do so. That would require 8+ scrubs running at once.
Currently, it takes between 2 and 3 *weeks* to deep scrub every PG one at
a time with no breaks. Perhaps you misread “48 days” as “48 hours”?

As long as having one deep scrub running renders the cluster unusable, the
frequency of deep scrubs doesn’t really matter; “ever” is too often. If
that issue can be resolved, the cron script we wrote will scrub all the
PGs over a period of 28 days.

> I'm thinking your 1GB is either a typo for a 1TB disk or that your DB
> partitions are 1GB each.

That is a typo, yes. The SSDs are 100GB (really about 132GB, with
overprovisioning), and each one has three 30GB partitions, one for each
OSD on that host.

These SSDs perform excellently in testing and in other applications. They
are being utilized at less than 1% of their I/O capacity (by both IOPS and
throughput) by this ceph cluster. So far there hasn’t been anything we’ve
seen suggesting there’s a problem with these drives.

> Third, when talking of a distributed storage system you can never assume
> it isn't the network.

No assumption is necessary; the network has been exhaustively tested, both
with and without ceph running, both with and without LACP.

The network topology is dirt simple. There’s a dedicated 10Gbps switch
with 6 two-port LACPs connected to five ceph nodes, one client, and
nothing else. There are no interface errors, overruns, link failures or
LACP errors on any of the cluster nodes or on the switch. Like the SSDs
(and the CPUs, and the RAM), the network passes every test thrown at it
and is being utilized by ceph at a very small fraction of its demonstrated
capacity.

But it’s not a sticking point. The LAN has now been reconfigured to remove
LACP and use each of the ceph nodes’ 10Gbps interfaces individually, one
as the public network and one as the cluster network, with separate VLANs
on the switch. That’s all confirmed to have taken effect after a full
shutdown and restart of all five nodes and the client. That change had no
effect on this issue.

With that change made, the network was re-tested by setting up 20
simultaneous iperf sessions, 10 clients and 10 servers, with each machine
participating in 4 ten-minute tests at once: inbound public network,
outbound public network, inbound cluster network, outbound cluster
network. With all 20 tests running simultaneously, the average throughput
per test was 7.5Gbps. (With 10 unidirectional tests, the average
throughput is over 9Gbps.)

The client (participating only on the public network) was tested
separately. In five sequential runs, each testing inbound and outbound
simultaneously between the client and one of the five ceph nodes, the
results were over 7Gbps in each direction every time.

No loss, errors or drops were observed on any interface, nor on the
switch, during either test. So it does not appear that there are any
network problems contributing to the issue.

Thanks!
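
P.S. The actual cron script isn’t included here, but the idea behind the
28-day schedule is roughly the following minimal sketch. It assumes the
`ceph` CLI is on PATH with a keyring that permits `ceph pg dump` and
`ceph pg deep-scrub`; run once a day from cron, it requests a deep scrub
of the 1/28th of PGs whose last deep scrub is oldest.

#!/usr/bin/env python3
# Rough sketch of a 28-day deep-scrub scheduler, intended to run once a
# day from cron. Each run asks for a deep scrub of the 1/28th of PGs whose
# last deep scrub is oldest, so every PG comes up roughly once per cycle.
# Field names ("pgid", "last_deep_scrub_stamp") are as reported by
# `ceph pg dump pgs --format json`.
import json
import subprocess

CYCLE_DAYS = 28

def main():
    out = subprocess.check_output(
        ["ceph", "pg", "dump", "pgs", "--format", "json"]).decode()
    data = json.loads(out)
    # Depending on the Ceph release, the JSON is either a bare list of PG
    # stats or a dict containing a "pg_stats" list.
    pg_stats = data["pg_stats"] if isinstance(data, dict) else data

    # Oldest deep scrub first.
    pg_stats.sort(key=lambda pg: pg["last_deep_scrub_stamp"])

    # Request only today's share; osd_max_scrubs still limits how many
    # scrubs actually run at the same time on any one OSD.
    todays_share = max(1, len(pg_stats) // CYCLE_DAYS)
    for pg in pg_stats[:todays_share]:
        subprocess.check_call(["ceph", "pg", "deep-scrub", pg["pgid"]])

if __name__ == "__main__":
    main()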
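
And for completeness, the public/cluster split described above amounts to
something like this in ceph.conf on each node. The subnets below are
placeholders, not the ones actually in use:

[global]
    # Placeholder subnets; substitute the real public/cluster VLAN subnets.
    # public network: client-facing traffic on the first 10Gbps interface
    public network = 10.0.10.0/24
    # cluster network: OSD replication/heartbeat on the second 10Gbps interface
    cluster network = 10.0.20.0/24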