On Sat, Oct 14, 2017 at 9:33 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
> First, there is no need to deep scrub your PGs every 2 days.

They aren’t being deep scrubbed every two days, nor is there any attempt
(or desire) to do so. That would require 8+ scrubs running at once.
Currently, it takes between 2 and 3 *weeks* to deep scrub every PG one at
a time with no breaks. Perhaps you misread “48 days” as “48 hours”?

As long as having one deep scrub running renders the cluster unusable, the
frequency of deep scrubs doesn’t really matter; “ever” is too often. If
that issue can be resolved, the cron script we wrote will scrub all the
PGs over a period of 28 days.

> I'm thinking your 1GB is either a typo for a 1TB disk or that your DB
> partitions are 1GB each.

That is a typo, yes. The SSDs are 100GB (really about 132GB, with
overprovisioning), and each one has three 30GB partitions, one for each
OSD on that host.

These SSDs perform excellently in testing and in other applications. They
are being utilized at less than 1% of their I/O capacity (by both IOPS and
throughput) by this ceph cluster. So far there hasn’t been anything we’ve
seen suggesting there’s a problem with these drives.

> Third, when talking of a distributed storage system you can never assume
> it isn't the network.

No assumption is necessary; the network has been exhaustively tested, both
with and without ceph running, both with and without LACP.

The network topology is dirt simple. There’s a dedicated 10Gbps switch
with 6 two-port LACPs connected to five ceph nodes, one client, and
nothing else. There are no interface errors, overruns, link failures or
LACP errors on any of the cluster nodes or on the switch. Like the SSDs
(and the CPUs, and the RAM), the network passes every test thrown at it
and is being utilized by ceph at a very small fraction of its demonstrated
capacity.

But it’s not a sticking point. The LAN has now been reconfigured to remove
LACP and use each of the ceph nodes’ 10Gbps interfaces individually, one
as the public network and one as the cluster network, with separate VLANs
on the switch. That’s all confirmed to have taken effect after a full
shutdown and restart of all five nodes and the client. That change had no
effect on this issue.

With that change made, the network was re-tested by setting up 20
simultaneous iperf sessions, 10 clients and 10 servers, with each machine
participating in 4 ten-minute tests at once: inbound public network,
outbound public network, inbound cluster network, outbound cluster
network. With all 20 tests running simultaneously, the average throughput
per test was 7.5Gbps. (With 10 unidirectional tests, the average
throughput is over 9Gbps.)

The client (participating only on the public network) was tested
separately. In five sequential runs, each testing inbound and outbound
simultaneously between the client and one of the five ceph nodes, the
results were over 7Gbps in each direction every time.

No loss, errors or drops were observed on any interface, nor on the
switch, during either test. So it does not appear that there are any
network problems contributing to the issue.

Thanks!
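
P.S. The actual cron script isn’t included here, but the idea behind the
28-day schedule is roughly the following minimal sketch. It assumes the
`ceph` CLI is on PATH with a keyring that permits `ceph pg dump` and
`ceph pg deep-scrub`; run once a day from cron, it requests a deep scrub
of the 1/28th of PGs whose last deep scrub is oldest.

#!/usr/bin/env python3
# Rough sketch of a 28-day deep-scrub scheduler, intended to run once a
# day from cron. Each run asks for a deep scrub of the 1/28th of PGs whose
# last deep scrub is oldest, so every PG comes up roughly once per cycle.
# Field names ("pgid", "last_deep_scrub_stamp") are as reported by
# `ceph pg dump pgs --format json`.
import json
import subprocess

CYCLE_DAYS = 28

def main():
    out = subprocess.check_output(
        ["ceph", "pg", "dump", "pgs", "--format", "json"]).decode()
    data = json.loads(out)
    # Depending on the Ceph release, the JSON is either a bare list of PG
    # stats or a dict containing a "pg_stats" list.
    pg_stats = data["pg_stats"] if isinstance(data, dict) else data

    # Oldest deep scrub first.
    pg_stats.sort(key=lambda pg: pg["last_deep_scrub_stamp"])

    # Request only today's share; osd_max_scrubs still limits how many
    # scrubs actually run at the same time on any one OSD.
    todays_share = max(1, len(pg_stats) // CYCLE_DAYS)
    for pg in pg_stats[:todays_share]:
        subprocess.check_call(["ceph", "pg", "deep-scrub", pg["pgid"]])

if __name__ == "__main__":
    main()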
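
And for completeness, the public/cluster split described above amounts to
something like this in ceph.conf on each node. The subnets below are
placeholders, not the ones actually in use:

[global]
    # Placeholder subnets; substitute the real public/cluster VLAN subnets.
    # public network: client-facing traffic on the first 10Gbps interface
    public network = 10.0.10.0/24
    # cluster network: OSD replication/heartbeat on the second 10Gbps interface
    cluster network = 10.0.20.0/24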