On Mon, Mar 6, 2017 at 3:03 PM, Daniel Davidson <danield at igb.illinois.edu> wrote:
> Thanks for the suggestion, however I think my more immediate problem is the
> ms_handle_reset messages. I do not think the MDS daemons are getting the
> updates when I send them.

I wouldn't assume that.  You can check the current config state to see
that your values got through by using "ceph daemon mds.<id> config show".

John

>
> Dan
>
>
> On 03/04/2017 09:08 AM, John Spray wrote:
>>
>> On Fri, Mar 3, 2017 at 9:48 PM, Daniel Davidson
>> <danield at igb.illinois.edu> wrote:
>>>
>>> ceph daemonperf mds.ceph-0
>>> -----mds------ --mds_server-- ---objecter--- -----mds_cache----- ---mds_log----
>>> rlat inos caps|hsr  hcs  hcr |writ read actv|recd recy stry purg|segs evts subm|
>>>   0  336k  97k|  0    0    0 |  0    0   20 |  0    0  246k   0 | 31   27k    0
>>>   0  336k  97k|  0    0    0 |112    0   20 |  0    0  246k  55 | 31   26k   55
>>>   0  336k  97k|  0    1    0 | 90    0   20 |  0    0  246k  45 | 31   26k   45
>>>   0  336k  97k|  0    0    0 |  2    0   20 |  0    0  246k   1 | 31   26k    1
>>>   0  336k  97k|  0    0    0 |166    0   21 |  0    0  246k  83 | 31   26k   83
>>>
>>> I have too many strays, which seem to be causing disk-full errors when
>>> deleting many files (hundreds of thousands); the number here is down from
>>> over 400k. I have been trying to raise the purge limits to speed this up,
>>> but it is not happening:
>>>
>>> ceph tell mds.ceph-0 injectargs --mds-max-purge-ops-per-pg 2
>>> 2017-03-03 15:44:00.606548 7fd96400a700  0 client.225772 ms_handle_reset on 172.16.31.1:6800/55710
>>> 2017-03-03 15:44:00.618556 7fd96400a700  0 client.225776 ms_handle_reset on 172.16.31.1:6800/55710
>>> mds_max_purge_ops_per_pg = '2'
>>>
>>> ceph tell mds.ceph-0 injectargs --mds-max-purge-ops 16384
>>> 2017-03-03 15:45:27.256132 7ff6d900c700  0 client.225808 ms_handle_reset on 172.16.31.1:6800/55710
>>> 2017-03-03 15:45:27.268302 7ff6d900c700  0 client.225812 ms_handle_reset on 172.16.31.1:6800/55710
>>> mds_max_purge_ops = '16384'
>>>
>>> I do have a backfill running, as I also have a new node that is almost done.
>>> Any ideas as to what is going on here?
>>
>> Try also increasing mds_max_purge_files.  If your files are small
>> then that is likely to be the bottleneck.
>>
>> John
>>
>>> Dan
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
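
For reference, a minimal sketch of the verification John suggests, assuming shell access to the host where mds.ceph-0 runs and the default admin socket location (the MDS id "ceph-0" and the option names are taken from the thread above):

  # Run locally on the MDS host; the admin socket avoids the short-lived
  # "ceph tell" client connection that produces the ms_handle_reset lines.
  ceph daemon mds.ceph-0 config get mds_max_purge_ops
  ceph daemon mds.ceph-0 config get mds_max_purge_files

  # Or dump the whole running config and filter for the purge settings:
  ceph daemon mds.ceph-0 config show | grep purge

If the values shown match what was injected, the MDS did receive the updates; the ms_handle_reset messages are typically just the command client's connection being torn down, not a sign the setting was rejected.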