Hi Mosharaf,

Thanks for the update. If you can reproduce the issue, it would be most
helpful to us if you provided the output of `ceph -s`, along with a copy
of your osdmap file. If you have this information, you can update the
tracker here: https://tracker.ceph.com/issues/62836

Thanks,
Laura

On Mon, Sep 25, 2023 at 6:02 AM Mosharaf Hossain <
mosharaf.hossain@xxxxxxxxxxxxxx> wrote:

> Greetings Josh,
>
> I executed the command today, and it effectively resolved the issue.
> Within moments, my pools became active, and read/write IOPS started to
> rise. Furthermore, the hypervisor and VMs can now communicate seamlessly
> with the Ceph cluster.
>
> *Command run:*
> ceph osd rm-pg-upmap-primary 21.a0
>
> *To summarize our findings:*
>
> - Enabling the Ceph read balancer resulted in libvirtd on the
> hypervisor being unable to communicate with the Ceph cluster.
> - While the issue was ongoing, images on the pool were still readable
> via the rbd command.
>
> I'd like to express my gratitude to everyone involved, especially the
> forum contributors.
>
> Regards
> Mosharaf Hossain
> Manager, Product Development
> IT Division
> BEXIMCO IT
>
> On Thu, Sep 14, 2023 at 7:58 PM Josh Salomon <jsalomon@xxxxxxxxxx> wrote:
>
>> Hi Mosharaf - I will check it, but I can assure you that this error is
>> a CLI error and the command has not impacted the system or the data. I
>> have no clue what happened - I am sure I tested this scenario.
>> The command syntax is
>> ceph osd rm-pg-upmap-primary <PGID>
>> The error you got is because you did not specify the PG ID. Run this
>> command in a loop for all PGs in the pool to return the pool to its
>> original state.
>>
>> Regards,
>>
>> Josh
>>
>> On Thu, Sep 14, 2023 at 4:24 PM Mosharaf Hossain <
>> mosharaf.hossain@xxxxxxxxxxxxxx> wrote:
>>
>>> Hello Josh
>>> Thank you for your reply.
>>>
>>> After running the command on the cluster, I got the following error.
>>> We are concerned about user data. Could you kindly confirm that this
>>> command will not affect any user data?
>>>
>>> root@ceph-node1:/# ceph osd rm-pg-upmap-primary
>>> Traceback (most recent call last):
>>>   File "/usr/bin/ceph", line 1327, in <module>
>>>     retval = main()
>>>   File "/usr/bin/ceph", line 1036, in main
>>>     retargs = run_in_thread(cluster_handle.conf_parse_argv, childargs)
>>>   File "/usr/lib/python3.6/site-packages/ceph_argparse.py", line 1538, in run_in_thread
>>>     raise t.exception
>>>   File "/usr/lib/python3.6/site-packages/ceph_argparse.py", line 1504, in run
>>>     self.retval = self.func(*self.args, **self.kwargs)
>>>   File "rados.pyx", line 551, in rados.Rados.conf_parse_argv
>>>   File "rados.pyx", line 314, in rados.cstr_list
>>>   File "rados.pyx", line 308, in rados.cstr
>>> UnicodeEncodeError: 'utf-8' codec can't encode characters in position 3-4: surrogates
>>>
>>> Apart from that, do you need any other information?
>>>
>>> Regards
>>> Mosharaf Hossain
>>> Manager, Product Development
>>> IT Division
>>> Bangladesh Export Import Company Ltd.
>>>
>>> On Thu, Sep 14, 2023 at 1:52 PM Josh Salomon <jsalomon@xxxxxxxxxx>
>>> wrote:
>>>
>>>> Hi Mosharaf,
>>>>
>>>> If you undo the read balancing commands (using the command 'ceph
>>>> osd rm-pg-upmap-primary' on all PGs in the pool), do you see
>>>> improvements in the performance?
>>>>
>>>> Regards,
>>>>
>>>> Josh
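The loop Josh describes above - one 'ceph osd rm-pg-upmap-primary' call per
PG in the pool - can be scripted. A minimal bash sketch, assuming jq is
installed and a hypothetical pool name "mypool"; note the JSON field layout
used here (.pg_stats[].pgid) may differ between Ceph releases:

    # Drop the pg-upmap-primary entry for every PG in the pool so primary
    # selection falls back to the CRUSH-computed default. PGs that never
    # had an entry may report a harmless error.
    pool=mypool   # hypothetical name; substitute the affected pool
    for pg in $(ceph pg ls-by-pool "$pool" -f json | jq -r '.pg_stats[].pgid'); do
        ceph osd rm-pg-upmap-primary "$pg"
    done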
>>>>
>>>> On Thu, Sep 14, 2023 at 12:35 AM Laura Flores <lflores@xxxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Hi Mosharaf,
>>>>>
>>>>> Can you please create a tracker issue and attach a copy of your
>>>>> osdmap? Also, please include any other output that characterizes the
>>>>> slowdown in client I/O operations you're noticing in your cluster.
>>>>> I can take a look once I have that information.
>>>>>
>>>>> Thanks,
>>>>> Laura
>>>>>
>>>>> On Wed, Sep 13, 2023 at 5:23 AM Mosharaf Hossain <
>>>>> mosharaf.hossain@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hello Folks
>>>>>> We've recently performed an upgrade on our Cephadm cluster,
>>>>>> transitioning from Ceph Quincy to Reef. However, following the
>>>>>> manual implementation of a read balancer in the Reef cluster, we've
>>>>>> experienced a significant slowdown in client I/O operations within
>>>>>> the Ceph cluster, affecting both client bandwidth and overall
>>>>>> cluster performance.
>>>>>>
>>>>>> This slowdown has resulted in unresponsiveness across all virtual
>>>>>> machines within the cluster, despite the fact that the cluster
>>>>>> exclusively utilizes SSD storage.
>>>>>>
>>>>>> Kindly guide us to move forward.
>>>>>>
>>>>>> Regards
>>>>>> Mosharaf Hossain
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>>
>>>>> --
>>>>> Laura Flores
>>>>> She/Her/Hers
>>>>> Software Engineer, Ceph Storage <https://ceph.io>
>>>>> Chicago, IL
>>>>> lflores@xxxxxxx | lflores@xxxxxxxxxx <lflores@xxxxxxxxxx>
>>>>> M: +17087388804

--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores@xxxxxxx | lflores@xxxxxxxxxx <lflores@xxxxxxxxxx>
M: +17087388804
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
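For anyone following along: the diagnostics Laura asks for at the top of
the thread can be captured with standard Ceph CLI commands. A sketch - the
output filenames are arbitrary, and the grep assumes that pg_upmap_primary
entries are printed by 'ceph osd dump', which I believe is the case in Reef:

    ceph -s > ceph_status.txt               # overall cluster status
    ceph osd getmap -o osdmap.bin           # binary osdmap to attach to the tracker
    ceph osd dump | grep pg_upmap_primary   # any read-balancer mappings still in place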