Re: Potential OSD deadlock?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 22, 2015 at 7:24 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Is there some way to tell in the logs that this is happening?

You can search for the (mangled) name _split_collection
> I'm not
> seeing much I/O, CPU usage during these times. Is there some way to
> prevent the splitting? Is there a negative side effect to doing so?

Bump up the split and merge thresholds. You can search the list for
this, it was discussed not too long ago.

> We've had I/O block for over 900 seconds and as soon as the sessions
> are aborted, they are reestablished and complete immediately.
>
> The fio test is just a seq write, starting it over (rewriting from the
> beginning) is still causing the issue. I was suspect that it is not
> having to create new file and therefore split collections. This is on
> my test cluster with no other load.

Hmm, that does make it seem less likely if you're really not creating
new objects, if you're actually running fio in such a way that it's
not allocating new FS blocks (this is probably hard to set up?).

>
> I'll be doing a lot of testing today. Which log options and depths
> would be the most helpful for tracking this issue down?

If you want to go log diving "debug osd = 20", "debug filestore = 20",
"debug ms = 1" are what the OSD guys like to see. That should spit out
everything you need to track exactly what each Op is doing.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux