Re: OSD log being spammed with BlueStore stupidallocator dump

I don't know anything about the BlueStore code, but given the snippets you've posted, this appears to be a debugging aid that isn't expected to be invoked in normal operation (or perhaps only in an unexpected case that it's trying hard to recover from). Have you checked where the dump() function is invoked from? I'd imagine it has to do with the allocator having to try extra hard to find free space.
-Greg

On Mon, Oct 15, 2018 at 10:02 AM Wido den Hollander <wido@xxxxxxxx> wrote:


On 10/11/2018 12:08 AM, Wido den Hollander wrote:
> Hi,
>
> On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm
> seeing OSDs writing heavily to their logfiles spitting out these lines:
>
>
> 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2078000~34000
> 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd22cc000~24000
> 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2300000~20000
> 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2324000~24000
> 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd26c0000~24000
> 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2704000~30000
>
> It goes so fast that the OS disk in this case can't keep up and becomes
> 100% utilized.
>
> This slows the OSD down, causes slow requests, and the OSD starts to flap.
>

I've set 'log_file' to /dev/null for now, but that doesn't solve it
either. OSDs still randomly start reporting slow requests and show
these issues.

Any suggestions on how to fix this?

Wido

> It seems that this is *only* happening on the OSDs which are the fullest
> (~85%) on this cluster, and they have about 400 PGs each (yes, I know,
> that's high).
>
> Looking at StupidAllocator.cc I see this piece of code:
>
> void StupidAllocator::dump()
> {
>   std::lock_guard<std::mutex> l(lock);
>   for (unsigned bin = 0; bin < free.size(); ++bin) {
>     ldout(cct, 0) << __func__ << " free bin " << bin << ": "
>                   << free[bin].num_intervals() << " extents" << dendl;
>     for (auto p = free[bin].begin();
>          p != free[bin].end();
>          ++p) {
>       ldout(cct, 0) << __func__ << "  0x" << std::hex << p.get_start()
>                     << "~" << p.get_len() << std::dec << dendl;
>     }
>   }
> }
>
> I'm just wondering why it would spit out these lines and what's causing it.
>
> Has anybody seen this before?
>
> Wido
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
