Re: OSD log being spammed with BlueStore stupidallocator dump

Hi Wido,

Once you apply the PR you'll probably see, in the log, the initial error that triggers the dump. Most likely it is the lack of space reported by the _balance_bluefs_freespace() function. If so, this means the BlueFS rebalance is unable to allocate a contiguous 1M chunk on the main device to gift to BlueFS, i.e. the free space on your main device is badly fragmented.

Unfortunately I don't know of any way to recover from this state other than OSD redeployment or data removal.

An upcoming PR that brings the ability to manipulate BlueFS volumes offline (https://github.com/ceph/ceph/pull/23103) will probably help to recover from this issue in the future by migrating BlueFS data to a new, larger DB volume. (It is targeted for Nautilus; I'm not sure about backporting to Mimic or Luminous.)
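Once that PR lands, the recovery flow might look something like the following with ceph-bluestore-tool (subcommand and option names taken from the PR, so verify against the merged tool's help; paths and devices are illustrative, and the OSD must be stopped first):

```shell
# Attach a new, larger standalone DB volume to a stopped OSD and move
# BlueFS data off the fragmented main device (illustrative example).
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 \
    bluefs-bdev-new-db --dev-target /dev/nvme0n1p2
```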

For now, the only preventive measure I can suggest is to have a large enough standalone DB volume, so that the main device isn't used for DB data at all, or as little as possible. Then no rebalance is needed and no fragmentation occurs.
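For example, when deploying with ceph-volume (device names here are just placeholders for your hardware):

```shell
# Put BlueStore data on the main device and the DB on a standalone,
# generously sized fast partition, so the main device never has to host
# DB data and no BlueFS rebalance/gifting is needed.
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
```

Size the DB volume well above your expected RocksDB usage, particularly for OMAP-heavy workloads such as RGW bucket indexes.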

BTW, I'm wondering whether you have standalone DB volumes for your OSDs? If so, how large are they?

Everything above is "IMO"; there's some chance that I've missed something.


Thanks,

Igor


On 10/15/2018 10:12 PM, Wido den Hollander wrote:

On 10/15/2018 08:23 PM, Gregory Farnum wrote:
I don't know anything about the BlueStore code, but given the snippets
you've posted this appears to be a debug thing that isn't expected to be
invoked (or perhaps only in an unexpected case that it's trying hard to
recover from). Have you checked where the dump() function is invoked
from? I'd imagine it's something about having to try extra-hard to
allocate free space or something.
It seems it's BlueFS that is having a hard time finding free space.

I'm trying this PR now: https://github.com/ceph/ceph/pull/24543

It will stop the spamming, but that's not the root cause. The OSDs in
this case are at max 80% full and they do have a lot of OMAP (RGW
indexes) in them, but that's all.

However, I'm not sure why this suddenly started happening in this cluster.

Wido

-Greg

On Mon, Oct 15, 2018 at 10:02 AM Wido den Hollander <wido@xxxxxxxx
<mailto:wido@xxxxxxxx>> wrote:



     On 10/11/2018 12:08 AM, Wido den Hollander wrote:
     > Hi,
     >
     > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm
     > seeing OSDs writing heavily to their logfiles spitting out these
     lines:
     >
     >
     > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2078000~34000
     > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd22cc000~24000
     > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2300000~20000
     > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2324000~24000
     > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd26c0000~24000
     > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0 dump  0x15cd2704000~30000
     >
     > It goes so fast that the OS disk in this case can't keep up and
     > becomes 100% utilized.
     >
     > This causes the OSD to slow down, produce slow requests, and start
     > to flap.
     >

     I've set 'log_file' to /dev/null for now, but that doesn't solve it
     either. Randomly OSDs just start spitting out slow requests and have
     these issues.

     Any suggestions on how to fix this?

     Wido

     > It seems that this is *only* happening on OSDs which are the fullest
     > (~85%) on this cluster and they have about ~400 PGs each (Yes, I know,
     > that's high).
     >
     > Looking at StupidAllocator.cc I see this piece of code:
     >
     > void StupidAllocator::dump()
     > {
     >   std::lock_guard<std::mutex> l(lock);
     >   for (unsigned bin = 0; bin < free.size(); ++bin) {
     >     ldout(cct, 0) << __func__ << " free bin " << bin << ": "
     >                   << free[bin].num_intervals() << " extents" << dendl;
     >     for (auto p = free[bin].begin();
     >          p != free[bin].end();
     >          ++p) {
     >       ldout(cct, 0) << __func__ << "  0x" << std::hex << p.get_start()
     >                     << "~" << p.get_len() << std::dec << dendl;
     >     }
     >   }
     > }
     >
     > I'm just wondering why it would spit out these lines and what's
     causing it.
     >
     > Has anybody seen this before?
     >
     > Wido
     > _______________________________________________
     > ceph-users mailing list
     > ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
     > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
     >





