Re: Moving cluster log storage from monstore db

Thanks, Matthias.

On Wed, Mar 22, 2023 at 1:51 PM Matthias Muench <mmuench@xxxxxxxxxx> wrote:
Hi Prashant, et al.,

separating the logs from the DB might be a good thing.

I would second what Frank suggested: local storage, local to the mon instance hosts. Perhaps just say that flash is required, which shouldn't be an issue nowadays. This would also give the best latency and avoid IOPS starvation in case of a disaster.

Yes, we can achieve this, but maybe instead of the mon handling these logs we could delegate this task to the mgr daemon.

 
With redundancy in the instances, the data is available from at least one of the mon instance hosts. Relying on pools would assume that communication between the actors of the pool is intact. An exclusive pool just for this purpose would still depend on the network connection and would introduce additional latency, too.

Rightly said.
 

The other alternatives sound promising as well; however, I would like to raise some concerns.

Pushing the logs only to a central location would impose a dependency on that location in case of a disaster. A disaster could also coincide with a network issue that cuts the connection to the outside world. So it might be a useful add-on, but for troubleshooting it would rather be an additional challenge.

Eventually consistent distribution of the data might also be hard for troubleshooting. The basic assumption would be that the logs are not important enough to be available in full in every place, i.e. on the different mon instance hosts. Eventual consistency would also add another layer of trouble when troubleshooting a disaster: the interconnect requirements may not hold, or the service may only be available in a limited way, which would not help to get the data to the place where it is needed.

Yes, logging to a central location would make it a SPOF for log availability. We will take these points into consideration. Thanks for your input.


Kind regards,
-matt

On 22.03.23 14:10, Ernesto Puerta wrote:
Hi Prashant,

Is this move just limited to the impact of the cluster log on the mon store db, or is it part of a larger mon db clean-up effort?

I'm asking this because, besides the cluster log, the mon store db is currently also used (and perhaps abused) by some mgr modules via:
  • set_module_option(): used to set MODULE_OPTIONS values via CLI commands.
  • set_store(): there are 2 main storage use cases here:
    • Immutable/sensitive data: stored here instead of being exposed as MODULE_OPTIONS (password hashes, private certificates, API keys, etc.).
    • Changing data: mgr-module internal state. While this shouldn't cause the db to grow in the long term, it might cause short-term/compaction issues (I'm not familiar with rocksdb internals, just extrapolating from experience with sstable/leveldb).
For the latter case, Dashboard developers have been looking for an efficient alternative to persistently store rapidly changing data. We discarded the idea of using a pool, since the Dashboard should be able to operate prior to any OSD provisioning and during storage downtimes.
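
For context, here is a minimal, purely illustrative sketch of those two write paths as seen from a mgr module. The option and store key names are invented, and the exact Option/MgrModule signatures vary a bit across releases:

import time

from mgr_module import MgrModule, Option


# Illustrative only: a stripped-down mgr module showing the two write paths
# discussed above; option and store key names are made up for the example.
class Module(MgrModule):
    # (1) MODULE_OPTIONS are persisted through the mon and can be changed from
    #     the CLI (e.g. something like `ceph config set mgr mgr/<module>/retention_hours 48`)
    #     or from the module itself with self.set_module_option().
    MODULE_OPTIONS = [
        Option(name='retention_hours', type='int', default=24,
               desc='illustrative option persisted through the mon'),
    ]

    def serve(self) -> None:
        # (2a) Immutable/sensitive data: written once with set_store() instead
        #      of being exposed as a module option.
        if self.get_store('api_key') is None:
            self.set_store('api_key', 'not-a-real-secret')

        # (2b) Rapidly changing data: every one of these calls lands in the
        #      mon store db, which is the growth/compaction concern above.
        self.set_store('last_refresh', str(time.time()))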

Coming back to your original questions, I understand that there are two different issues at stake:
  • Cluster log processing: currently done by the mon via Paxos. (Do we really need a Paxos ack for logs? Can we live with some kind of eventually consistent/best-effort storage here?)
  • Cluster log storage: currently the mon store db. AFAIK this is the main issue, right?
From there, I see 2 possible paths:
  • Keep cluster-wide logs as a Ceph concern:
    • IMHO putting some throttling in place should be a must, since client-triggered cluster logs could easily become a DoS vector (a rough sketch follows after this list).
    • I wouldn't put them into a RADOS pool, not so much because of data availability during an OSD service downtime (logs would still be recoverable from logfiles), but because of the potential interference with user workloads/deployment patterns (as Frank mentioned before).
      • Could we run the ".mgr" pool on a new type of "internal/service-only" colocated OSDs (memstore)?
    • Save logs to a fixed-size/TTL-bound priority or multi-level queue structure? (Also covered in the sketch after this list.)
    • Add some (eventually-consistent) store db to the ceph-mgr?
    • To solve ceph-mgr scalability issues, we recently added a new kind of Ceph utility daemon (ceph-exporter), whose sole purpose is to fetch metrics from co-located Ceph daemons' perf counters and make them available for Prometheus scraping. We could think about a similar thing but for logs... (although it'd be very similar to the Loki approach below).
  • Move them outside Ceph:
    • Cephadm + the Dashboard now support Centralized Logging via Loki + Promtail, which basically polls all daemon logfiles and sends new log traces to a central service (Loki), where they can be monitored/filtered in real time.
      • If we find the previous solution too bulky for regular cluster monitoring, we could explore systemd-journal-remote/rsyslog/...
    • The main downside of this approach is that it might break the "ceph log" command (rados_monitor_log and log events could still be watched, I guess).
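
To make the throttling and the fixed-size/TTL-bound queue ideas above a bit more concrete, here is a rough, self-contained sketch. It is not tied to any existing Ceph code, and all names and limits are invented for illustration:

import time
from collections import deque


class ThrottledLogBuffer:
    """Sketch of a bounded cluster-log store: token-bucket throttling on
    ingest plus a fixed-size deque with TTL-based reads."""

    def __init__(self, max_entries=10000, ttl_s=24 * 3600,
                 rate_per_s=50.0, burst=200):
        self.entries = deque(maxlen=max_entries)  # oldest entries fall off automatically
        self.ttl_s = ttl_s
        self.rate = rate_per_s                    # sustained entries/second allowed
        self.burst = float(burst)                 # short-term burst capacity
        self.tokens = float(burst)
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def add(self, level: str, message: str) -> bool:
        """Return False (entry dropped) when the sender exceeds its budget."""
        self._refill()
        if self.tokens < 1.0:
            return False                          # throttled: the DoS guard
        self.tokens -= 1.0
        self.entries.append((time.time(), level, message))
        return True

    def recent(self):
        """Return entries younger than the TTL, oldest first."""
        cutoff = time.time() - self.ttl_s
        return [e for e in self.entries if e[0] >= cutoff]

A real implementation would probably throttle per client/entity rather than globally and persist the buffer somewhere, but the point is that both the ingest rate and the retained volume are bounded by construction.
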
Kind Regards,
Ernesto


On Wed, Mar 22, 2023 at 11:12 AM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
> 2) .mgr pool
>
> 2.1) I have become really tired of these administrative pools that are created on the fly without any regard to device classes, available capacity, PG allocation and the like. The first one that showed up without warning was device_health_metrics, which turned the cluster HEALTH_ERR right away, because the on-the-fly pool creation is, well, not exactly smart.
>
> We don't even have drives below the default root. We have a lot of different pools on different (custom!) device classes with different replication schemes to accommodate a large variety of use cases. Administrative pools showing up randomly somewhere in the tree are a real pain. There are ceph-users cases where people deleted and recreated it, only to render the device health module useless, because it seems to store the pool ID and there is no way to tell it to use the new pool.
>

Ah, that's why it looked unused after I also had to remake it. Since
it gets created when you don't have the OSDs yet, the possibilities
for it ending up wrong seem very large.

--
May the most significant bit of your life be positive.

-- 
——————————————————
Matthias Muench
Principal Specialist Solution Architect
EMEA Storage Specialist
matthias.muench@xxxxxxxxxx
Phone: +49-160-92654111

Red Hat GmbH
Technopark II
Werner-von-Siemens-Ring 12
85630 Grasbrunn
Germany
_______________________________________________________________________
Red Hat GmbH, Registered seat: Werner von Siemens Ring 12, D-85630 Grasbrunn, Germany  
Commercial register: Amtsgericht Muenchen/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
