Re: Adding a new monitor fails

Thanks,

I'll have to see if I can come up with a suitable documentation issue.
My biggest issue isn't a specific item (well, except for Octopus
telling me to use the not-included ceph-deploy command in lots of
places). It's more a case of needing attention paid to anachronisms in
general.

That, and more attention could be paid to the distinction between
container-based and OS-native Ceph components.

So in short, it's not single issues, but more a need for attention to
the overall details, to ensure that features described for a specific
release actually apply TO that release. Grunt work, but it can save a
lot on service calls.

I migrated to ceph from gluster because gluster is apparently going
unsupported at the end of this year. I moved to gluster from DRBD
because I wanted triple redundancy on the data. While ceph is really
kind of overkill for my small R&D farm, it has proven to be about the
most solid network distributed filesystem I've worked with: no split
brains, no outright corruption, no data outages. Despite all the
atrocities I committed in setting it up, it has never failed at its
primary duty of delivering data service.

I started off with Octopus, and that has been the root of a lot of my
problems. Octopus introduced cephadm as a primary management tool, I
believe, but the documentation still referenced ceph-deploy. And
cephadm suffered from a bug that meant that if even one service was
down, scheduled work would not be done, so to repair anything I needed
an already-repaired system.

Migrating to Pacific cleared that up, so a lot of what I'm doing now is
getting the lint out. The cluster is now staying consistently healthy
between a proper monitor configuration and having removed direct ceph
mounts from the desktops.
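
For the record, the monitor fix boiled down to Eugen's suggestion below:
replacing the leftover count:1 mon spec with an explicit placement,
roughly along these lines (host1-3 stand in for my actual mon hosts):

   ceph orch apply mon --placement="host1,host2,host3"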

I very much appreciate all the help and insights you've provided. It's
nice to have laid my problems to rest.

   Tim

On Thu, 2024-02-08 at 14:41 +0000, Eugen Block wrote:
> Hi,
> 
> you're always welcome to report a documentation issue on
> tracker.ceph.com; you don't need to clean them up yourself. :-)
> There is a major restructuring in progress, but they will probably
> never be perfect anyway.
> 
> > There are definitely some warts in there, as the monitor count was 1
> > but there were 2 monitors listed running.
> 
> I don't know your mon history, but I assume that you've had more than
> one mon (before converting to cephadm?). Then you might have updated
> the mon specs via the command line, containing "count:1". But the mgr
> refuses to remove the second mon because it would break quorum; that's
> why you had 2/1 running. This is reproducible in my test cluster.
> Adding more mons also failed because of the count:1 spec. You could
> have just overwritten it in the cli as well, without a yaml spec file
> (omit the count spec):
> 
> ceph orch apply mon --placement="host1,host2,host3"
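> 
> To double-check what spec the orchestrator is actually working from,
> exporting it as yaml should show the placement and any count (the
> hostnames above are just placeholders):
> 
> ceph orch ls mon --export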
> 
> Regards,
> Eugen
> 
> Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
> 
> > Ah, yes. Much better.
> > 
> > There are definitely some warts in there, as the monitor count was 1
> > but there were 2 monitors listed running.
> > 
> > I've mostly avoided docs that reference ceph config files and yaml
> > configs because the online docs are (as I've whined before) not always
> > trustworthy and often contain anachronisms. Were I sufficiently
> > knowledgeable, I'd offer to clean them up, but if that were the case,
> > I wouldn't have to come crying here.
> > 
> > All happy now, though.
> > 
> >    Tim
> > 
> > 
> > On Tue, 2024-02-06 at 19:22 +0000, Eugen Block wrote:
> > > Yeah, you have the „count:1“ in there, that's why your manually
> > > added daemons are rejected. Try my suggestion with a mon.yaml.
> > > 
> > > Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
> > > 
> > > > ceph orch ls
> > > > NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
> > > > alertmanager                       ?:9093,9094      1/1  3m ago     8M   count:1
> > > > crash                                               5/5  3m ago     8M   *
> > > > grafana                            ?:3000           1/1  3m ago     8M   count:1
> > > > mds.ceefs                                           2/2  3m ago     4M   count:2
> > > > mds.fs_name                                         3/3  3m ago     8M   count:3
> > > > mgr                                                 3/3  3m ago     4M   www6.mousetech.com;www2.mousetech.com;www7.mousetech.com
> > > > mon                                                 2/1  3m ago     4M   www6.mousetech.com;www2.mousetech.com;www7.mousetech.com;count:1
> > > > nfs.foo                            ?:2049           1/1  3m ago     4M   www7.mousetech.com
> > > > node-exporter                      ?:9100           5/5  3m ago     8M   *
> > > > osd                                                   6  3m ago     -    <unmanaged>
> > > > osd.dashboard-admin-1686941775231                     0  -          7M   *
> > > > prometheus                         ?:9095           1/1  3m ago     8M   count:1
> > > > rgw.mousetech                      ?:80             2/2  3m ago     3M   www7.mousetech.com;www2.mousetech.com
> > > > 
> > > > 
> > > > Note that the dell02 monitor doesn't show here, although the
> > > > "ceph orch daemon add" returns success initially. And actually the
> > > > www6 monitor is not running, nor does it list on the dashboard or
> > > > "ceph orch ps". The www6 machine is still somewhat messed up
> > > > because it was the initial launch machine for Octopus.
> > > > 
> > > > On Tue, 2024-02-06 at 17:22 +0000, Eugen Block wrote:
> > > > > So the orchestrator is working and you have a working ceph
> > > > > cluster? Can you share the output of:
> > > > > ceph orch ls mon
> > > > >
> > > > > If the orchestrator expects only one mon and you deploy another
> > > > > manually via daemon add, it can be removed. Try using a mon.yaml
> > > > > file instead which contains the designated mon hosts and then run:
> > > > > ceph orch apply -i mon.yaml
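> > > > >
> > > > > A minimal mon.yaml would look something like this (host1-3 are
> > > > > placeholders for your designated mon hosts):
> > > > >
> > > > > service_type: mon
> > > > > placement:
> > > > >   hosts:
> > > > >     - host1
> > > > >     - host2
> > > > >     - host3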
> > > > > 
> > > > > 
> > > > > 
> > > > > Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
> > > > > 
> > > > > > I just jacked in a completely new, clean server and I've been
> > > > > > trying to get a Ceph (Pacific) monitor running on it.
> > > > > >
> > > > > > The "ceph orch daemon add" appears to install all/most of what's
> > > > > > necessary, but when the monitor starts, it shuts down
> > > > > > immediately, and in the manner of Ceph containers immediately
> > > > > > erases itself and the container log, so it's not possible to see
> > > > > > what its problem is.
> > > > > >
> > > > > > I looked at manual installation, but the docs appear to be
> > > > > > oriented towards the old-style non-container implementation and
> > > > > > don't account for the newer /var/lib/ceph/*fsid*/ approach.
> > > > > > 
> > > > > > Any tips?
> > > > > > 
> > > > > > Last few lines in the system journal are like this:
> > > > > > 
> > > > > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:58.938+0000 7f26810ae700  4 rocksdb: (Original Log Time 2024/02/06-16:09:58.938432) [compaction/compaction_job.cc:760] [default] compacted to: base level 6 level multiplier 10.00 max bytes base 268435456 files[0 0 0 0 0 0 2] max score 0.00, MB/sec: 351.7 rd, 351.7 wr, level 6, files in(4, 0) out(2) MB in(92.8, 0.0) out(92.8), read-write-amplify(2.0) write-amplify(1.0) OK, records in: 2858, records dropped: 0 output_compression: NoCompression
> > > > > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]:
> > > > > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:58.938+0000 7f26810ae700  4 rocksdb: (Original Log Time 2024/02/06-16:09:58.938452) EVENT_LOG_v1 {"time_micros": 1707235798938446, "job": 6, "event": "compaction_finished", "compaction_time_micros": 276718, "compaction_time_cpu_micros": 73663, "output_level": 6, "num_output_files": 2, "total_output_size": 97309398, "num_input_records": 2858, "num_output_records": 2858, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 0, 0, 0, 2]}
> > > > > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:58.940+0000 7f26810ae700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1707235798941291, "job": 6, "event": "table_file_deletion", "file_number": 14}
> > > > > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:58.943+0000 7f26810ae700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1707235798943980, "job": 6, "event": "table_file_deletion", "file_number": 12}
> > > > > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:58.946+0000 7f26810ae700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1707235798946734, "job": 6, "event": "table_file_deletion", "file_number": 10}
> > > > > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:58.946+0000 7f26810ae700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1707235798946789, "job": 6, "event": "table_file_deletion", "file_number": 4}
> > > > > > Feb 06 11:09:59 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:59.450+0000 7f26818af700 -1 received  signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
> > > > > > Feb 06 11:09:59 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:59.450+0000 7f26818af700 -1 mon.dell02@-1(synchronizing) e161 *** Got Signal Terminated ***
> > > > > > Feb 06 11:09:59 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:59.450+0000 7f26818af700  1 mon.dell02@-1(synchronizing) e161 shutdown
> > > > > > Feb 06 11:09:59 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:59.452+0000 7f2691a95880  4 rocksdb: [db_impl/db_impl.cc:397] Shutdown: canceling all background work
> > > > > > Feb 06 11:09:59 dell02.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02-06T16:09:59.452+0000 7f2691a95880  4 rocksdb: [db_impl/db_impl.cc:573] Shutdown complete
> > > > > > Feb 06 11:09:59 dell02.mousetech.com bash[1357898]: ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-mon-dell02
> > > 
> > > 
> 
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



