renjianxinlover
renjianxinlover@xxxxxxx

On 6/14/2022 13:21, renjianxinlover <renjianxinlover@xxxxxxx> wrote:

Ceph version: v12.2.10
OS Distribution: Debian 9
Kernel Release & Version: 4.9.0-18-amd64 #1 SMP Debian 4.9.303-1 (2022-03-07) x86_64 GNU/Linux

But building Ceph failed; the error snippet looks like:

...
[ 33%] Built target osdc
Scanning dependencies of target librados_api_obj
[ 33%] Building CXX object src/librados/CMakeFiles/librados_api_obj.dir/librados.cc.o
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/EventSelect.cc.o
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/Stack.cc.o
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/PosixStack.cc.o
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/net_handler.cc.o
/mnt/ceph-source-code/ceph/src/librados/librados.cc: In static member function ‘static librados::AioCompletion* librados::Rados::aio_create_completion(void*, librados::callback_t, librados::callback_t)’:
/mnt/ceph-source-code/ceph/src/librados/librados.cc:2747:7: warning: unused variable ‘r’ [-Wunused-variable]
   int r = rados_aio_create_completion(cb_arg, cb_complete, cb_safe, (void**)&c);
       ^
In file included from /mnt/ceph-source-code/ceph/src/include/Context.h:19:0,
                 from /mnt/ceph-source-code/ceph/src/common/Cond.h:19,
                 from /mnt/ceph-source-code/ceph/src/librados/AioCompletionImpl.h:18,
                 from /mnt/ceph-source-code/ceph/src/librados/librados.cc:29:
/mnt/ceph-source-code/ceph/src/librados/librados.cc: In function ‘int rados_conf_read_file(rados_t, const char*)’:
/mnt/ceph-source-code/ceph/src/common/dout.h:80:12: error: base operand of ‘->’ is not a pointer
   _ASSERT_H->_log->submit_entry(_dout_e); \
            ^
/mnt/ceph-source-code/ceph/src/common/dout.h:80:12: note: in definition of macro ‘dendl_impl’
   _ASSERT_H->_log->submit_entry(_dout_e); \
            ^~
/mnt/ceph-source-code/ceph/src/librados/librados.cc:2897:47: note: in expansion of macro ‘dendl’
   lderr(client->cct) << warnings.str()
                      << dendl;
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/QueueStrategy.cc.o
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/rdma/Infiniband.cc.o
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/rdma/RDMAConnectedSocketImpl.cc.o
src/librados/CMakeFiles/librados_api_obj.dir/build.make:62: recipe for target 'src/librados/CMakeFiles/librados_api_obj.dir/librados.cc.o' failed
make[3]: *** [src/librados/CMakeFiles/librados_api_obj.dir/librados.cc.o] Error 1
CMakeFiles/Makefile2:3814: recipe for target 'src/librados/CMakeFiles/librados_api_obj.dir/all' failed
make[2]: *** [src/librados/CMakeFiles/librados_api_obj.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/rdma/RDMAServerSocketImpl.cc.o
...

Brs

renjianxinlover
renjianxinlover@xxxxxxx

On 6/14/2022 06:06, <ceph-users-request@xxxxxxx> wrote:

Send ceph-users mailing list submissions to ceph-users@xxxxxxx

To subscribe or unsubscribe via email, send a message with subject or body 'help' to ceph-users-request@xxxxxxx

You can reach the person managing the list at ceph-users-owner@xxxxxxx

When replying, please edit your Subject line so it is more specific than "Re: Contents of ceph-users digest..."

Today's Topics:

   1. Re: something wrong with my monitor database? (Stefan Kooman)
   2. Changes to Crush Weight Causing Degraded PGs instead of Remapped (Wesley Dillingham)
   3. Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped (Eugen Block)
   4. Re: My cluster is down. Two osd:s on different hosts uses all memory on boot and then crashes. (Stefan)
   5. Copying and renaming pools (Pardhiv Karri)
   6.
      Ceph Octopus RGW - files vanished from rados while still in bucket index (Boris Behrens)

----------------------------------------------------------------------

Message: 1
Date: Mon, 13 Jun 2022 18:37:59 +0200
From: Stefan Kooman <stefan@xxxxxx>
Subject: Re: something wrong with my monitor database?
To: Eric Le Lay <eric.lelay@xxxxxxxxxxxxx>, ceph-users@xxxxxxx
Message-ID: <c99bfd94-086e-f9dd-5b8a-6a19e57cc441@xxxxxx>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 6/13/22 18:21, Eric Le Lay wrote:
> Those objects are deleted but have snapshots, even if the pool itself doesn't have snapshots. What could cause that?
>
> root@hpc1a:~# rados -p storage stat rbd_data.5b423b48a4643f.000000000006a4e5
> error stat-ing storage/rbd_data.5b423b48a4643f.000000000006a4e5: (2) No such file or directory
> root@hpc1a:~# rados -p storage lssnap
> 0 snaps
> root@hpc1a:~# rados -p storage listsnaps rbd_data.5b423b48a4643f.000000000006a4e5
> rbd_data.5b423b48a4643f.000000000006a4e5:
> cloneid  snaps  size     overlap
> 1160     1160   4194304  [1048576~32768,1097728~16384,1228800~16384,1409024~16384,1441792~16384,1572864~16384,1720320~16384,1900544~16384,2310144~16384]
> 1364     1364   4194304  []

Do the OSDs still need to trim the snapshots? Does data usage decline over time?

Gr. Stefan

------------------------------

Message: 2
Date: Mon, 13 Jun 2022 13:37:32 -0400
From: Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
Subject: Changes to Crush Weight Causing Degraded PGs instead of Remapped
To: ceph-users <ceph-users@xxxxxxx>
Message-ID: <CAJ10viL1QXp3YshJTu8_xMLxHRXMTwzbs4JYbqTSm-fE3QEE0g@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="UTF-8"

I have a brand new cluster, 16.2.9, running bluestore with 0 client activity. I am modifying some crush weights to move PGs off of a host for testing purposes, but the result is that the PGs go into a degraded+remapped state instead of simply a remapped state. This is a strange result to me, as in previous releases (nautilus) this would cause only remapped PGs.
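The test described above can be sketched as follows (a minimal, illustrative sequence using the standard Ceph CLI; osd.1 and the restore weight are placeholders from the report, not a definitive procedure):

```shell
# Force all PGs off osd.1 by zeroing its CRUSH weight.
ceph osd crush reweight osd.1 0

# Watch the resulting PG states: on nautilus the expectation is
# active+remapped only; the report observes degraded+remapped
# followed by recovery on pacific 16.2.9.
ceph pg stat
ceph health detail

# Restore the previous weight afterwards (take the original value
# from `ceph osd tree` before zeroing it).
ceph osd crush reweight osd.1 <original-weight>
```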
Are there any known issues around this? Are others running Pacific seeing similar behavior? Thanks.

"ceph osd crush reweight osd.1 0"
^ Causes degraded PGs which then go into recovery. Expected: only remapped PGs.

Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>

------------------------------

Message: 3
Date: Mon, 13 Jun 2022 20:46:49 +0000
From: Eugen Block <eblock@xxxxxx>
Subject: Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped
To: ceph-users@xxxxxxx
Message-ID: <20220613204649.Horde.X5vfBCT_BSm9X_D7tN7gRe7@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes

I remember someone reporting the same thing but I can’t find the thread right now. I’ll try again tomorrow.

Quoting Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>:
> I have a brand new cluster, 16.2.9, running bluestore with 0 client activity. I am modifying some crush weights to move PGs off of a host for testing purposes, but the result is that the PGs go into a degraded+remapped state instead of simply a remapped state. This is a strange result to me, as in previous releases (nautilus) this would cause only remapped PGs. Are there any known issues around this? Are others running Pacific seeing similar behavior? Thanks.
>
> "ceph osd crush reweight osd.1 0"
> ^ Causes degraded PGs which then go into recovery. Expected: only remapped PGs.
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

------------------------------

Message: 4
Date: Mon, 13 Jun 2022 20:55:41 +0000
From: Stefan <slissm@xxxxxxxxxxxxxx>
Subject: Re: My cluster is down. Two osd:s on different hosts uses all memory on boot and then crashes.
To: Mara Sophie Grosch <littlefox@xxxxxxxxxx>
Cc: ceph-users@xxxxxxx
Message-ID: <i11WTn1qURiy-9dgtyNZihX_CqBDtHWFUXklOmQajFNOspxabzrYbZGktKM3xaV5KFp1kCXOgu8jair2AG6clFQ2kqr55bCoGzrlIL3-tZ0=@protonmail.com>
Content-Type: text/plain; charset=utf-8

Hello Mara,

Thank you so much, you are a lifesaver! I'm not very skilled at Docker; normally I just use containers with the provided docker run commands, so it took some time before I was able to run the command inside the container and give the container access to the Ceph OSD disk. But after some trial and error I managed to fix everything, and now my cluster is healthy again!

Again, thank you! I also want to take the opportunity to thank everyone else in the Ceph community for a great project!

Best regards
Stefan Lissmats

Sent with Proton Mail secure email.

------- Original Message -------
On Monday, June 13th, 2022 at 4:42 PM, Mara Sophie Grosch <littlefox@xxxxxxxxxx> wrote:

> Hi,
>
> as someone who has gone through that just last week, that sounds a lot like the symptoms of my cluster. In case you are comfortable with Docker (or any other container runtime), I have pushed an image [1] with quincy from a few days ago, the fix for pglog dups being included in it, and was able to successfully clean my OSD with the ceph-objectstore-tool inside it.
>
> Something like `CEPH_ARGS="--osd_pg_log_trim_max=50000 --osd_max_pg_log_entries=2000" ceph-objectstore-tool --data-path $osd_path --op trim-pg-log` should help (command mostly from memory, check it before executing it, as always).
>
> Best of luck, Mara
>
> [1] littlefox/ceph-daemon-base:2, based on commit 5d47b8e21e77a57e51781f00021f77c7967ebbe2
>
> On Mon, Jun 13, 2022 at 02:10:42PM +0000, Stefan wrote:
>> Hello,
>>
>> I have been running Ceph for several years and everything has been rock solid until this weekend. Due to some unfortunate events my cluster at home is down.
>> I have two osd:s that don't boot, and the reason seems to be this issue: https://tracker.ceph.com/issues/53729
>>
>> I'm currently running version 17.2.0, but when I hit the issue I was on 16.2.7. In an attempt to fix the issue I upgraded first to 16.2.9 and then to 17.2.0, but it didn't help. I also tried giving it a huge swap, but it ended up crashing anyway.
>>
>> 1. There seems to be a fix for the issue in a GitHub branch: https://github.com/NitzanMordhai/ceph/tree/wip-nitzan-pglog-dups-not-trimmed/
>> I don't have very advanced Ceph/Linux skills and I'm not 100% sure that I understand exactly how I should use it. Do I need to compile a complete Ceph installation and run that, or can I pinpoint ceph-objectstore-tool in some way to compile and run only that?
>>
>> 2. The issue seems to be targeted for release in 17.2.1; is there any information on when that will be released?
>>
>> Any advice would be very welcome, since I was running a lot of different VM:s and didn't have all of them backed up.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

------------------------------

Message: 5
Date: Mon, 13 Jun 2022 14:15:40 -0700
From: Pardhiv Karri <meher4india@xxxxxxxxx>
Subject: Copying and renaming pools
To: ceph-users <ceph-users@xxxxxxx>
Message-ID: <CAFtu09Anr7abO+Dx2ZYb7dirzqEEsti5ySuKydGqVUimg_AsKw@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="UTF-8"

Hi,

Our Ceph is used as backend storage for OpenStack. We use the "images" pool for Glance and the "compute" pool for instances. We need to migrate our "images" pool from HDD drives to SSD drives. I copied all the data from the "images" pool on HDD disks to an "ssdimages" pool on SSD disks and made sure the crush rules are all good. I used "rbd deep copy" to migrate all the objects. Then I renamed the pools: "images" to "hddimages" and "ssdimages" to "images". Our OpenStack instances are on the "compute" pool.
All the instances that were created from the image show the parent as an image from the "images" pool. I thought renaming would make them point to the new pool on SSD disks, now renamed "images", but interestingly the rbd info of all the instances now points to the parent in "hddimages". How can I make sure the parent pointers stay as "images" instead of changing to "hddimages"?

Before renaming the pools:

lab [root@ctl01 /]# rbd info compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
        size 100GiB in 12800 objects
        order 23 (8MiB objects)
        block_name_prefix: rbd_data.8f51c347398c89
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
        create_timestamp: Tue Mar 15 21:36:55 2022
        parent: images/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
        overlap: 10GiB
lab [root@ctl01 /]#

After renaming the pools, the parent value automatically gets modified:

lab [root@ctl01 /]# rbd info compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
        size 100GiB in 12800 objects
        order 23 (8MiB objects)
        block_name_prefix: rbd_data.8f51c347398c89
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
        create_timestamp: Tue Mar 15 21:36:55 2022
        parent: hddimages/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
        overlap: 10GiB
lab [root@ctl01 /]#

Thanks,
Pardhiv

------------------------------

Message: 6
Date: Tue, 14 Jun 2022 00:05:20 +0200
From: Boris Behrens <bb@xxxxxxxxx>
Subject: Ceph Octopus RGW - files vanished from rados while still in bucket index
To: ceph-users@xxxxxxx
Message-ID: <CAHjiyFQQbLOSu2APeJiuqQfhhENCeSP1xLA5GRNX1WHkuvA3CQ@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="UTF-8"

Hi everybody,

are there other ways for rados objects to get removed, other than "rados -p POOL rm OBJECT"? We have a customer who has objects in the bucket index but can't download them. After checking, it seems like the rados object is gone.
The Ceph cluster is running Octopus 15.2.16.

"radosgw-admin bi list --bucket BUCKET" shows the object as available.
"radosgw-admin bucket radoslist --bucket BUCKET" shows the object and a corresponding multipart file.
"rados -p POOL ls" only shows the object, but not the multipart file.
Exporting the rados object hands me an empty file.

I find it very strange that a 250KB file gets a multipart object, but what do I know about how the customer uploaded the file and how they work with the RGW API.

What grinds my gears is that we lost customer data, and I need to know what ways there are that lead to this problem. I know there is no recovery, but I am not satisfied with "well, it just happened, no idea why". As I am the only one who works on the Ceph cluster, I would remove "removed via rados command" from the list of possibilities, as the last orphan-objects cleanup was performed a month before the files' last MTIME.

Is there ANY way this could happen in some correlation with the GC, restarting/adding/removing OSDs, sharding bucket indexes, OSD crashes, or anything else? Anything that isn't "rados -p POOL rm OBJECT"?

Cheers
Boris

------------------------------

Subject: Digest Footer

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

------------------------------

End of ceph-users Digest, Vol 113, Issue 36
*******************************************

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx