On Thu, 25 Jan 2024 at 11:57, Henry lol <pub.virtualization@xxxxxxxxx> wrote:
>
> It's reasonable enough.
> actually, I expected the client to have just thousands of
> "PG-to-OSDs" mappings.

Yes, but filename to PG is done with a pseudorandom algorithm.

> Nevertheless, it’s so heavy that the client calculates location on
> demand, right?

Yes, and the client has an algorithm (CRUSH) that makes it possible to
know that PG 1.a4 should be on OSDs 4, 93 and 44, but also that if 4 is
missing, the next candidate would be 51; if 93 isn't up either, then 66
would be the next logical OSD to contact for that copy, and so on.
Since all parts (client, mons, OSDs) run the same code, when OSD 4
dies, 51 knows it needs to get a copy from either 93 or 44, and as soon
as that copy is made the PG stops being active+degraded, though it may
become active+remapped, since it knows it wants to go back to OSD 4 if
that OSD comes back with the same size again.

> if the client with the outdated map sends a request to the wrong OSD,
> then does the OSD handle it somehow through redirection or something?

I think it would simply get told that it has the wrong (outdated) osdmap.

> Lastly, not only CRUSH map but also other factors like storage usage
> are considered when doing CRUSH?
> because it seems that the target OSD set isn’t deterministic given only it.

CRUSH doesn't take OSD usage into consideration except at creation time
or on OSD in/out/reweighting (or manual displacements with upmap and so
forth). This is why "ceph df" will tell you a pool has X free space,
where X is "smallest free space on the OSDs on which this pool lies,
times the number of OSDs". Given the pseudorandom placement of objects
to PGs, there is nothing to prevent you from having the worst luck ever
and all the objects you create ending up on the OSD with the least free
space.

--
May the most significant bit of your life be positive.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
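
To make the "filename to PG is done with a pseudorandom algorithm" step
above concrete, here is a minimal sketch in Python. It is not Ceph's
actual code (Ceph hashes the object name with its rjenkins hash and
folds the result with a "stable mod" against pg_num); a generic
checksum stands in, only to show that the object-to-PG mapping is
computed on demand rather than stored per object:

    # Sketch only: a stand-in hash instead of Ceph's rjenkins, and a plain
    # modulo instead of Ceph's stable mod. The point is that the PG for an
    # object is recomputed from its name whenever it is needed.
    import zlib

    def object_to_pg(pool_id, object_name, pg_num):
        h = zlib.crc32(object_name.encode())   # Ceph uses its rjenkins hash here
        pg = h % pg_num                        # Ceph uses a "stable mod", not plain %
        return "%d.%x" % (pool_id, pg)         # printed like "1.a4"

    print(object_to_pg(1, "rbd_data.12ab34.0000000000000001", 256))

The object name "rbd_data.12ab34.0000000000000001" is just an invented
example; any name works, and the same name always lands in the same PG.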
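
The deterministic "next candidate" behaviour described above (4, 93,
44, then 51, then 66) can be illustrated with a toy placement function.
This is not real CRUSH (no hierarchy, weights or straw2 buckets); it
only demonstrates the property the mail relies on: client, mons and
OSDs can all derive the same ordered candidate list for a PG from the
same inputs and simply skip OSDs that are down:

    # Toy illustration, not real CRUSH: deterministically order all OSDs per
    # PG and take the first `size` ones that are up. Everyone running the
    # same code gets the same answer without exchanging per-object tables.
    import hashlib

    def candidate_order(pg, all_osds):
        def score(osd):
            return hashlib.sha256(("%s:%d" % (pg, osd)).encode()).hexdigest()
        return sorted(all_osds, key=score)

    def acting_set(pg, all_osds, up, size=3):
        return [o for o in candidate_order(pg, all_osds) if o in up][:size]

    osds = list(range(100))
    full = acting_set("1.a4", osds, up=set(osds))
    print(full)                                   # analogous to 4, 93, 44 above
    # Mark the primary down: the next candidate in the same order steps in,
    # like the "next candidate would be 51" behaviour described above.
    print(acting_set("1.a4", osds, up=set(osds) - {full[0]}))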
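
Finally, a back-of-the-envelope version of the "ceph df" estimate
quoted above ("smallest free space ... times the number of OSDs"), with
made-up numbers for four hypothetical OSDs backing a pool:

    # Made-up numbers. The pool's reported free space is bounded by its
    # fullest OSD, because the pseudorandom placement gives no guarantee
    # of steering new objects away from it.
    free_gib = {4: 512, 44: 730, 51: 610, 93: 480}

    pool_free_raw = min(free_gib.values()) * len(free_gib)
    print(pool_free_raw)   # 480 * 4 = 1920 GiB, although 2332 GiB are free in total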