Thorne,

That's why I asked you to create a separate pool. All writes go to the
original pool, and it is possible to see object counts per-pool.

On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler <thorne@xxxxxxxxxxx> wrote:

> Alexander,
>
> Thank you, but as I said to Igor: the 5.5TB of files on this filesystem
> are virtual machine disks. They are under constant, heavy write load.
> There is no way to turn this off.
>
> On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:
>
> Hello Thorne,
>
> Here is one more suggestion on how to debug this. Right now, there is
> uncertainty about whether there is really a disk space leak or whether
> something simply wrote new data during the test.
>
> If you have at least three OSDs you can reassign, please set their
> CRUSH device class to something different than before, e.g. "test".
> Then, create a new pool that targets this device class and add it to
> CephFS. Then, create an empty directory on CephFS and assign this pool
> to it using setfattr. Finally, try reproducing the issue using only
> files in this directory. This way, you will be sure that nobody else
> is writing any data to the new pool.
>
> On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
> Hi Thorne,
>
> given the number of files on the CephFS volume, I presume you don't
> have a severe write load against it. Is that correct?
>
> If so, we can assume that the numbers you're sharing mostly refer to
> your experiment. At the peak I can see a bytes_used increase of
> 629,461,893,120 bytes (45978612027392 - 45349150134272). With a
> replica factor of 3 this roughly matches your written data (200GB I
> presume?).
>
> More interesting is that after the file's removal we can still see a
> 419,450,880-byte delta (= 45349569585152 - 45349150134272). I can see
> two options (apart from someone else having written additional data to
> CephFS during the experiment) to explain this:
>
> 1.
> File removal wasn't completed by the last probe, half an hour after
> the file's removal. Did you see a stale object count when making that
> probe?
>
> 2. Some space is leaking. If that's the case, this could be the cause
> of your issue if huge(?) files are periodically created and removed on
> CephFS. So if we're certain that the leak really occurred (and option
> 1 above isn't the case), it makes sense to run more experiments,
> writing and removing a bunch of huge files on the volume, to confirm
> the space leakage.
>
> On 3/18/2024 3:12 AM, Thorne Lawler wrote:
>
> Thanks Igor,
>
> I have tried that, and the number of objects and bytes_used took a
> long time to drop, but they seem to have dropped back to almost the
> original level:
>
> * Before creating the file:
>   o 3885835 objects
>   o 45349150134272 bytes_used
> * After creating the file:
>   o 3931663 objects
>   o 45924147249152 bytes_used
> * Immediately after deleting the file:
>   o 3935995 objects
>   o 45978612027392 bytes_used
> * Half an hour after deleting the file:
>   o 3886013 objects
>   o 45349569585152 bytes_used
>
> Unfortunately, this is all production infrastructure, so there is
> always other activity taking place.
>
> What tools are there to visually inspect the object map and see how it
> relates to the filesystem?
>
> Not sure if there is anything like that at the CephFS level, but you
> can use the rados tool to view objects in the CephFS data pool and try
> to build a mapping between them and the CephFS file list. Could be a
> bit tricky, though.
>
> On 15/03/2024 7:18 pm, Igor Fedotov wrote:
>
> ceph df detail --format json-pretty
>
> --
>
> Regards,
>
> Thorne Lawler - Senior System Administrator
> *DDNS* | ABN 76 088 607 265
> First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
> P +61 499 449 170
>
> _DDNS
>
> /_*Please note:* The information contained in this email message and
> any attached files may be confidential information, and may also be
> the subject of legal professional privilege.
> _If you are not the
> intended recipient any use, disclosure or copying of this email is
> unauthorised. _If you received this email in error, please notify
> Discount Domain Name Services Pty Ltd on 03 9815 6868 to report this
> matter and delete all copies of this transmission together with any
> attachments. /
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Alexander E. Patrakov
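For anyone wanting to try Alexander's isolation recipe, here is a minimal
sketch of the commands involved. The OSD ids (osd.10-12), the rule and pool
names, the filesystem name "cephfs", the pg count, and the mount point are
all placeholders, not values from this thread; verify the syntax against
your Ceph release before running anything on a production cluster.

```shell
# Move three spare OSDs into a throwaway "test" device class
# (the old class must be removed before a new one can be set).
ceph osd crush rm-device-class osd.10 osd.11 osd.12
ceph osd crush set-device-class test osd.10 osd.11 osd.12

# CRUSH rule and replicated pool that target only the "test" class.
ceph osd crush rule create-replicated test-rule default host test
ceph osd pool create cephfs.test.data 32 32 replicated test-rule
ceph fs add_data_pool cephfs cephfs.test.data

# Pin an empty directory to the new pool; new files created under it
# (and only those) will store their data there.
mkdir /mnt/cephfs/leak-test
setfattr -n ceph.dir.layout.pool -v cephfs.test.data /mnt/cephfs/leak-test

# Repeat the create/delete experiment inside /mnt/cephfs/leak-test and
# watch that pool's counters in isolation.
ceph df detail --format json-pretty
```

Since nothing else writes to the new pool, any bytes_used left over after
the deletion settles can only come from the experiment itself.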
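On Thorne's question about relating objects to files: CephFS names its data
objects "<inode-in-hex>.<block-number>", which gives a crude but workable
mapping via inode numbers. A sketch, assuming the data pool is called
cephfs_data and the filesystem is mounted at /mnt/cephfs (both assumptions;
the find-by-inode scan is slow on a large tree):

```shell
# List a few object names in the data pool.
rados -p cephfs_data ls | head

# An object named "10000000001.00000000" is block 0 of the file whose
# inode number is 0x10000000001; convert to decimal and search by inode.
printf '%d\n' 0x10000000001          # 1099511627777
find /mnt/cephfs -xdev -inum 1099511627777
```

Objects whose inode prefix matches no file in the tree are candidates for
the suspected leak (or for files still being deleted asynchronously).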
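Igor's deltas can be re-derived from the four probes with plain shell
arithmetic (replica factor 3, as stated in the thread):

```shell
# bytes_used from the four probes quoted above.
BEFORE=45349150134272        # before creating the file
PEAK=45978612027392          # immediately after deleting it
LATER=45349569585152         # half an hour later
REPLICAS=3

echo $(( PEAK - BEFORE ))                  # 629461893120 raw bytes at peak
echo $(( (PEAK - BEFORE) / REPLICAS ))     # 209820631040, i.e. the ~200GB written
echo $(( LATER - BEFORE ))                 # 419450880 raw bytes unaccounted for
echo $(( 3886013 - 3885835 ))              # 178 objects above the starting count
```

So the residue is about 419 MB raw (roughly 140 MB of logical data) and 178
objects, which is what the two explanations in the thread are trying to
account for.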