I have not tried this myself, but could it be related to the
compress_required_ratio mentioned here?
https://books.google.dk/books?id=vuiLDwAAQBAJ&pg=PA80&lpg=PA80
Zip files probably can't be compressed all that much.

On Mon, Jan 4, 2021 at 9:44 PM Paul Mezzanini <pfmeec@xxxxxxx> wrote:

> I'm using rsync so I can have it copy times/permissions/ACLs etc. more
> easily. It also has an output format that is one line per file and
> informative.
>
> Actual copy line:
>
> rsync --owner --group --links --hard-links --perms --times --acls --itemize-changes "${DIRNAME}/${FILENAME}" "${DIRNAME}/.${FILENAME}.copying"
>
> It makes a new file unless the source is a link to another file (which it
> shouldn't be, because the find command I used to generate the list
> excluded links).
>
> [pfmeec@gung testing]$ ls -il SMT_X11AST2500_164.zip ; sudo ../wiggler.sh /home/pfmeec/testing/SMT_X11AST2500_164.zip ; ls -il SMT_X11AST2500_164.zip
> 1101787638344 -rw-r--r--. 1 pfmeec staff 27831340 Jan  4 15:34 SMT_X11AST2500_164.zip
> >f+++++++++ SMT_X11AST2500_164.zip
> 1101787638345 -rw-r--r--. 1 pfmeec staff 27831340 Jan  4 15:34 SMT_X11AST2500_164.zip
> [pfmeec@gung testing]$
>
> The copy does get a new inode number, though it feels suspect that the
> number is only one higher. That is probably just because I did several
> runs in a row to verify, and the copy was handed the next free inode.
>
> --
> Paul Mezzanini
> Sr Systems Administrator / Engineer, Research Computing
> Information & Technology Services
> Finance & Administration
> Rochester Institute of Technology
> o: (585) 475-3245 | pfmeec@xxxxxxx
>
> ________________________________________
> From: DHilsbos@xxxxxxxxxxxxxx <DHilsbos@xxxxxxxxxxxxxx>
> Sent: Monday, January 4, 2021 3:27 PM
> To: Paul Mezzanini; ceph-users@xxxxxxx
> Subject: RE: Re: Compression of data in existing cephfs EC pool
>
> Paul;
>
> I'm not familiar with rsync, but is it possible you're running into an
> issue where the copies end up shallow?
>
> In other words, is it possible that you're ending up with a hard link
> (two directory entries pointing to the same original inode) instead of a
> deep copy?
>
> I believe CephFS is implemented such that directories and their entries
> are omaps, while inodes are data objects. If your operating system /
> filesystem / copy mechanism isn't creating new inodes and deleting the
> old ones, the data wouldn't get compressed.
>
> Confirmation from a Ceph dev on the above implementation assumptions
> would be appreciated.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
> -----Original Message-----
> From: Paul Mezzanini [mailto:pfmeec@xxxxxxx]
> Sent: Monday, January 4, 2021 11:23 AM
> To: Burkhard Linke; ceph-users@xxxxxxx
> Subject: Re: Compression of data in existing cephfs EC pool
>
> That does make sense, and I wish it were true, but what I'm seeing
> doesn't support your hypothesis.
> I've had several drives die and be replaced since the go-live date, and
> I'm in the home stretch of reducing pg_num on that pool, so pretty much
> every PG has already been moved several times over.
>
> It's also possible that my method for checking compression is flawed.
> Spot checks of what I can see in an OSD stat dump and in "ceph df detail"
> seem to line up, so I don't believe that is the case.
>
> The only time I see the counters move is when someone puts new data in
> via Globus or via migration from a cluster job.
>
> I will test what you proposed, though, by draining an OSD, refilling it,
> and then checking the stat dump to see what lives under compression and
> what does not.
>
> -paul
>
> ________________________________________
>
> Just my two cents:
>
> Compression is an OSD-level operation, and the OSDs involved in a PG do
> not know about each other's compression settings. They probably also do
> not care, treating each OSD as a black box.
>
> I would propose draining OSDs (one by one, or host by host, by setting
> OSD weights) to move the uncompressed data off. Reset the weights to
> their former values later to move the data back; when the data is
> written again it should be compressed.
>
> Compression should also happen while the data is being written to other
> OSDs as it is moved off an OSD, but you will end up with a mix of
> compressed and uncompressed data on the same OSD, so you will have to
> process all OSDs.
>
> If this works as expected, you do not have to touch the data at the
> filesystem level at all. The operation happens solely on the underlying
> storage.
>
> Regards,
>
> Burkhard
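
On the compress_required_ratio point from the top of the thread: the
required ratio can be checked per pool and at the OSD level. A minimal
sketch, assuming the EC data pool is named "cephfs_data" (substitute your
own pool name); by default BlueStore only keeps the compressed copy when it
is roughly 87.5% of the original size or smaller, a bar that
already-compressed zip files will rarely clear.

  # Pool-level compression settings (these report an error if never set on the pool)
  ceph osd pool get cephfs_data compression_mode
  ceph osd pool get cephfs_data compression_algorithm
  ceph osd pool get cephfs_data compression_required_ratio

  # OSD-level default used when the pool does not override it
  ceph config get osd bluestore_compression_required_ratio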
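
The "OSD stat dump" spot check Paul describes can be done along these lines;
a sketch only, with osd.0 as a placeholder id, and exact counter names may
vary between releases. The BlueStore perf counters report how many bytes
were stored compressed, the space they occupy after compression, and their
original size, while "ceph df detail" adds per-pool USED COMPR / UNDER COMPR
columns.

  # Per-OSD BlueStore compression counters
  ceph tell osd.0 perf dump | grep '"bluestore_compressed'

  # Pool-level view of compressed vs. compressible data
  ceph df detail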
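
And a rough outline of the drain-and-refill approach Burkhard suggests, done
here via CRUSH weights (one of several ways to change the weight); osd.12
and the weight 1.81940 are placeholders, and the original weight should be
recorded before it is changed. Data that flows back during the refill goes
through the normal write path, so it becomes eligible for compression.

  # Record the current CRUSH weight of the OSD being cycled
  ceph osd df tree | grep 'osd\.12 '

  # Drain: push all PGs off the OSD, then wait for backfill to finish
  ceph osd crush reweight osd.12 0
  ceph -s   # repeat until no backfill/recovery remains

  # Refill: restore the original weight; data written back should now be compressed
  ceph osd crush reweight osd.12 1.81940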