Hi,

On 05/07/15 12:30, Burkhard Linke wrote:
> [...]
> Part of the OSD boot up process is also the handling of existing
> snapshots and journal replay. I've also had several btrfs based OSDs
> that took up to 20-30 minutes to start, especially after a crash.
> During journal replay the OSD daemon creates a number of new snapshot
> for its operations (newly created snap_XYZ directories that vanish
> after a short time). This snapshotting probably also adds overhead to
> the OSD startup time.
> I have disabled snapshots in my setup now, since the stock ubuntu
> trusty kernel had some stability problems with btrfs.
>
> I also had to establish cron jobs for rebalancing the btrfs
> partitions. It compacts the extents and may reduce the total amount of
> space taken.

I'm not sure what you mean by "compacting" extents. I'm sure balance
doesn't defragment or compress files. It moves extents and, according
to the Btrfs wiki, before 3.14 it was used to reclaim allocated but
unused space. This shouldn't affect performance, and with modern
kernels it may no longer be needed to reclaim unused space.

> Unfortunately this procedure is not a default in most distribution (it
> definitely should be!). The problems associated with unbalanced
> extents should have been solved in kernel 3.18, but I didn't had the
> time to check it yet.

I don't have any btrfs filesystem running on 3.17 or an earlier version
anymore (with a notable exception, see below), so I can't comment. I
have old btrfs filesystems that were created on 3.14 and are now on
3.18.x or 3.19.x (by the way, avoid 3.18.9 to 3.19.4 if you can have
any sort of power failure: there's a possibility of a mount deadlock
which requires btrfs-zero-log to solve...). btrfs fi usage doesn't show
anything suspicious on these old filesystems.

I have a Jolla Phone which comes with a btrfs filesystem and uses an
old, heavily patched 3.4 kernel.
It hasn't had any problem yet, but I don't stuff it with data (I've
seen discussions about triggering a balance before a SailfishOS
upgrade). I assume that you shouldn't have any problem with filesystems
that aren't heavily used, which should be the case with Ceph OSDs (for
example, our current alert level is at 75% space usage).

>
> As a side note: I had several OSD with dangling snapshots (more than
> the two usually handled by the OSD). They are probably due to crashed
> OSD daemons. You have to remove the manually, otherwise they start to
> consume disk space.

Thanks a lot, I didn't think it could happen. I'll configure an alert
for this case.

Best regards,

Lionel
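
PS: For the dangling snapshot alert, a check along the following lines
could be a starting point. It is only a sketch under assumptions: it
takes the path of an OSD data directory (for example something like
/var/lib/ceph/osd/ceph-0, which depends on your setup), it treats every
snap_* directory found directly in it as an OSD snapshot, and it uses
the "two usually handled by the OSD" figure quoted above as the
threshold. The names find_snapshots and check_osd are made up for
illustration.

```python
import os

# Assumption: a healthy OSD keeps at most two snap_* snapshots,
# as stated in the quoted message. Adjust if your setup differs.
EXPECTED_MAX_SNAPSHOTS = 2


def find_snapshots(osd_dir):
    """Return the sorted names of snap_* directories directly under osd_dir."""
    return sorted(
        name
        for name in os.listdir(osd_dir)
        if name.startswith("snap_")
        and os.path.isdir(os.path.join(osd_dir, name))
    )


def check_osd(osd_dir, max_snapshots=EXPECTED_MAX_SNAPSHOTS):
    """Return (ok, snapshots); ok is False when more than max_snapshots exist."""
    snapshots = find_snapshots(osd_dir)
    return len(snapshots) <= max_snapshots, snapshots
```

Since the OSD creates short-lived snap_XYZ directories during journal
replay, a single failed check right after an OSD start is probably not
meaningful; alerting only when the count stays above the threshold for
several consecutive runs would avoid false positives.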