enabled --allow-discards on btrfs on top of dm-crypt, now praying

Leho Kraav <leho@xxxxxxxxx> · Fri, 16 Dec 2011 02:12:48 +0200

Hi all

Thought I'd make a record of my setup recently starting to exhibit what
seems to me like TRIM/gc related SSD thrashing.

Setup is:
 * Acer Travelmate 8172
 * Crucial RealSSD C300 64G
 * Gentoo, current kernel 3.1.1-pf
 * 6 GPT partitions
  - sda1-2 LUKS encrypted, ROOT btrfs multiple devices,
    noatime,nodiratime,compress=lzo,ssd
  - sda3-4 LUKS encrypted, HOME btrfs multiple devices,
    noatime,nodiratime,compress=lzo,ssd
  - sda5, no encryption, PUB btrfs
  - sda6, LUKS encrypted, SWAP

Multiple partitions were made at the time when I was under the 
impression it would help dm-crypt with multicore. Milan has since
replied on the list that making multiple partitions to achieve good
multicore performance hasn't been necessary for a number of recent 
kernel versions. It is too much effort to redo it right now, so I'm 
leaving that for a more suitable time.

I believe I started with 2.6.38 something like 7 months ago. I did not
create a swap partition at the time; it is a recent addition, made by
shrinking sda5 by about 5GB. Because of a battery.ko LED trigger related
kernel BUG on resume that is still unfixed (and bko is still down), btrfs
recently got into a bad state where writing data to certain areas of
HOME caused kernel BUGs.

So I rsync'd /home and recreated that btrfs. Other than some other 
rare'ish btrfs BUGs with pre-3.1 kernels, which I've overcome by 
mounting with an even older kernel version i.e. off systemrescuecd, this 
setup has worked well.

But in the last few days, with uptime running into a few weeks and the
machine going through some pretty heavy desktop workloads, I noticed
that the UI would occasionally grind to a halt while the HDD (well, SSD
now) light flashed frantically for a while. Mouse, keyboard, nothing
responds, sound plays until the buffer runs out, and the network also
eventually disconnects. When the SSD light stops, the machine returns to
its normal, responsive state.

It progressively got worse though, to where the stalls would last up to
a minute or so. The frantic drive light flashing seemed to indicate that
I have probably reached some kind of garbage collection limit, but OTOH
I could be totally wrong. Which is also why I'm posting, perhaps a smart
person knows better. Could it be that having the SSD filled with these
encrypted partitions is bothering the gc mechanism?

So I now went ahead, upgraded cryptsetup to 1.4.1 and added
--allow-discards to the cryptsetup options my LUKS devices are opened
with. During the 5+ hours of uptime since the reboot no
stalling+thrashing has happened yet, but the workload also hasn't
reached the point of using swap yet, according to free.
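
For anyone wanting to check that discards actually make it through the
dm-crypt layer, here is a rough Python sketch. It reads the
queue/discard_max_bytes attribute the kernel exposes under /sys/block;
as far as I know, a value of 0 on a dm-N node means the mapping does not
pass discards down. The device names (dm-0 and so on) and the helper
names are made up for illustration, and the sysfs root is a parameter
only so the function can be pointed at a test directory.

```python
from pathlib import Path


def discard_max_bytes(dev, sysfs_root="/sys/block"):
    """Return discard_max_bytes for a block device, or None if unreadable.

    `dev` is a kernel block device name such as "sda" or "dm-0"
    (hypothetical examples; check your own names with lsblk).
    """
    attr = Path(sysfs_root) / dev / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text().strip())
    except (OSError, ValueError):
        # Attribute missing, unreadable, or not a number.
        return None


def supports_discard(dev, sysfs_root="/sys/block"):
    """True if the device advertises a non-zero discard limit."""
    value = discard_max_bytes(dev, sysfs_root)
    return value is not None and value > 0
```

Calling supports_discard("dm-0") for each opened LUKS mapping, and for
the underlying sda, should show whether the option took effect. This
only tells you the block layer accepts discards, not that the
filesystem on top is issuing any.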

swapon also seems to support discard, so I enabled that, too:

       -d, --discard
              Discard freed swap pages before they are reused, if the
              swap device supports the discard or trim operation.  This
              may improve performance on some Solid State Devices, but
              often it does not.  The /etc/fstab mount option discard
              may be also used to enable discard flag.
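
Since the man page mentions the fstab route, a small Python sketch for
spotting swap entries that do not carry the discard option might look
like the following. It assumes the usual whitespace-separated fstab
layout (device, mount point, type, options, ...); the function names and
the sample device paths are invented for the example. It also accepts a
discard=<policy> form, which newer util-linux versions document.

```python
def swap_entries(fstab_text):
    """Yield (device, [options]) for every swap line in fstab-style text."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "swap":
            yield fields[0], fields[3].split(",")


def swaps_without_discard(fstab_text):
    """Return the swap devices whose options lack discard."""
    return [
        dev
        for dev, opts in swap_entries(fstab_text)
        if not any(o == "discard" or o.startswith("discard=") for o in opts)
    ]


# Hypothetical fstab contents, for illustration only.
SAMPLE = """\
# /etc/fstab
/dev/mapper/swap  none   swap   sw,discard   0 0
/dev/sdb2         none   swap   sw           0 0
/dev/mapper/home  /home  btrfs  noatime,ssd  0 0
"""

print(swaps_without_discard(SAMPLE))
```

On a real system you would pass in the contents of /etc/fstab instead of
SAMPLE.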

As the subject says, some excited monitoring and praying is going on in
the meantime. Let's see how this whole thing holds up. Hope you've
enjoyed the story.

--
Leho Kraav, M.Sc.

http://leho.kraav.com
_______________________________________________
dm-crypt mailing list
dm-crypt@xxxxxxxx
http://www.saout.de/mailman/listinfo/dm-crypt