> I thought journal write-back mode should use large ssd
> space, like bcache which will prevent random write at all cost.

The write journal is supposed to buffer a few stripes to avoid the write hole. Consider the case of a 2-drive write journal arrangement: you would effectively be adding a RAID1 component to your RAID5 set for recently updated data. Then why use RAID5? Also consider the size of the journal in filesystem types that have one: typically it is 32MiB-128MiB.

> but reading the document again, it said "The flush conditions
> could be free in-kernel memory cache space is low".

That's another issue with the Linux default for the buffer system: it usually buffers too much if there is no 'sync'.

> since the memory won't be too large compare to normal ssd
> disk,

I am not sure I understand why that is relevant; what happens there depends on 'sync' behaviour, on the filesystem, and on the buffer cache flushing interval, if any.

> maybe a small optane ssd is best for mdadm write-journal.

I don't quite understand the reasoning before this, but Optane is a very good choice for a persistent write buffer, as it is non-volatile and supports much faster and smaller writes than flash chips.
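
To put rough numbers on "a few stripes", here is a minimal back-of-the-envelope sketch. The drive count (4) and chunk size (512KiB) are assumptions chosen only for illustration, as are the function names. It also assumes that both data and parity chunks of a stripe end up in the journal, and it ignores journal metadata overhead, so treat the results as an upper bound rather than what md actually achieves.

```python
KIB = 1024
MIB = 1024 * KIB


def full_stripe_bytes(n_drives: int, chunk_bytes: int) -> int:
    """Bytes covering one full RAID5 stripe: one chunk per member, parity included."""
    if n_drives < 3:
        raise ValueError("RAID5 needs at least 3 member drives")
    return n_drives * chunk_bytes


def stripes_in_journal(journal_bytes: int, n_drives: int, chunk_bytes: int) -> int:
    """Whole stripes that fit in the journal, ignoring metadata overhead."""
    return journal_bytes // full_stripe_bytes(n_drives, chunk_bytes)


if __name__ == "__main__":
    # Assumed geometry, not taken from the thread: 4 drives, 512KiB chunks.
    n_drives, chunk = 4, 512 * KIB
    for journal in (32 * MIB, 128 * MIB):
        count = stripes_in_journal(journal, n_drives, chunk)
        print(f"{journal // MIB} MiB journal: about {count} full stripes")
```

With that assumed geometry a full stripe is 2MiB, so a 32MiB journal holds about 16 full stripes and a 128MiB one about 64. That is already more than "a few", which is why a large SSD is not needed just to close the write hole.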