Run TRIM or freeze
I manage a home server that is used by multiple people, mostly as a file server and a host for personal projects. It runs on an Ivy Bridge-era Xeon and a Supermicro X9SRi-F motherboard. It’s plenty powerful for our needs, but ever since I upgraded to this setup from the previous Chinese “X79” motherboard, the server had been freezing randomly. At first it was maybe once a month, which a watchdog took care of, but lately it had escalated to multiple freezes a day. The motherboard had a faulty BIOS flash chip, but that turned out to be a separate problem (an easily remedied one, thankfully). I ran CPU and memory stress tests, but they didn’t induce the freezes with any regularity.
The server was becoming unusable, so I got a new motherboard and CPU, found a great deal on 4x16GB DDR4 RDIMMs, and replaced most of the server. Only the disks, PSU and case remained, so I thought the freezing was surely gone.
A day after the transplant, the new system froze, partially. Some services kept running, while others died or hung on connection. Thankfully, this time it managed to save some logs. Looking through them, I saw many reports of processes not stopping after being killed and a failure to unmount /boot/efi. This finally gave me a hint at the real problem.
This is speculation, but I believe the freezes were caused by hung IO to the root SSD. I hadn’t noticed that whichever Debian version I first installed didn’t enable fstrim.timer, so the root SSD was never TRIMmed. I think the SSD ran out of free blocks and possibly could no longer overwrite a used one; instead of returning an error, it dropped the request. Now that I’ve enabled periodic TRIM, I haven’t seen any freezes in more than a week.
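For anyone who wants to check their own machine, here is a minimal sketch of how the same check could be scripted. It assumes a systemd-based system where the timer unit is called fstrim.timer, as on Debian; the helper names query_timer and needs_attention are made up for illustration.

```python
import subprocess


def query_timer(unit="fstrim.timer", run=subprocess.run):
    """Return the state word printed by `systemctl is-enabled <unit>`.

    `run` is injectable so the function can be exercised without systemd.
    """
    try:
        # check=False: systemctl exits non-zero for "disabled",
        # which is an expected answer here, not a failure.
        result = run(
            ["systemctl", "is-enabled", unit],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # systemctl is missing or not executable (e.g. a non-systemd system).
        return "unknown"
    return result.stdout.strip() or "unknown"


def needs_attention(state):
    """True unless systemd reports the timer as enabled."""
    return state != "enabled"


if __name__ == "__main__":
    state = query_timer()
    print(f"fstrim.timer: {state}")
    if needs_attention(state):
        print("Periodic TRIM does not appear to be enabled.")
```

If the timer turns out not to be enabled, running systemctl enable --now fstrim.timer as root switches on periodic TRIM, and fstrim -av performs a one-off TRIM of all mounted filesystems that support it.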
The short moral of the story: TRIM your SSDs; they break in strange ways without it.