Recently I switched my media server from ext4 on LVM to ZFS. This has mostly been a great success, however I noticed a big problem with MythTV. In the past it would take 1-2sec to start playing video, now it would take 5-60sec. Ouch!
In this post I’ll describe the trouble-shooting techniques I applied, along with all the various things I found and fixed along the way. If you just want to see the solution you can scroll to the end.
Measuring the problem
The MythTV frontend writes lots of interesting info to syslog. For example it logs “Creating TV object” when you ask it to start playing something, then it logs “Entering main playback loop” when the playback has actually started. I wrote a simple awk script, show-mythtv-delays, which measures the delay between these two events and prints it to the screen. This gives me output like:
Mar 27 17:21:19 4 Mar 27 18:05:27 59
This was important because I’m not the only one using MythTV, this way I could accurately measure the problem.
I just changed out the filesystem, so my first thought is to blame the disks and/or ZFS. With RAIDZ the data is striped across all disks in the pool, so reads that previously would have only hit one disk now are hitting all of them. I remember having issues with reading from multiple disks back in the PATA days and wondered if the same thing could still happen?
So I fired up atop and started looking at IO throughput and IO wait times. I could see the disks were performing badly, but no specific process was being called out as problematic.
I did a simple test of writing a few GiBs of data, flushing the caches and reading it back. A single disk with ext4 would give me ~100MiB/sec with this test, but ZFS couldn’t match that with three disks! I’m using a GA-X58A-UD3R motherboard, which has many different SATA ports run by different SATA controller chipsets. Originally I’d plugged two disks into the 6Gbps ports, because faster is better right? After a bit of reading online it seems the Marvell 9128 which runs those ports is terrible. Moving everything to the Intel SATA ports showed a significant improvement in both IO wait and throughput.
Unfortunately MythTV seems only a little better. It’s still taking way too long to start playing video.
ZFS is a copy on write file system. In order to make up for the inherent fragmentation this causes it has some clever caching algorithms, namely the ARC. In short ZFS needs a bit more memory in order to give equivalent performance to a more conventional file system. I had noticed this in the past when switching to ZFS on my laptop, at that time I upgraded from 4GiB to 6GiB of RAM.
So for this troubled media server I upgraded from 4GiB to 8GiB. Sure enough the ARC gobbled up the extra memory and I could see improvements to cache hit rates.
This did seem to correlate with an improvement to MythTV. It was no longer slow almost every time a video started playing, however it would still frequently take much too long to start.
FreeBSD and illumos/Solaris have a great tool called dtrace. With dtrace you can obtain detailed information about what your system is doing, how long it’s taking and what triggered those actions. You can trace all the way from functions in the kernel back through to userspace apps like MySQL. This magic can be enabled at any time without interruption to your system.
I tried using the newfangled Linux eBPF probes, which are sort of a wannabee dtrace. The bcc package has some useful example eBPF scripts. Unfortunately many of them didn’t work for me. One of those that did work was zfsslower, which indicated to me that the disk IO was being caused by MySQL. That was very helpful!
There’s an example eBPF script for diving further into MySQL, however MySQL for Ubuntu is not compiled with the USDT/dtrace hooks. I didn’t feel like recompiling and I wasn’t about to learn the C-like eBPF language to write my own, so I didn’t go any further here.
Hopefully that in the near-ish future Linux will have a tracing story that can match Solaris/dtrace from a decade ago!
MySQL slow query log
Now that I knew the problem was with MySQL I can turn up the logging. There’s an option called slow_query_log to log any queries which take longer than 10 seconds.
After enabling this option I saw that playing a new video did a very slow query on the mythconverg.recordedseek table.
A couple of weeks had passed and this problem made using MythTV quite frustrating. As a quick-fix I put an additional physical disk into the machine and created an ext4 filesystem on top of it. After moving the MySQL databases back onto ext4 I was pleased to see that videos consistently started playing within 1-2sec again. The difference was very dramatic!
However this could only be a temporary measure. I wanted to stick with ZFS for raidz redundancy and other features like snapshots. Also I wasn’t going to dedicate a whole disk to ~200MiB of MySQL databases. At least this way I could move the to ZFS for testing, then back to ext4 when I was done. That left the system in a usable state.
Rebooting with nopti gave a 50% reduction in video start time. However it was still way too slow.
I thought maybe switching to MariaDB would help. To do this I did a full dump and restore of all tables, which would guaranteed that the problematic recordedseek was freshly created. It had no effect.
Newer versions of ZFS often have performance improvements. However upgrading to ZFS 0.7.5 didn’t show any change.
The real fix!
Finally I just decided that MyISAM and ZFS don’t mix. I converted the table to use the Aria storage engine.
It now works perfectly! We’re again able to watch MythTV with fast video startup, and I also have the warm fuzzies from ZFS snapshots, checksums and raidz :)