I've got an interesting problem with Linux that's proven tricky to track down. Basically, it's a web site running on Apache on half-a-dozen load-balanced servers. All of them talk to a MySQL database on another server. All of them are connected together via a private gigabit ethernet network, as well as their public interfaces.
Now, for various reasons, each page opens a new connection to the database when it starts executing. (Pooled connections are a political problem, so we don't use them.) There are a few predictable times when this fails, so I've been digging into the TCP/IP settings and the thread and open-file settings in /proc.
First, TCP. The default maximum TCP buffer sizes (/proc/sys/net/core/rmem_max and wmem_max) are 128 KB. One of the first recommendations in a TCP tuning guide I found was to raise that to 16 MB (!). Not convinced such a massive increase was warranted, I raised ours to just 1 MB, but I don't know how to tell whether there's been any benefit. In fact, nothing obvious has improved. Am I on the right track?
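For reference, this is roughly how I've been checking and changing those values. The read-only `cat` lines are safe anywhere; the `sysctl -w` lines (commented out here) need root and don't persist across reboots, so the same settings also go in /etc/sysctl.conf. The `ss` line is a guess at how one might see whether connections ever use the extra buffer space:

```shell
# Current system-wide TCP buffer maximums (read-only)
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/core/wmem_max

# What I changed (needs root; add to /etc/sysctl.conf to persist):
# sysctl -w net.core.rmem_max=1048576
# sysctl -w net.core.wmem_max=1048576

# Per-socket memory counters for established connections to MySQL
# (port 3306 assumed) -- ss -m prints skmem usage per socket:
# ss -tm state established '( dport = :3306 )'
```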
Second, threads. We've got MySQL setting its own open-files limit to 10240 (I was a little surprised this can be done) and its maximum connection count to 1500. Yet it always peaks at something like 995 threads. The maximum thread count in /proc is over 14000 and the maximum open files is almost 400000. Upping the open-files limit is easy, but unfortunately requires a restart of the database service; although it's just a shutdown and a restart, it takes long enough to be noticed. Is increasing the open-files limit the first thing I should try?
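In case it helps, here's how I've been comparing the kernel ceilings against what MySQL actually has. The first two lines are read-only and safe; the rest (commented out) assume a process named mysqld and a working mysql client login, and need no database restart just to check:

```shell
# System-wide ceilings enforced by the kernel (read-only)
cat /proc/sys/fs/file-max         # max open files, system-wide
cat /proc/sys/kernel/threads-max  # max threads, system-wide

# The open-files limit the running mysqld process actually received
# (process name "mysqld" assumed):
# grep 'open files' /proc/$(pidof mysqld)/limits

# MySQL's own view of its limits vs. actual peak usage
# (adjust credentials as needed):
# mysql -e "SHOW VARIABLES LIKE 'open_files_limit'"
# mysql -e "SHOW VARIABLES LIKE 'max_connections'"
# mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'"
# mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'"
```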
Thanks!
Wade.