Mike Willbanks recently wrote a good article about performance tuning. There’s some solid advice in there, and I thought it’d be worth quickly adding a bit more detail about each of the approaches that Mike raises.
Mike recommended using APC for bytecode caching. APC’s pretty good, but be aware that it isn’t compatible with Sara’s excellent runkit extension. XCache is, but some versions of Zend Optimizer refuse to run if they detect that XCache has been loaded. (By the way, Zend Optimizer is worth looking at, but because of the way Zend compiles it, it can affect overall scalability. I haven’t sat down yet and worked out whether Zend Optimizer’s performance improvements make up for the cost of how the Linux kernel has to load it into Apache. I touched on the issues with how things are compiled last year, but haven’t followed it up yet with any definitive figures on scalability.)
If you don’t need Zend Platform’s download server (which rocks), then XCache + Zend Optimizer + Memcache out-performs Zend Platform substantially, and costs a lot less too ;) Zend Platform also isn’t compatible with runkit. It’d be great to see runkit supported better by accelerators.
Memcache is best suited to storing smaller pieces of data. If you’re using it to cache whole XHTML pages, they sometimes don’t fit into Memcache, and need to be cached on disk instead. (Always cache onto local disk, never NFS.) Memcache divides the memory allocated to it into different-sized buckets for performance reasons, and there are far more small buckets than there are large buckets. You can edit the Memcache source code and change the size of the largest bucket before recompiling.
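To make the "small items in Memcache, big pages on local disk" idea concrete, here is a minimal sketch in Python. It is not tied to any real Memcache client: a plain dict stands in for the Memcache connection, and the `TieredCache` class, its method names, and the 1 MB threshold are all assumptions for illustration. Check the largest item size your own Memcache build accepts.

```python
import hashlib
import os

# Assumed upper bound on what the in-memory cache will accept.
# Check your own Memcache build; this figure is only an illustration.
MAX_ITEM_BYTES = 1024 * 1024


class TieredCache:
    """Small values go to memory, oversized values go to local disk."""

    def __init__(self, disk_dir, max_item_bytes=MAX_ITEM_BYTES):
        self.memory = {}  # stand-in for a real Memcache client
        self.disk_dir = disk_dir  # must be a local directory, never NFS
        self.max_item_bytes = max_item_bytes

    def _path(self, key):
        # Hash the key so any string is safe to use as a file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.disk_dir, digest)

    def set(self, key, value):
        """Store bytes under key; return which tier was used."""
        if len(value) <= self.max_item_bytes:
            self.memory[key] = value
            try:
                os.remove(self._path(key))  # drop any stale disk copy
            except FileNotFoundError:
                pass
            return "memory"
        self.memory.pop(key, None)  # drop any stale in-memory copy
        with open(self._path(key), "wb") as handle:
            handle.write(value)
        return "disk"

    def get(self, key):
        """Return the cached bytes, or None on a miss."""
        if key in self.memory:
            return self.memory[key]
        try:
            with open(self._path(key), "rb") as handle:
                return handle.read()
        except FileNotFoundError:
            return None
```

The point of the design is that the calling code never needs to know which tier a page ended up in; it just calls `set()` and `get()`.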
The GZIP trick Mike mentions just isn’t safe with IE6. There are copies of IE out there that fail to decompress the content correctly, alas :( I remember reading a stat that about 1% of copies of IE had this bug, but I don’t have the link to hand. I have seen copies of IE with this bug myself. There’s nothing more frustrating than looking at two copies of IE, both reporting exactly the same version number, where one copes with gzipped data whilst the other one doesn’t :( It’s possible that the widespread adoption of IE7 has “fixed” a lot of these buggy IE copies.
I’d recommend placing more emphasis on the 304 Not Modified response, and also on making sure that your code is architected to send back Not Modified responses as quickly as possible. It not only improves per-page performance, but reduces per-page memory usage, and substantially improves scalability. Getting this right can make a huge difference, especially for sites where users normally view more than one page per visit. And make sure the metadata you use to work out whether or not you can send back Not Modified is fine-grained enough :)
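As a rough illustration of the check that should happen as early as possible in a request, here is a minimal Python sketch. The function name `not_modified` and the plain dict of request headers are assumptions for the example, and it only does a simple exact comparison of ETags (no weak-validator handling). The idea is that this runs before any expensive page-building work.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified(request_headers, last_modified, etag=None):
    """Return True if the client's cached copy is still good (send a 304).

    request_headers: dict of incoming HTTP headers (hypothetical shape).
    last_modified:   timezone-aware datetime of the content's last change.
    etag:            optional quoted ETag string for the current content.
    """
    # If-None-Match takes precedence over If-Modified-Since when both exist.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None and etag is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in candidates or etag in candidates

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            return False  # unparseable date: play safe, send the full page
        if since_dt.tzinfo is None:
            since_dt = since_dt.replace(tzinfo=timezone.utc)
        # HTTP dates only have one-second resolution.
        return last_modified.replace(microsecond=0) <= since_dt

    return False
```

If this returns True, the application can send the 304 status and stop, without loading templates or running the main queries. How fine-grained `last_modified` and `etag` are is what decides how often that shortcut is safe.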
Also, while we’re on the subject of Not Modified … don’t take it for granted that Apache is getting this right for your static files. I can’t remember off the top of my head which Apache module disables this (I think it was mod_include, but I could be wrong), but check the HTTP traffic to make sure your site isn’t sending static files when it doesn’t need to.
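One way to check the traffic without a packet sniffer is a small script that fetches a file, then asks for it again with the validators the server handed out. This is a minimal sketch using only Python's standard library; the function name `is_revalidated` is made up for the example, and the URL you pass in is whatever static file you want to test.

```python
import urllib.error
import urllib.request


def is_revalidated(url, timeout=10):
    """Return True if the server answers a conditional GET with 304."""
    # First request: plain GET, to collect the validators.
    with urllib.request.urlopen(url, timeout=timeout) as first:
        last_modified = first.headers.get("Last-Modified")
        etag = first.headers.get("ETag")

    if not last_modified and not etag:
        # No validators at all, so a browser can never revalidate this file.
        return False

    # Second request: replay the validators, as a browser cache would.
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        request.add_header("If-None-Match", etag)

    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False  # got a full 200 response again
    except urllib.error.HTTPError as error:
        # urllib reports non-2xx statuses, including 304, as HTTPError.
        return error.code == 304
```

A result of False for an image or stylesheet that hasn't changed suggests the server is resending the whole file on every request.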
With SQL queries of the form “SELECT … FROM table WHERE primaryKey IN ( … )”, be aware that the maximum size of the IN list varies from database server to database server, and it doesn’t take all that big a list before you run into portability problems.
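A common way around the IN list limit is to split the list of keys into batches and run one query per batch. The sketch below uses Python's built-in sqlite3 module purely so that it is self-contained; the `items` table, its columns, and the batch size of 500 are assumptions for illustration, not a limit taken from any particular database server.

```python
import sqlite3

# Assumed conservative batch size; check the real limit of every
# database server you need to support.
MAX_IN_LIST = 500


def fetch_by_ids(conn, ids, max_in=MAX_IN_LIST):
    """Fetch rows from the hypothetical `items` table in IN-list batches."""
    ids = list(ids)
    rows = []
    for start in range(0, len(ids), max_in):
        chunk = ids[start:start + max_in]
        # One bound parameter per id keeps the query safe from injection.
        placeholders = ", ".join("?" for _ in chunk)
        sql = "SELECT id, name FROM items WHERE id IN (%s)" % placeholders
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows
```

The trade-off is a few more round trips to the database in exchange for queries that stay within every server's limit.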
One important thing Mike didn’t touch on was moving static files onto a separate box. Apache + mod_php doesn’t serve static files very efficiently. With static files on a separate box, you can recompile Apache to use the “worker” MPM, which serves static files substantially better, or you can use an alternative web server such as lighttpd.
There are plenty of other things you can do to optimise PHP on servers, such as tuning Apache to prevent swapping, tuning the Linux TCP/IP stack to reduce connection failures at peak times, and moving your database off onto a separate box. I’m going to go into these in a lot more detail at a later date.
Finally, xdebug is a fantastic tool for profiling your code and telling you where you have inefficient loops and whatnot. It takes the guesswork out of finding bottlenecks!