Friday, October 09, 2009

Optimizing a php application in 5 minutes

It is often said that premature optimization is the root of all evil, and it is indeed true that the optimization stage should come after the main development of an application has been completed. Optimizing means reducing loading and execution times, improving the user experience by making the application more responsive.
When the time comes to optimize your web application, don't go blindly searching for the bottleneck. Often there are no obvious points to improve in a codebase, and your assumptions about the causes of the slowness can be wrong. Profiling is the activity of discovering what makes your application run slower than expected, by analyzing code execution and tracking where the time is spent.
If your language of choice is php, fortunately this process takes only 5 minutes.

Minute 1: xdebug installation
In a shell on the server which runs the php application, accessed directly or via ssh, type:
sudo pecl install xdebug
Obviously, if you already have xdebug on your development server you can skip this step. The pecl binary should create a minimal configuration for you, but if you do not see a line referencing xdebug.so in your php.ini, add it yourself:
zend_extension=/path/to/xdebug.so
The xdebug extension provides many utilities to php developers. One of them is the profiling of code execution.
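Before moving on, it can be worth checking that php actually picked up the extension. The following is a minimal sketch, not part of the original setup: the helper name xdebugStatus() is hypothetical, while extension_loaded(), phpversion() and php_ini_loaded_file() are standard php functions. Keep in mind that the command line and the web server may read different php.ini files, so run it through whichever one serves your application.

```php
<?php
// Hypothetical helper: reports whether xdebug is loaded and which php.ini was read.
function xdebugStatus(): array
{
    $iniFile = php_ini_loaded_file(); // false when no php.ini was loaded
    return [
        'loaded'   => extension_loaded('xdebug'),
        'version'  => phpversion('xdebug') ?: null, // null when xdebug is absent
        'ini_file' => $iniFile === false ? null : $iniFile,
    ];
}

$status = xdebugStatus();
echo $status['loaded']
    ? "xdebug {$status['version']} is loaded\n"
    : "xdebug is NOT loaded\n";
echo 'php.ini in use: ', $status['ini_file'] ?? '(none)', "\n";
```

If the script reports that xdebug is not loaded, the zend_extension line is probably missing from the php.ini shown in the second line of output.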

Minute 2: xdebug profiling configuration
Under the loading directive of xdebug in php.ini, add the following lines:
xdebug.profiler_enable = 1
xdebug.profiler_output_dir = /tmp
These directives tell xdebug to enable the profiler from the start of every php script and to write cachegrind files to /tmp after the script has finished running. A cachegrind file lists all the function calls made during the script execution, along with their source location and the time elapsed in each. Make sure there is enough disk space in the folder you choose to keep them in, since their size can quickly grow to hundreds of megabytes, and remember to deactivate the profiling directives once you have finished your optimization work.
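To keep an eye on how much space the profiles take, a small script can list them by size. This is only a sketch under two assumptions: the files follow xdebug's default naming (cachegrind.out.*), and the directory is read from the xdebug.profiler_output_dir directive used above, falling back to /tmp when that directive is not set. The function name listCachegrindFiles() is made up for this example.

```php
<?php
// Hypothetical helper: returns [path => size in bytes] for profiler output files,
// largest first. Assumes the default "cachegrind.out.*" file naming.
function listCachegrindFiles(string $dir): array
{
    $files = [];
    foreach (glob(rtrim($dir, '/') . '/cachegrind.out.*') ?: [] as $path) {
        if (is_file($path)) {
            $files[$path] = filesize($path);
        }
    }
    arsort($files); // sort by size, descending, keeping the paths as keys
    return $files;
}

// ini_get() returns false when the directive is unknown, hence the fallback.
$dir = ini_get('xdebug.profiler_output_dir') ?: '/tmp';
foreach (listCachegrindFiles($dir) as $path => $bytes) {
    printf("%8.1f MB  %s\n", $bytes / 1048576, $path);
}
```

Run it after loading a few pages with the profiler on, and delete the files you no longer need.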

Minute 3: load a page of your choice
I hope this does not take an entire minute, otherwise a long optimization phase will be mandatory.

Minute 4: installing webgrind
At http://code.google.com/p/webgrind/ you can download webgrind, a web application created for interpreting cachegrind files, which are not human readable. There are other solutions for reading cachegrind files, but I prefer a portable web application: wherever there is php, webgrind can be installed.
Simply decompress the package into a folder on your webserver and load its path in the browser. Webgrind is written in php5 and has no dependencies to configure.

Minute 5: loading a cachegrind file and observing the result
Select from the webgrind menu Show 90% of [select a cachegrind file] in percent|milliseconds, and hit update. After the file has been uploaded and analyzed, a list of functions and methods similar to the following will be shown:


Every function is marked with a color that distinguishes userland functions (green), php functions (red) and constructs (gray), such as require_once(). You can probably only optimize the green functions directly, but other tools such as the apc cache can speed up the constructs as well.
The numbers crunched by webgrind include the Invocation Count (the number of times the function has been called), the Total Self Cost and the Total Inclusive Cost. The latter is the time elapsed between each call to the function and the moment it returned; the former is the effective time spent in the function body during the execution, excluding calls to other functions. You should find obvious optimization spots by looking at the Total Self Cost, and in fact webgrind's default ordering uses this metric.
I added a long, no-op for loop to the constructor of the NakedService class to simulate a bottleneck like a slow query or an access to a webservice. See for yourself what happens when I profile again:
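The difference between the two costs can be illustrated by hand. The sketch below is not how xdebug measures anything; it simply times two made-up functions, burn() and outer(), with microtime() and derives the self cost of outer() by subtracting the inclusive cost of its callee. All names here are hypothetical.

```php
<?php
// Busy loop standing in for real work done inside a function body.
function burn(int $iterations): void
{
    for ($i = 0; $i < $iterations; $i++) { /* no-op */ }
}

// Self cost = inclusive cost minus the inclusive cost of the direct callees.
function selfCost(float $inclusive, array $calleeCosts): float
{
    return $inclusive - array_sum($calleeCosts);
}

// Does a little work of its own, then delegates most of it to burn().
// Returns the inclusive cost of the nested burn() call, in seconds.
function outer(): float
{
    for ($i = 0; $i < 200000; $i++) { /* outer's own work */ }
    $start = microtime(true);
    burn(2000000);
    return microtime(true) - $start;
}

$start = microtime(true);
$calleeCost = outer();
$outerInclusive = microtime(true) - $start;

printf("outer() inclusive: %.2f ms\n", $outerInclusive * 1000);
printf("outer() self:      %.2f ms\n", selfCost($outerInclusive, [$calleeCost]) * 1000);
```

In a profile of this script, outer() would have a high inclusive cost but a low self cost, which is why sorting by self cost points at burn() as the place to look.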

Look for places in your code where speed can be improved not with static analysis, but by enabling the profiler during real execution.
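If you want to reproduce the experiment with the artificial bottleneck, something along these lines should do. This is a stand-in, not the actual NakedService code: the class name SlowService and the iteration count are assumptions chosen for illustration.

```php
<?php
// Hypothetical stand-in for a class whose constructor hides a bottleneck.
class SlowService
{
    /** @var int number of no-op iterations executed by the constructor */
    public $iterations;

    public function __construct(int $iterations = 5000000)
    {
        // Long no-op loop simulating a slow query or a call to a webservice.
        for ($i = 0; $i < $iterations; $i++) { /* no-op */ }
        $this->iterations = $iterations;
    }
}

$start = microtime(true);
$service = new SlowService();
printf("constructor took %.1f ms\n", (microtime(true) - $start) * 1000);
```

Load a page that instantiates the class with the profiler enabled, then open the new cachegrind file in webgrind and look at the cost reported for the constructor.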
I hope the 5 minutes have been well spent. Profiling applications is often necessary and it should not be difficult once you have this structure in place.

If you want a large sample of requests to profile only the frontend, you should use Apache Bench instead.

14 comments:

Anonymous said...

The title should say, get started with optimising in PHP in 5 minutes

Giorgio said...

Well, the emphasis is put on using the right tool instead of the micro optimizations such as "use single quotes and not double ones" you often see in the blogosphere. That said, for the vast majority of developers (who do not work at Facebook) optimization is not a deep topic: just find the bottleneck and cache everything you see there.

Rob Hofmeyr said...

I agree with Anonymous, "Optimizing a php app..." was a little bit of a deceiving topic. Maybe if the article was about enabling APC opcode caching it would be better suited.

Anyway, I found one glaring optimization that could be made to your code... GET RID OF THE ZEND FRAMEWORK! Such a bloat. Run pecl/inclued on one of those pages and you'll see what I mean. The number of classes that need to be loaded just for a 'Hello World' app is frightening. Oh, and they are all called by require_once. Dodgy.

I stumbled upon a great framework the other day. Web2BB (http://chopwoodfetchwater.com/). Requires PHP 5.3, but I made it PHP 5.x compatible and removed the (already minimal) bloat. Now, just left with a simple MVC routing framework. SUPER lightweight and fast. Try porting your application to that, run a cachegrind and prepare to be amazed by the performance difference...

Pete Warden said...

Have you tried XHProf? It's a Facebook project that provides a lightweight profiler with an integrated web front end.

http://developers.facebook.com/xhprof/

It's definitely not as fully-featured as XDebug, but I've found I use it a lot more often as it's quicker to run and see results than a *grind based solution.

Pat said...

I ran into problems with "parsers is undefined" on a first load. It is from line 482 of the jquery.t...sorter.js file

Giorgio said...

Rob, the point of using a framework is not to produce a hello world app, but to produce a big app where the burden of a framework will scale much better than using native code.
Pete, thanks for the link. Having alternatives is good.
Sven, this is an error from the sorting feature that is applied to the table. It's strange that it happens on a fresh installation. Some strange browser? If it's a bug you should report it to the webgrind tracker: http://code.google.com/p/webgrind/issues/list

Benjamin A. Shelton said...

Very nice article, Giorgio. This is helpful information for PHP developers!

I wanted to comment in reply to other posters: Your title is fine. In fact, it's perfect. Changing it as anonymous suggested would be far too wordy; frankly, if that's what the title were, I would've skipped over your post entirely!

For those who disagree, I appeal to Jakob Nielsen's article: http://www.useit.com/alertbox/nanocontent.html

It details why short, succinct titles are better! (It did get your attention, did it not?) :)

Again, thanks for the information Giorgio!

Giorgio said...

Thank you for your feedback Benjamin. I'm glad php developers appreciate the work I'm doing here in this blog.

Rob Hofmeyr said...

@Giorgio

I never said it was. This is a post about optimization. I was pointing out how much overhead you add by using the Zend Framework.

How would using a bloated framework help you scale in a big application? The Zend Framework is full of useless wrapper and abstraction classes (Zend_Session, Zend_Db spring to mind), which add extra overhead and require_once's to your codebase. If I were optimizing a large app, I'd use as much of PHP's native functionality as possible and look at implementing other CPU and resource intensive operations in C/C++. The Zend Lucene module for example, is implemented entirely in PHP and as a result scales horribly and performs terribly when compared with Lucene or clucene (for which there's a pecl extension).

The point is this: if you're looking at building performant apps that will scale, you shouldn't be implementing someone else's framework solution that has wrapper classes for native PHP functions and requires ten classes (require_once) just to bootstrap an application. PHP should look like PHP, and SQL should look like SQL. Don't use an ORM. You need to know what your SQL is doing. You need to be able to stick an EXPLAIN in front of a statement or move from in-line SQL to stored procedures. Avoid using expensive functions and classes written in PHP (Zend PDF, Zend Lucene). Look for pecl extensions or try and roll your own - they aren't that difficult once you get the hang of them.

Rob Hofmeyr said...

Sorry, this isn't meant to come across as an attack. Just want to point out the performance issues inherent in the Zend Framework. Otherwise, good article. Thanks.

Unknown said...

Rob,

It's always funny to read posts from performance-geeks. Go ahead and optimize away all the "bloat" and then enjoy the awesome performance that ... you didn't even need to begin with. Great job!

Why should someone who uses ZF or whatever and who has a perfectly fine running application strip out "bloat"? Waste of time. If it works and is fast enough there is nothing to do (unless you're a performance-fanatic and enjoy wasting your time of course).

About your comment about ORM. Apparently you think ORM is an additional thing you can easily live without under all circumstances. Wrong. If you have a domain model and a relational database, you need some ORM, the question is just whether to handcode it or use something that's there. If you don't use a domain model you don't need an ORM like the ones used in mainstream. Now if you want to argue that in order to improve performance one should not use a domain model, go ahead, but that's a different discussion.

And apparently there is another misunderstanding. The goal of most ORMs is not to "hide SQL" (maybe some do but not the ones I know). Without good SQL knowledge you should really not use an ORM and SQL logging is probably the most important thing to do, whether you use an ORM or not. Their goal/task is easy relational persistence for your domain model. Point.

I know this is old but apparently it still did not reach that many php developers: It is possible to deploy an application to multiple servers, I swear! It works and costs per year as much as I (or any other sane developer) cost per day! Not a bad deal!

Most people waste too much time on worrying about performance. The world does not only consist of applications with the scale of Facebook and Wikipedia.

The approach of Giorgio is absolutely fine. Use whatever tools help you develop efficiently and solve performance problems only WHEN and IF they appear, otherwise there's no need to. Constraining yourself to inferior tools, slowing down your development, out of fear that there "might" be performance problems is just, sorry, stupid.

Rob Hofmeyr said...

@romanb

First off, was the whole focus of this article not on performance and optimization? If you have a problem with PHP optimization advice, take it up with the poster. Not me.

Don't optimize if you don't have to - sure, I agree. That also doesn't mean you should build an application ignoring all and any performance best practices. Of course you don't have to build your application to handle Wikipedia/Facebook loads. I never said you did. But at the same time, scaling "only when I need to" doesn't work. Not without dire consequence. Keeping in mind a few best practices during development - especially when you're expecting high loads in the future - makes all the difference when that time comes.

Throwing more hardware at a problem doesn't always solve it, and certainly isn't always cheaper. I'm not sure what you mean by this. How much do you charge per day, or where are you hosting?! Don't forget that once an optimization has been made it's been made. It's not an annual expense. What about the cost and technical expertise associated with managing your server farm/data centre?

Fair enough on the ORM statement. Although I still wouldn't use the likes of Zend DB/Doctrine etc. as ORMs. These do abstract away from SQL, which makes optimizing DB calls far more difficult later on.

"Without good SQL knowledge you should really not use an ORM and SQL". The fact is, that with framework development being "in vogue" at the moment, many people do - and many people don't understand SQL as well as they should.

"Constraining yourself to inferior tools, slowing down your development, out of fear that there "might" be performance problems is just, sorry, stupid."

In what way have I slowed down my development? Instead of having to learn a new framework, its semantics and all of its wrappers for native PHP functionality ($this->escape instead of htmlentities is one of my favorites) I'm using native PHP functionality. Which as a PHP developer is something I know.

And how is a "lower level" form of the language inferior? Are you suggesting the higher the level of abstraction the better?

Robdog out.

Giorgio said...

I wrapped up some thoughts on this discussion in a new post, since they would be out of scope here, where the focus was originally on a profiling technique for php.
http://giorgiosironi.blogspot.com/2009/10/value-of-abstractions.html

Custom Facebook Applications said...

Really impressive site to me, i am impressed a lot by this awesome and also interesting site.
