By Stof in Node.js — Jun 4, 2016

Misconception on CPU: Node.js vs PHP blocking web requests

We know that Node.js runs our code asynchronously, and that it does so using an event loop. When Node.js handles web requests, its cluster functionality lets it spread them across every available CPU, which can give it strong performance compared with rival server-side languages.
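
To make the cluster idea concrete, here is a minimal sketch using the built-in `cluster` module. The `startCluster` name, the port, and the response text are illustrative assumptions, and `cluster.isPrimary` assumes Node.js 16 or newer (older releases call it `cluster.isMaster`).

```javascript
const cluster = require('cluster')
const http = require('http')
const os = require('os')

// Hypothetical helper: forks one worker per logical CPU.
// Returns the number of workers forked (0 when called inside a worker).
function startCluster(port) {
	if (cluster.isPrimary) {
		// Primary process: it only forks workers and serves no requests itself
		const count = os.cpus().length
		for (let i = 0; i < count; i++) {
			cluster.fork()
		}
		return count
	}
	// Worker process: each worker has its own single-threaded event loop
	http.createServer(function(req, res) {
		res.end('handled by worker ' + process.pid + '\n')
	}).listen(port)
	return 0
}
```

Calling `startCluster(8000)` at the bottom of the script would start the servers. Note that each worker is still single threaded, so a blocked event loop in one worker stalls every request assigned to that worker.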

PHP, on the other hand, is by nature parallel rather than asynchronous, and a PHP script does not itself do multi-threading. We all know this, but were you aware of how your web requests are really managed? The Zend engine behind PHP can be built thread safe, which means it can process many threads simultaneously without any two of them colliding on resources. So regardless of whether you are using Apache, Nginx, or any other web server, every web request received is handed to its own PHP-managed thread or process (with PHP-FPM, for example, a worker process taken from a pool).

The difference between PHP and Node.js is that PHP's engine can be serving requests from many threads or processes at once, whereas all the web requests that Node.js handles are managed on only 1 thread per available CPU (when clustered).

What this means is that, whether you are a Node.js or a PHP developer, the CPU concern for any single script is the same: it should not consume 100% of the CPU if you care about not interrupting other threads on the same server, including those handling incoming web requests.

As a Node.js developer you have 1 additional concern that a PHP developer never will have.

Blocking the event loop blocks everything else that Node.js process is doing, including the handling of incoming web requests.
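
A small sketch makes the effect measurable. The `measureTimerDelay` helper below is hypothetical: it schedules a 0 ms timer, then busy-waits to stand in for CPU-bound work, and reports how late the timer actually fired.

```javascript
// Hypothetical helper: blocks the event loop for roughly blockMs,
// then reports how long a 0 ms timer had to wait before it could run.
function measureTimerDelay(blockMs, callback) {
	const start = Date.now()
	// Scheduled to run "immediately", but it can only run once the loop is free
	setTimeout(function() {
		callback(Date.now() - start)
	}, 0)
	// Synchronous busy-wait standing in for heavy computation
	while (Date.now() - start < blockMs) {
		// spin
	}
}

measureTimerDelay(200, function(delay) {
	console.log('0 ms timer fired after ' + delay + ' ms')
})
```

The timer cannot fire until the busy-wait ends, so the reported delay is at least the blocking time. An incoming web request would be held up in exactly the same way.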

Why is this not a PHP issue?

Different requests from different clients are run in different instances of PHP invoked by the web server, and those instances share nothing in memory or resources. Collisions can only happen on persisted data such as files or databases. Most databases handle such collisions very well, but with files you have to be more careful.

Node.js gotcha - Blocking the event loop

JavaScript in Node.js (just like in the browser) provides a single-threaded environment. This means that no two parts of your application run in parallel; instead, concurrency is achieved by handling I/O-bound operations asynchronously. For example, while a request to the database engine to fetch some document is pending, Node.js is free to focus on some other part of the application:

// Trying to fetch a user object from the database. Node.js is free to run
// other parts of the code from the moment this function is invoked...
db.User.get(userId, function(err, user) {
	// ...until the moment the user object has been retrieved here
})
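
The snippet above depends on a database library that is not shown. For a runnable version of the same idea, the sketch below replaces it with a hypothetical in-memory `db` object that answers after a short timer, and records the order in which things happen. The `fetchUser` wrapper and the `events` array are illustrative assumptions.

```javascript
// Hypothetical stand-in for a real database client: answers after 10 ms
const db = {
	User: {
		get: function(userId, callback) {
			setTimeout(function() {
				callback(null, { id: userId, name: 'example' })
			}, 10)
		}
	}
}

// Records the order of execution
const events = []

function fetchUser(userId, done) {
	events.push('query sent')
	db.User.get(userId, function(err, user) {
		events.push('user received')
		done(err, user)
	})
	// Runs before the callback above, because the query does not block
	events.push('other work')
}
```

Calling `fetchUser(42, callback)` records "query sent", then "other work", and only then "user received".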

This is what draws many developers to Node.js: it is powerful and performant.

However, a single piece of CPU-bound (synchronous) code in a Node.js instance with thousands of clients connected is all it takes to block the event loop, making all the clients wait. CPU-bound code includes sorting a large array, running an extremely long loop, and so on. For example:

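function sortUsersByCountry(users) {
	// Sorts in place and blocks the event loop until the sort has finished
	return users.sort(function(a, b) {
		// Return 0 for equal countries so the comparator stays consistent
		if (a.country === b.country) return 0
		return a.country < b.country ? -1 : 1
	})
}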

Invoking this sortUsersByCountry function may be fine if run on a small users array, but with a large array, it will have a horrible impact on the overall performance. If this is something that absolutely must be done, and you are certain that there will be nothing else waiting on the event loop (for example, if this was part of a command-line tool that you are building with Node.js, and it wouldn’t matter if the entire thing ran synchronously), then this may not be an issue. However, in a Node.js web-server instance trying to serve thousands of users at a time, such a pattern can prove fatal.
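
When a long loop over such an array has to stay inside the web-facing process, one mitigation that the article does not cover (added here only as a sketch) is to split the loop into slices and yield to the event loop between them with `setImmediate`. The `countByCountryInChunks` function is hypothetical and assumes `chunkSize` is a positive integer. It counts users per country rather than sorting them, because a sort cannot be sliced up this simply.

```javascript
// Hypothetical helper: counts users per country in slices of chunkSize,
// letting pending I/O callbacks run between slices.
function countByCountryInChunks(users, chunkSize, callback) {
	const counts = {}
	let index = 0

	function processChunk() {
		const end = Math.min(index + chunkSize, users.length)
		for (; index < end; index++) {
			const country = users[index].country
			counts[country] = (counts[country] || 0) + 1
		}
		if (index < users.length) {
			// Yield to the event loop before handling the next slice
			setImmediate(processChunk)
		} else {
			callback(counts)
		}
	}

	// Start asynchronously so the callback is never invoked synchronously
	setImmediate(processChunk)
}
```

The total CPU time is unchanged. What improves is that other requests get a turn between slices instead of waiting for the whole loop to finish.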

If this array of users was being retrieved from the database, the ideal solution in Node.js would be to fetch it already sorted directly from the database. If the event loop was being blocked by a loop written to compute the sum of a long history of financial transaction data, it could be deferred to some external worker/queue setup to avoid hogging the event loop.

The database is commonly the bottleneck in web-based environments, and as PHP developers we often avoid sorting, grouping, and scalar functions in SQL databases for the simple reason that the server-side language is often better and faster at that sort of thing than a SQL engine is, and we want to put as little load on our database as possible.
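
When the heavy computation cannot be pushed to the database, the external worker idea could be sketched with the built-in `worker_threads` module. This is an assumption on my part: the module did not exist when this article was written and needs Node.js 12 or newer. A separate process or a job queue would serve the same purpose. The `sumInWorker` name and the inline worker source are illustrative.

```javascript
const { Worker } = require('worker_threads')

// Worker source kept inline so that the sketch fits in a single file
const workerSource = `
const { parentPort, workerData } = require('worker_threads')
let total = 0
for (const amount of workerData) total += amount
parentPort.postMessage(total)
`

// Hypothetical helper: sums transaction amounts off the main thread,
// so the event loop stays free to serve other requests meanwhile.
function sumInWorker(amounts, callback) {
	const worker = new Worker(workerSource, { eval: true, workerData: amounts })
	worker.once('message', function(total) {
		callback(null, total)
	})
	worker.once('error', callback)
}
```

A call such as `sumInWorker(amounts, function(err, total) { ... })` returns immediately and the result arrives later, just like any other asynchronous I/O operation.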

Conclusion

As you can see, there is no silver-bullet solution to this kind of Node.js problem; rather, each case needs to be addressed individually. The fundamental idea is to not do CPU-intensive work within the front-facing Node.js instances, the ones clients connect to concurrently.

The real challenge is trusting that you, and any developers you are responsible for, are keenly aware of what "CPU-intensive work" is in all cases.

I hope this article helps you and saves you time - please spread the knowledge!


