Yes, another programming blog…get over it. For you non-techies, go get some candy or something, let the grown-ups talk. BTW, TinaT! (This is not a Tutorial!) This is just a monologue of what I went through to get a program to fork.
Ok, so over the holidays I spent some time deep into code. I wrote apacheLogSplit a few weeks ago so I could import a customer’s log files into MySQL so I could root around in them. (Hey, I like rooting around in data!) Anyhow, program worked like a charm but it was a bit slow. So after I released it, I decided to see if I could speed it up. The first thing that came to mind was “boy it would be nice if this were a multi-threaded application.” Unfortunately, PHP is not a multi-threaded environment. The best we can do is fork it. I like working on new concepts with existing projects because I’ve already worked out the details of the logic so I can concentrate on the new concepts I’m learning. So like a fat-man doing a swan-dive, I took the plunge with apacheLogSplit.
Having never worked with forking before, I did what any lazy programmer does, I googled around to see what others had done that I could steal. As luck would have it, there’s a class in pear called PHP_Fork. It’s specifically designed to encapsulate the forking process and make it easier to work with in an OOP environment. (and if you are not writing OOP then just hang your head in shame and walk away slowly.) since apacheLogSplit was built using OO, it seemed the perfect solution. That is until I tried to install it. PHP_Fork requires PHP be compiled with –enable-cli –with-pcntl –enable-shmop. Even after recompiling it to meet the specs (I didn’t have shmop compiled originally) it still wouldn’t *&^%!!! install using pear. Well, when the tools fail you, reach for the sledgehammer; I resorted to downloading it and installing it manually. Once I placed it manually in the proper pear directory I was on my way.
I knew my program was overly complex. Version 1.0 was a learning experience for a different lesson. I was experimenting with PHP5′s new OOP features. While I got everything to work the way I wanted to, if you’ve looked at the code you know there are a bunch of abstract classes, even an interface that gets implemented. So before I could do much with it, I had to simplify it. (Much like I need to do with this blog.) I got rid of all the abstract classes and such and pared it down to 3 main classes (the main controller class, the line class and the SQL writer class) and a few lines of procedural code in the main program to handle parameters and such. (Yes, I KNOW I could have moved that into an object!)
So, we now have code we can work with and our Fork.php installed. We are now ready to get down to some serious forking around. The first thing I learned is that I had a mis-conception of how forking worked. I knew that it was not the same as a thread but I assumed that when you forked code that the child started again form the beginning. Not so, the child process begins running with the statement after the fork. Very important distinction and it’s probably a mis-conception that only I had.
Once I crossed that hurdle, I was able to make quick progress. I had already designed my code so that the main body of processing took place inside an object; I made this object my thread marshaler. It read in a line from the file and then handed it to the first idle thread.
My threads have 2 subordinate objects, a line parser and a SQL writer. As I said earlier, for simplicity’s sake I did away with all the abstract classes etc. This version of the program can not be easily extended to write to XML, etc. I did that so as to simplify the code for readability and because I question whether anyone will actually take the time to do it. Once simplified, the object interaction is simple and easy to follow.
This brings me to the final hurdle I had to overcome, parent-child communication. This is where I REALLY miss threads. Threads make this a might easier because you can directly fire methods on the child form the parent. And because they are threads, they will sit in a sleep state until you do fire a method on them. In this case, each child sits in a loop waiting for a message. Once it receives a message, it processes the line and then goes back into its loop.
The only way to send the child a message is to use shared memory. PHP_Fork handles this nicely by setting up the semaphores and defining the 2 methods setVariable() and getVariable(). Using them you can set a variable in the parent and retrieve it in the child. So in my loop I set the variable lineValue. The child loops until it finds a value for lineValue and then goes to work processing it. Easy as pie, interprocess communication. Note that it is possible but a bit more complicated to fire a method on the child thread with a parameter. I chose not to bother with it in this case but I could have just as easily done that.
That’s it. My little PHP program is now a multi-process program. Here’s the kicker. It didn’t make any difference in speed. The main problem I have is that I’m running the program on the same box as MySQL. Thus they are competing for processor resources. 2 threads ran about the same speed 10 threads actually ran slower. I think the results would have been different if I were running them on separate boxes on a LAN.
Until next time, come back home safely.