2004-3-16
First post of the new year, and what a year
It's been quite a while since the last post. Well, I've been busy. Among other things, I've been working on a Spider for my new company. After a rather long process, I've landed on a threaded controller/producer design. Threads in Python can be a hassle, but they're definitely easier than async operations.
The Spider implements a threaded XMLRPC server in addition to the controller/producer threads. Running with 15 threads, I've managed to make 13,970 HTTP requests per hour. The operation is quite simple. The XMLRPC server receives a URL, which is added to a queue. The queue is read by a controller, which pops the item and inserts it into the producer queue. The controller then updates the caller, via XMLRPC, with status 0 if all threads are busy or 1 if there are threads available. The producer queue is read by the thread pool. Each thread downloads the URL and feeds it through an HTML parser. The parser dissolves frames and collects .js and .css links. The resulting data is then added to a MySQL database.
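For the curious, here is a rough sketch of that queue-to-controller-to-pool flow. It is not the Spider's actual code: it is written for current Python 3, the names run_spider and fake_fetch are made up for illustration, a plain dict stands in for the MySQL table, and the XMLRPC front end and the 0/1 status callback are left out entirely.

```python
import queue
import threading


def run_spider(urls, fetch, num_workers=4):
    """Push urls through inbox -> controller -> worker pool; return results.

    Hypothetical sketch: `fetch` stands in for download + HTML parsing.
    """
    inbox = queue.Queue()    # in the post, filled by the XMLRPC server
    work = queue.Queue()     # the "producer queue" read by the thread pool
    results = {}             # stands in for the database
    lock = threading.Lock()

    def controller():
        # Pop each URL from the inbox and hand it to the pool.
        while True:
            url = inbox.get()
            if url is None:  # sentinel: tell every worker to stop
                for _ in range(num_workers):
                    work.put(None)
                return
            work.put(url)

    def worker():
        while True:
            url = work.get()
            if url is None:
                return
            page = fetch(url)
            with lock:
                results[url] = page

    threads = [threading.Thread(target=controller)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        inbox.put(url)
    inbox.put(None)          # no more URLs coming
    for t in threads:
        t.join()
    return results


def fake_fetch(url):
    # Made-up stand-in for the real HTTP request and parser.
    return "<html>%s</html>" % url


if __name__ == "__main__":
    pages = run_spider(["http://example.com/a", "http://example.com/b"], fake_fetch)
    print(sorted(pages))
```

The two queues keep the front end and the pool from ever touching each other directly, which is most of what makes the threaded design bearable.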
As mentioned, 13,970 unique URLs per hour were processed on a single-CPU AMD XP1800+ running Fedora Core 1 with Python 2.3.3. I think it's quite impressive. Of course, what delights me the most is that my Spider replaced a Java-based Spider. The Python Spider, being amazingly faster and using a heck of a lot less CPU, not to mention memory, simply squashed the competition.
Now, my only concern is scalability. Hardware is cheap, but dual processors are even cheaper. With the GIL, making use of a dual machine may prove difficult, since only one thread runs Python code at a time. I guess I could spawn a child Spider when all 15 threads are occupied. It may, or may not, use the second processor. In theory it should, but I've got no way of knowing until I can have a test run on a dual machine.
In other news - Arsenal meets ManUnited in the FA cup semi. Arsenal tops the Premier League 9 points ahead of Chelsea (thank you ManCity!!). Arsenal is also well on their way to meet Real Madrid in the Champions League semi. Now there's an interesting match...
Comment on this post [ so far] ... more like this: [as usual, python]
2003-11-27
reST, mostly
One week into the blogging project and I'm already falling behind. The story of my life, I'm afraid. Today, I finally got around to updating the interface of my "ifoqus personal client" application on MacOSX. I had it running on Linux and Windows beforehand, but I only recently managed to build a reasonably bug-free PyQt on OSX. Looks beautiful though. Trolltech knows what they are doing.
My back hurts. Nothing new there. Since my back is hurting so much, it's hard to concentrate on real work TM. So I get time to spend updating the blog. Isn't that nice?
Only yesterday I completed a project writing a technical document for *stuff* at work. I had meant to look into reStructuredText for some time, and this gave me an opportunity to do so. What can I say? I'm impressed! I downloaded the latest Docutils version from SourceForge and browsed the documentation for five to ten minutes. Sure, I made some mistakes early on with indents and newlines and such, but the result is smashing! I hacked up a .css with the company logo and ran html.py on the whole shebang. Out came a .html document ready for publishing. Same thing with XML: a bit on the verbose side, but the document validates perfectly. One nice feature of MacOSX is the print-to-pdf option in every print dialog. By a lucky stroke of fate, the document I had written was a perfect fit for the print-to-pdf feature. I've created PDFs of other web pages where I was not so lucky.
As happy as I am about Docutils, it seems other people are not as happy. While discovering more about Docutils and reST in general, I stumbled upon this discussion. Reading through it, I came across this post. Now, I've got nothing but respect for Mr. Lundh, being of the brother people and all, but boy did he get up on the wrong side of the bed that morning. I won't dwell too much on this subject other than to say *duh!*.
Anyways, in other news, A R S E N A L beat the crap out of Inter Milan, and all the people rejoiced. Suddenly Arsenal is a contender for the Champions League title... oh well, when pigs fly and so on...
Comment on this post [ so far]
2003-11-20
Vi taper aldri! ("We never lose!", a sing-along song)
Last night our national team was given a bashing by the Spanish side. Yes, I'm talking about football, or as the Americans will have it - soccer. I'm an avid football fan. Arsenal is my team, and I'm proud of it! I'm not too proud of my national football team though. However, we got what we deserved. We got beaten by a much better team in two play-off matches. There's no doubt the better team deserves a ticket to Portugal.
...Strange how the international sports websites have chosen to ignore the match... Oh well, cowardly behaviour shouldn't get headlines anyway.
Good luck Spain!
Comment on this post [ so far] ... more like this: [Football]
2003-11-19
Signals and Python
It's not every day I feel the need to have signal handling in my Python scripts, but I've found it useful in dealing with servers. Especially the kind of servers that serve forever...
The following code snippet demonstrates the use of signals to gracefully exit a server that has been daemonized or simply put in the background.
    import signal

    # Server, XMLRPCRegisters and conf_obj are defined elsewhere (not shown here).

    def endHandler(signum, stackframe):
        # Turn SIGTERM into the same exception Ctrl-C raises
        raise KeyboardInterrupt

    def main():
        try:
            server = Server('', 8000)
            server.register_introspection_functions()
            server.register_instance(XMLRPCRegisters(conf_obj))
            try:
                signal.signal(signal.SIGTERM, endHandler)
                while 1:
                    server.handle_request()
            except KeyboardInterrupt:
                pass
        finally:
            try:
                server.socket.close()
            except UnboundLocalError:
                # server never got initialized - die gracefully
                pass
(If you are using IE5.5/6.0 to view this page, the above text will not display correctly. Ask Bill to conform to the W3 standard.)
Now you can ask the server to terminate by issuing kill -15 $pid. This signal handler will not catch kill -9 (SIGKILL cannot be caught by any handler). Either way, should you have to use that, chances are there's something else wrong with your server.
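If you want to try the idea without an XMLRPC server at hand, here is a small self-contained sketch of the same trick. It assumes a current Python 3 on a POSIX system, and the names Terminate, on_sigterm, serve and demo_request are made up for this example. The "server" is just a function called in a loop, and the demo sends SIGTERM to itself, which is what kill -15 $pid does from the shell.

```python
import os
import signal
import time


class Terminate(Exception):
    """Raised by the handler so the serving loop can unwind normally."""


def on_sigterm(signum, frame):
    raise Terminate()


def serve(handle_one):
    """Call handle_one() until SIGTERM arrives; return completed calls."""
    previous = signal.signal(signal.SIGTERM, on_sigterm)
    handled = 0
    try:
        while True:
            handle_one()
            handled += 1
    except Terminate:
        pass                      # clean-up (closing sockets etc.) goes here
    finally:
        signal.signal(signal.SIGTERM, previous)   # restore the old handler
    return handled


if __name__ == "__main__":
    calls = []

    def demo_request():
        # Hypothetical request handler: on the third call it signals itself.
        calls.append(1)
        if len(calls) == 3:
            os.kill(os.getpid(), signal.SIGTERM)
            for _ in range(200):  # give the handler a chance to run
                time.sleep(0.005)

    print("requests finished before SIGTERM:", serve(demo_request))
```

Using a dedicated exception instead of KeyboardInterrupt is just a matter of taste here; it keeps a real Ctrl-C distinguishable from a polite kill.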
Was this helpful or am I just ramblin'?
Back to work btw.
Comment on this post [ so far] ... more like this: [Python signals]
2003-11-18
Confusion
Ok, this is what publishing is all about. Even I, with my meager knowledge of blogging, can manage this. And to think I've spent the best part of two years in PHP hell programming a publishing system. Oh well, those days are gone and soon forgotten.
When I start out on something like this, I always have high hopes and a good deal of confidence in myself. Then I let it slide into oblivion. It's not like it's work related or anything, you know. If it were work related, it would be no problem. This I do for fun, and the fun has a tendency to sour. Why? Because I've got a busy mind; there's always something else to occupy my time.
On the bright side, Python is here to stay. I cleared up the attic on the left-hand side of my head and it fits snugly. So maybe, just maybe, this time I'll follow through...
Anyways, for those of you who are still with me, I'm truly sorry to have wasted your time.
Comment on this post [ so far]