pycs-devel archive weblog

2003-3-13

Phillip Pearson: [PyCS-devel] did I mention?

http://www.pycs.net/allyourrss.html



a little mini-aggregator for pycs.net.



In CVS now, as /rss ... do what you will.  If anyone feels like

hacking it to use a template (Cheetah or something) to generate the

output, that would be cool.  Adding it to the Makefile so it's

installed with PyCS, and getting it to use pycs_paths.py would be

handy too ;)



Cheers,

Phil

Phillip Pearson: [PyCS-devel] forking inside a module script

Hi,



I've decided that the safest way to run htsearch is to fork inside the

module script, run htsearch in the child process, and let the OS clean

up after it.



(how to do this in Python, for ppl who're interested:

http://www.myelin.co.nz/post/2003/3/13/#200303135)
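For readers who can't reach the link, here is a minimal sketch of the fork-in-a-handler idea. It is not the code from the post: `run_in_child` and `work` are hypothetical names, and `work` is assumed to return bytes. The child here leaves through `os._exit()`, which is a substitution on my part; it sidesteps any handler that catches SystemExit, whereas the rest of this message fixes the handlers instead.

```python
import os


def run_in_child(work):
    """Run work() in a forked child; return (exit_status, bytes sent back)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do the work, send the result to the parent, then leave via
        # os._exit() so no exception handler further up the stack can keep
        # this process alive. Status stays 1 if work() raises.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, work())
            status = 0
        finally:
            os._exit(status)
    # Parent: read until the child closes its end of the pipe, then reap it.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, wait_status = os.waitpid(pid, 0)
    exit_status = os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else -1
    return exit_status, b"".join(chunks)
```

Whatever the child leaks (memory, static state in an extension module) disappears with the process, which is the point of letting the OS clean up.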



However, I found that the module handler catches SystemExit

exceptions, meaning that the child processes weren't being allowed to

exit properly.  I've changed pycs_module_handler to just re-raise if

it gets a SystemExit, but it looks like Medusa also catches it.

Here's a quick diff to get Medusa to re-raise too:



RCS file: /cvsroot/oedipus/medusa/http_server.py,v
retrieving revision 1.10
diff -u -r1.10 http_server.py
--- http_server.py      18 Dec 2002 14:55:44 -0000      1.10
+++ http_server.py      13 Mar 2003 09:58:20 -0000
@@ -495,6 +495,8 @@
                         # This isn't used anywhere.
                         # r.handler = h # CYCLE
                         h.handle_request (r)
+                    except SystemExit:
+                        raise
                     except:
                         self.server.exceptions.increment()
                         (file, fun, line), t, v, tbinfo = asyncore.compact_traceback()
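The same pattern in isolation, as a small sketch rather than Medusa code: `dispatch`, `handler` and `on_error` are made-up names. The only point is the ordering of the two except clauses.

```python
import sys


def dispatch(handler, request, on_error):
    """Call handler(request); report ordinary failures, let SystemExit through."""
    try:
        handler(request)
    except SystemExit:
        # Must come before the bare except, otherwise a forked child that
        # calls sys.exit() would be swallowed by the error path below.
        raise
    except:
        # Everything else is reported and the server keeps running.
        on_error(sys.exc_info()[1])
```

On current Python versions, writing `except Exception:` instead of a bare `except:` has a similar effect, since SystemExit derives from BaseException rather than Exception.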



I guess we should push this one over to the Medusa people too ...



(I haven't put any of the search stuff into CVS yet BTW, but will

soonish hopefully).



Cheers,

Phil

Georg Bauer: Re: [PyCS-devel] question: better logging (per user)

Hi!



> Congratulations!



One more patch later (the timezone in the logging was wrong: it logged in

GMT instead of local time), and now it looks quite good:



http://muensterland.org/statistics/



Nice. Next thing would be to find a way to do that per user. Maybe I'll just

split the log by user path and run a separate webalizer instance for each

user, or I'll just put in some grouping for some of the users.
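The timezone fix mentioned at the top of this message could look roughly like the sketch below. It is not the actual patch: `clf_timestamp` and `EXAMPLE_ZONE` are hypothetical names, and the output is assumed to follow the usual access-log layout of day/month/year:time followed by a numeric offset.

```python
from datetime import datetime, timedelta, timezone

# Fixed +01:00 offset, used here only as an example zone.
EXAMPLE_ZONE = timezone(timedelta(hours=1))


def clf_timestamp(epoch_seconds, tz=None):
    """Format a timestamp as dd/Mon/yyyy:HH:MM:SS +zzzz in the given zone.

    tz=None means the machine's local zone.
    """
    moment = datetime.fromtimestamp(epoch_seconds, timezone.utc).astimezone(tz)
    return moment.strftime("%d/%b/%Y:%H:%M:%S %z")
```

Passing `timezone.utc` reproduces the old GMT behaviour; leaving `tz` out gives local time with the matching offset, so log analyzers can still line entries up.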



> In other news, we almost have another search engine backend available:

>     http://www.myelin.co.nz/post/2003/3/13/#200303131



Fine! The context in search results is one thing I miss with swish++.



bye, Georg

Phillip Pearson: Re: [PyCS-devel] question: better logging (per user)

> But it works. I now have a nice and shiny combined log with remote host

> IPs, referrers and user agent information, but it is created on the

> community server. And it uses all rewriting rules, so I get only

> normalized URLs (/users/xxxxxx/ stuff). This can be split by user and

> so I could set up webalizer to just sum up stuff for one user. Or do other

> nice things with that :-)



Congratulations!



In other news, we almost have another search engine backend available:

    http://www.myelin.co.nz/post/2003/3/13/#200303131



I just realised that ht://Dig has a number of classes using static member

variables that don't seem to be cleaned up properly, so I'm going to have to

change all that if we want to ever be able to do more than one search per

PyCS process (reloading _htsearch.so might help, but I bet I'd end up with

one hell of a memory leak).  Ahh, CGI ...



Cheers,

Phil :)

Georg Bauer: Re: [PyCS-devel] question: better logging (per user)

Hi!



> I already have a hack working (not yet checked in, though) that will

> patch the http_request objects in a way that they log in the combined

> log format (with referrers and user-agent info). I'm currently

> investigating how complicated it would be to get Apache to pass on the

> client address in a header, so I could use that in the logging to

> replace the apache machine header.



Ok, it is now working. I have added a new vhostfrom rule to the

rewrite.conf.default and added several patches in pycs.py and

pycs_rewrite_handler.py. The main problem is that medusa doesn't give a

nice way to specify what class to use for http requests. So to do all this

nicely, I would have to overload the full hierarchy and make changes to

several methods and classes. To prevent that (as that would likely break

with newer releases where the inner workings change), I just patch some

class objects with setattr. This will break with newer versions, too, if

some key components change. But it's only a very small amount of added

code, and really only one dependency on inner workings: I assume that

http_request objects have a header and _header_cache instance variable

like they do now.
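To illustrate the setattr approach without the real code: in the sketch below `HttpRequest` is a hypothetical stand-in, not medusa's class, and `log_line`, `get_header` and `combined_log_line` are invented names. Only the `header` and `_header_cache` attributes come from the description above, and even their exact shape is assumed.

```python
class HttpRequest:
    """Hypothetical stand-in for the server's request class."""

    def __init__(self, request_line, header):
        self.request = request_line
        self.header = header        # assumed: list of raw "Name: value" lines
        self._header_cache = {}     # assumed: per-request lookup cache

    def log_line(self):
        return '"%s"' % self.request


def get_header(request, name):
    """Look up a header by name, caching the result on the request."""
    key = name.lower()
    if key not in request._header_cache:
        value = "-"
        for line in request.header:
            field, _, rest = line.partition(":")
            if field.strip().lower() == key:
                value = rest.strip()
                break
        request._header_cache[key] = value
    return request._header_cache[key]


def combined_log_line(self):
    """Replacement method: request line plus referrer and user agent."""
    return '"%s" "%s" "%s"' % (
        self.request,
        get_header(self, "Referer"),
        get_header(self, "User-Agent"),
    )


# Patch the class object in place instead of subclassing the whole hierarchy.
setattr(HttpRequest, "log_line", combined_log_line)
```

Every request the server creates afterwards picks up the new method, which is why no other class has to change, and also why the patch is only as stable as the attributes it touches.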



So if someone wants to dig into the code, be warned. It is butt ugly ;-)



But it works. I now have a nice and shiny combined log with remote host

IPs, referrers and user agent information, but it is created on the

community server. And it uses all rewriting rules, so I get only

normalized URLs (/users/xxxxxx/ stuff). This can be split by user and

so I could set up webalizer to just sum up stuff for one user. Or do other

nice things with that :-)
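Splitting such a log by user might be sketched as follows. The `/users/<id>/` layout is taken from the hint above; the function name, the regular expression and the quoted request-line layout of each log entry are assumptions.

```python
import re
from collections import defaultdict

# Assumed: each entry contains a quoted request line such as
# "GET /users/<id>/path HTTP/1.0".
USER_PATH = re.compile(r'"[A-Z]+ /users/([^/\s"]+)/')


def split_by_user(lines):
    """Group access-log lines by the user id found in the request path."""
    per_user = defaultdict(list)
    for line in lines:
        match = USER_PATH.search(line)
        if match:
            per_user[match.group(1)].append(line)
    return dict(per_user)
```

Each group could then be written to its own file and fed to a separate analyzer run. Lines that don't belong to a user directory are simply dropped here.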



bye, Georg

Phillip Pearson: Re: [PyCS-devel] question: better logging (per user)

> > I only analyse what comes in from Apache, because that gives me the

> > client IP address.

>

> I am currently working out how to solve that, too. :-)

>

> I already have a hack working (not yet checked in, though) that will patch

> the http_request objects in a way that they log in the combined log format

> (with referrers and user-agent info). I'm currently investigating how

> complicated it would be to get Apache to pass on the client address in a

> header, so I could use that in the logging to replace the apache machine

> header.



You could always continue the ~~vhost~~ thing and turn it into

~~vhost~~/ip.address/server/path ...



BTW this may be useful:

    http://httpd.apache.org/docs/mod/mod_headers.html



Now, can we get it to take input from mod_rewrite?  :-)



Cheers,

Phil

Georg Bauer: Re: [PyCS-devel] question: better logging (per user)

Hi!



> Nothing from my end.  In fact I totally ignore the logs coming out of

> the PyCS process ;-)

>

> I only analyse what comes in from Apache, because that gives me the

> client IP address.



I am currently working out how to solve that, too. :-)



I already have a hack working (not yet checked in, though) that will patch

the http_request objects in a way that they log in the combined log format

(with referrers and user-agent info). I'm currently investigating how

complicated it would be to get Apache to pass on the client address in a

header, so I could use that in the logging to replace the apache machine

header.
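On the receiving side, using such a header might look like the sketch below. The thread doesn't name the header; `X-Forwarded-For` is assumed here because it is a common convention, and `client_address` is a hypothetical helper.

```python
def client_address(header_lines, peer_address, header_name="X-Forwarded-For"):
    """Return the address reported by the proxy, else the peer address."""
    prefix = header_name.lower() + ":"
    for line in header_lines:
        if line.lower().startswith(prefix):
            # A proxy chain may list several addresses; the first entry is
            # conventionally the original client.
            first = line[len(prefix):].split(",")[0].strip()
            if first:
                return first
    return peer_address
```

The header is only worth trusting when the connection really comes from the front-end Apache, since any client can send the same header directly.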



This would allow me to create full combined logs for the machine and so

split that up to produce statistics for user directories with all

information that would be available from the apache machine.



Actually I don't like running webalizer on the apache machine because

there it doesn't have the rewritten addresses. Since I use manila style

host names, I get a lot of hits on stuff like /weblog/index.html - but

can't tell whether that's for hugo.muensterland.org, witch.muensterland.org

or pyds.muensterland.org :-/



bye, Georg

Phillip Pearson: Re: [PyCS-devel] question: better logging (per user)

Hi,



> If you know of something that exists and might break with this change,

> notify me and I will have to make this logging behaviour configurable.



Nothing from my end.  In fact I totally ignore the logs coming out of

the PyCS process ;-)



I only analyse what comes in from Apache, because that gives me the

client IP address.



> Another change in CVS is that now there is the /status activated in

> medusa. It's only a simple status page and doesn't include too much

> information, but I think we should support it with our own handlers, in

> the long run. Might be a nice place for a quick glance on how your server

> performs.



Good point.  When I coded the server in the first place, I turned off

everything I didn't immediately need, because I was in a hurry and

didn't want to have to bother checking to make sure it was secure.



Then, I never went back to do the extra work and get it all going

again ... ;-)



So if you think /status is OK, I don't mind having that turned

back on again.



Cheers,

Phil

Georg Bauer: Re: [PyCS-devel] question: better logging (per user)

Hi!



> I am unsatisfied with how PyCS currently does logging: it's all in one big

> file and _before_ rewriting takes place. This makes for very ugly

> URLs when PyCS runs behind an Apache. My idea is to provide common log

> file format per user, but _after_ rewriting takes place.



I think I found it. pycs_rewrite_handler.py doesn't change the

request.request field on rewriting. I changed this so that it now

constructs a new request and puts it in there. This should work out

nicely, as it doesn't change anything else in the system, just the field

and code that depends on that (and that should - in my opinion - get the

rewritten address).
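A rough sketch of what "constructing a new request" can mean for logging purposes. This is not the code checked into CVS: `rewrite_request_line` and `example_rewrite` are hypothetical, the request line is assumed to be a "METHOD URI VERSION" string, and the /users/0000001 prefix is an invented rule.

```python
def rewrite_request_line(request_line, rewrite):
    """Return the request line with its URI replaced by rewrite(uri)."""
    parts = request_line.split(" ")
    if len(parts) < 2:
        return request_line  # malformed line: leave it untouched
    parts[1] = rewrite(parts[1])
    return " ".join(parts)


def example_rewrite(uri):
    """Hypothetical rule: map a host-relative path into a user directory."""
    return "/users/0000001" + uri
```

Anything that logs the rewritten line now sees the normalized URI, which is exactly why analyzers expecting the original URIs may be affected.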



But this change (just checked it into CVS) might break stuff that depends

on the access.log written by pycs. So if you have a log analyzer working

on your pycs-generated access.log, things have changed and you won't find

original URIs in there. I checked Phil's make_referer.py script, which reads

the apache log files and so isn't influenced by my change. But there might

be other stuff outside.



If you know of something that exists and might break with this change,

notify me and I will have to make this logging behaviour configurable.



Another change in CVS is that /status is now activated in

medusa. It's only a simple status page and doesn't include too much

information, but I think we should support it with our own handlers, in

the long run. Might be a nice place for a quick glance at how your server

performs.



bye, Georg