Hi!
> I take it that the offending bit is:
>
> # unquote path if necessary (thanks to Skip Montanaro for pointing
> # out that we must unquote in piecemeal fashion).
> if '%' in uri:
>     uri = unquote (uri)
Yep, that's the bugger.
> IMHO this should become:
>
> if '%' in uri:
>     path, qs = urllib.splitquery(uri)
>     uri = unquote(path)
>     if qs is not None:
>         uri = uri + '?' + qs
Hmm. It's at line 469ff, and the code there actually uses "request", not
"uri". Are you using an older or newer version than 0.5.3?
path, qs = urllib.splitquery(request)
if '%' in path:
    request = unquote(path)
    if qs is not None:
        request = request + '?' + qs
So we don't unquote at all if the only quoting is in the query string.
This should fix the problem, I think. If I didn't overlook something,
that is ...
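For illustration, here is a Python 3 sketch of the same path-only unquoting. The helper name `unquote_path_only` is hypothetical; this is not Medusa code. It splits at the first `?`, which is where RFC 3986 says the query starts (Python 2's `urllib.splitquery` splits at the last `?` instead). Because splitting strips the `?`, the sketch puts it back when reassembling the request.

```python
from urllib.parse import unquote


def unquote_path_only(request: str) -> str:
    """Unquote the path of a request URI but leave its query string quoted.

    Hypothetical helper sketching the fix discussed above.
    """
    # partition() splits at the first '?'; sep is '' if there is no query.
    path, sep, query = request.partition('?')
    if '%' in path:
        path = unquote(path)
    # Re-attach '?' and the still-quoted query only if there was one.
    return path + sep + query


print(unquote_path_only('/my%20dir/counter?ref=http%3A%2F%2Fexample.com%2Fa%3Fb%3D1%26c%3D2'))
```

The quoted referer in the query string comes through untouched, so it can be split into parameters first and unquoted value by value afterwards.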
bye, Georg
OK, more on this.
I take it that the offending bit is:
# unquote path if necessary (thanks to Skip Montanaro for pointing
# out that we must unquote in piecemeal fashion).
if '%' in uri:
    uri = unquote (uri)
IMHO this should become:
if '%' in uri:
    path, qs = urllib.splitquery(uri)
    uri = unquote(path)
    if qs is not None:
        uri = uri + '?' + qs
That should unquote the path but leave the query string alone, so it can
be unquoted piece by piece later. What do you think?
Cheers,
Phil :)
On Tue, Mar 04, 2003 at 09:01:26AM +0100, Georg Bauer wrote:
> >We could always fix it ourselves and send a patch to the Medusa
> >maintainers - there seems to be a reasonable amount of activity going
> >on in that project, so I'm sure they'd be happy to hear from us ...
>
> Sure, we can. But I have to admit that I don't know how to do that
> _right_; the only ideas I can come up with right now are bad, ugly
> hacks (like tearing the request apart, unquoting parts of it and
> reconstructing it; it must be the binary/textfile issues Hal pointed me
> to, those make my brain hurt ;-) ). And I am not sure that things won't
> break. Hmm. Do you have a nice idea? If yes, go ahead :-)
Hmm ... I'll take a look. I didn't think it was that hard. Basically,
given a request URL with a query string:
>>> import urllib
>>> url = 'http://foo.com/bar/baz?' + urllib.urlencode((('baz','boz'), ('abc', 'a=b&c?d')))
>>> url
'http://foo.com/bar/baz?baz=boz&abc=a%3Db%26c%3Fd'
We can just split by &, then by =, then unquote to get the values:
>>> path, qs = urllib.splitquery(url)
>>> path
'http://foo.com/bar/baz'
>>> qs
'baz=boz&abc=a%3Db%26c%3Fd'
>>> bits = qs.split('&')
>>> bits
['baz=boz', 'abc=a%3Db%26c%3Fd']
>>> for bit in bits:
... key,value = urllib.splitvalue(bit)
... (key, urllib.unquote(value))
...
('baz', 'boz')
('abc', 'a=b&c?d')
That gives you all the bits out of the query string ... presumably
Medusa gets the rest right already ...
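The same walk-through in Python 3, where these helpers live in `urllib.parse` rather than `urllib`. This is a sketch, and `split_query` is a hypothetical name. It splits on `&` and then on `=` before unquoting, so encoded `&`, `=` and `?` inside values survive, and it uses `unquote_plus` because `urlencode` encodes spaces as `+`.

```python
from urllib.parse import unquote_plus, urlencode


def split_query(url: str) -> list[tuple[str, str]]:
    """Split a URL's query string into (key, value) pairs.

    Split on '&', then on '=', and only then unquote each piece.
    """
    _path, _sep, qs = url.partition('?')
    pairs = []
    for bit in qs.split('&'):
        if not bit:
            continue  # skip empty segments, e.g. from a trailing '&'
        key, _eq, value = bit.partition('=')
        # unquote_plus, because urlencode() turns spaces into '+'
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


url = 'http://foo.com/bar/baz?' + urlencode((('baz', 'boz'), ('abc', 'a=b&c?d')))
print(split_query(url))  # [('baz', 'boz'), ('abc', 'a=b&c?d')]
```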
(BTW doesn't Medusa give us a copy of the full query string anyway?
In PyCS I think each script calls pycs_http_util to split it up ...)
Cheers,
Phil :)
BTW - here's the raw code for the above, if you want to hack around:
import urllib
url = 'http://foo.com/bar/baz?' + urllib.urlencode((('baz','boz'), ('abc', 'a=b&c?d')))
url
path, qs = urllib.splitquery(url)
path
qs
bits = qs.split('&')
bits
for bit in bits:
    key, value = urllib.splitvalue(bit)
    (key, urllib.unquote(value))
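So I guess:
import urllib
def urldecode(url):
    path, qs = urllib.splitquery(url)
    if qs is None:
        return []
    pairs = [urllib.splitvalue(bit) for bit in qs.split('&')]  # value is None if no '='
    return [(key, urllib.unquote(value or '')) for key, value in pairs]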
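In Python 3 the standard library already does this splitting: `urllib.parse.parse_qsl` splits on `&` and `=` first and unquotes each key and value afterwards. A minimal sketch of `urldecode` on top of it, assuming you only want the query pairs and not the path:

```python
from urllib.parse import parse_qsl, urlsplit


def urldecode(url: str) -> list[tuple[str, str]]:
    """Return the (key, value) pairs of a URL's query string, unquoted."""
    query = urlsplit(url).query  # '' when the URL has no query string
    # keep_blank_values=True keeps pairs such as 'flag=' instead of dropping them
    return parse_qsl(query, keep_blank_values=True)


print(urldecode('http://foo.com/bar/baz?baz=boz&abc=a%3Db%26c%3Fd'))
```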
Hi!
There is a bug in medusa that creates problems for PyCS and PyDS when
passing URIs as parameters to handlers via GET methods.
Medusa unquotes the request in the found_terminator method of the
http_channel class in the http_server.py module. It unquotes the _full_
request line, not only the command and path parts. This causes problems
when one of the parameters you pass in is itself a URI, as is the case
with the counter script that creates the referer entries.
This is why URIs in the referer lists only show their first parameter.
The unquote removes the quoting that protects the parameter values. Since
we parse the query part after the global unquote, the additional
parameters of the passed-in URI, which were protected before, now become
parameters of the called URI.
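A simplified Python 3 model of the failure, not Medusa's actual code; the referer URL and the helper names are made up for illustration. Unquoting the whole request first turns the referer's encoded `&` into a real separator, so its second parameter `b=2` becomes a parameter of the counter request.

```python
from urllib.parse import parse_qsl, quote, unquote

# Hypothetical referer URI that the counter script receives as a parameter.
referer = 'http://example.com/page?a=1&b=2'
request = '/counter?ref=' + quote(referer, safe='')


def params_after_full_unquote(req: str) -> list[tuple[str, str]]:
    """Current behaviour: unquote the whole request, then parse the query."""
    _path, _sep, query = unquote(req).partition('?')
    return parse_qsl(query)


def params_after_piecewise_unquote(req: str) -> list[tuple[str, str]]:
    """Split first; parse_qsl() unquotes each key and value afterwards."""
    _path, _sep, query = req.partition('?')
    return parse_qsl(query)


print(params_after_full_unquote(request))       # referer cut off after 'a=1'
print(params_after_piecewise_unquote(request))  # referer intact
```

Splitting first and unquoting each value afterwards keeps the referer intact as a single `ref` value.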
I don't have a good idea how to fix this without touching Medusa (which
I would rather avoid, as it complicates setup), so I contacted the
upstream author about it and left the bug in the system.
But if the upstream author doesn't come up with something, we will have
to fix it ourselves, as it really does cause problems. Does anyone have
a good idea?
bye, Georg