Georg Bauer commented on my hack P376 yesterday to say:
"your hack has disabled conditional GET completely, so you now pull down feeds every hour fully. This might get you some angry comments if somebody finds out ;-)"
I decided to investigate further. The problem boils down to this code in DownstreamTool.open_http:
def open_http(self, url, data=None):
    numheaders = len(self.addheaders)
    self.isHTTP = 1
    self.lastURL = self.getTheUrl(url)
    try:
        theurl = self.getTheUrl(url)
        self.message = _('opening url: <a href="%s">%s</a>') % (theurl, theurl)
        if not(self.force):
            for h in self.cache._getUrlHeaders(theurl):
                apply(self.addheader, h)
                self.message += _('<br>adding Header "%s: %s"') % h
        urlpieces = urlparse.urlparse(url[1])
        url = (urlpieces[1], url[1])
        res = urllib.URLopener.open_http(self, url, data)
        self.message = self.message.replace('%', '%%')
        <snip>
It turns out that this code is DIFFERENT from the same code on my old PyDS install, although BOTH are supposed to be 0.7.2 (both say so at the top of the screen). The two lines:
    urlpieces = urlparse.urlparse(url[1])
    url = (urlpieces[1], url[1])
have been added in the Gentoo version. The problem seems to be that the variable url may be either a simple string or a tuple, and the new code assumes it is a tuple when it is actually a string. In that case url[1] is just the second character of the string, not a URL. I don't know precisely what the new code is supposed to do, but taking it out fixes the problem.
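To make the string-versus-tuple mix-up concrete, here is a minimal sketch. It is written against Python 3's urllib.parse (the PyDS code above uses the Python 2 urlparse module), and host_and_url is a hypothetical helper of mine, not anything in PyDS. The example URLs and the assumption that the tuple form carries the full URL in position 1 are also mine, inferred only from what the added lines seem to expect.

```python
from urllib.parse import urlparse  # Python 3 home of the old urlparse module


def host_and_url(url):
    """Hypothetical helper, not part of PyDS: accept a str or a tuple."""
    if isinstance(url, tuple):
        # Assumption: the tuple form carries the full URL in position 1,
        # which is what the added Gentoo lines appear to expect.
        full_url = url[1]
    else:
        full_url = url
    return (urlparse(full_url).netloc, full_url)


# Illustrative values only; the real arguments PyDS passes may differ.
as_string = "http://example.org/feed.xml"
as_tuple = ("proxy.example:8080", "http://example.org/feed.xml")

# Indexing a string yields a single character, so no host can be parsed from it.
broken_host = urlparse(as_string[1]).netloc
```

A type check like this would let both forms through, but since I don't know what the added lines were meant to achieve, simply removing them remains the safer fix for me.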
Georg mentioned that I am repeatedly downloading articles. I did notice this effect, but I was seeing duplicate entries before (particularly from the BBC), so I was accustomed to skipping past them.
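For anyone unfamiliar with the conditional GET that Georg is talking about, the idea can be sketched roughly as follows. The function names are hypothetical and this is not how the PyDS cache is implemented; only the If-None-Match and If-Modified-Since header names and the 304 status code come from HTTP itself.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build the request headers that turn a plain GET into a conditional one."""
    headers = {}
    if etag:
        # Validator the server sent as ETag on the previous fetch.
        headers["If-None-Match"] = etag
    if last_modified:
        # Date the server sent as Last-Modified on the previous fetch.
        headers["If-Modified-Since"] = last_modified
    return headers


def needs_reparse(status):
    """304 Not Modified means the cached copy is still current."""
    return status != 304
```

If those headers never get sent, the server has nothing to compare against and returns the whole feed every time, which is the "pull down feeds every hour fully" behaviour described above.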
Still, why is PyDS 0.7.2 in Gentoo different from my old copy of 0.7.2 (which came from Georg's Debian package)?