BlogMatrix
 

Using Amazon S3 to serve static files

David Janes 2006-08-06 15:07 UTC

I've just written a post over on the BlogMatrix blog on how I used Amazon S3 to serve BlogMatrix's static files (based on Adrian's original post). I've included a fairly flexible Python/S3 uploader which you can use in your own projects.

wxMozilla on Linux

David Janes 2005-02-03 16:22 UTC

For the time being, Sparks! on Linux is going to have to remain our underprivileged child. Why? Because wxPython (the GUI application toolkit we are using) does not provide a very functional HTML widget for that platform. On Windows, we are embedding Internet Explorer using an ActiveX control. On Macintosh, we are using a Safari control (experimental in nature, but soon to get much better).

On Linux, there was a project called wxMozilla to bring Mozilla into wxWidgets as a standard control. Although the project appears to be abandoned, I decided to spend 48 hours seeing if I could make it work. Alas, no. If anyone wants to help out, please get in touch. Here's what we were using (in pretty much the order they need to be compiled):

  • RedHat Fedora Core 3
  • Python 2.3.4
  • glib-2.6.1
  • atk-1.9.0
  • pango-1.8.0
  • gtk+-2.6.1
  • wxWidgets from CVS (2005.02.02; post 2.5.3.1). This successfully builds a wxPython installation on our Linux development machine.
  • mozilla from CVS (2005.02.01). This successfully builds a Mozilla browser.
  • wxMozilla-0.5.3. This fails even in the simplest demo program. The problem may lie in how GTK 2 needs to be initialized.

Apple, may your camels each be infested with 10,000 fleas

David Janes 2004-12-03 21:08 UTC

Jäger's podcasting feature is messed up with iTunes 4.7. Why? Because Apple changed the COM interfaces from 1 indexing to 0 indexing.

"pytunes" now has a lot of code that looks like this:

def __iter_tracks(self):
 for current in xrange(0, len(self.com_playlist.Tracks) + 1):
  try:
   yield Track(self, self.com_playlist.Tracks[current])
  except IndexError:
   continue

There'll be a release tonight that fixes this.
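The tolerant loop above can be pulled out into a small helper. What follows is a minimal Python 3 sketch, not pytunes code: `iter_any_base` and `OneBased` are hypothetical names, and `OneBased` only stands in for a 1-indexed collection. It also assumes an out-of-range index raises `IndexError`, as the snippet above does.

```python
def iter_any_base(collection):
    """Yield every item of a collection that may be 0- or 1-indexed.

    Probes indexes 0..len(collection) inclusive and skips whichever end
    raises IndexError, so either indexing convention works.
    """
    for index in range(0, len(collection) + 1):
        try:
            yield collection[index]
        except IndexError:
            continue


class OneBased:
    """Hypothetical stand-in for a 1-indexed collection."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if not 1 <= index <= len(self._items):
            raise IndexError(index)
        return self._items[index - 1]


print(list(iter_any_base(OneBased(["a", "b", "c"]))))  # ['a', 'b', 'c']
print(list(iter_any_base(["a", "b", "c"])))            # ['a', 'b', 'c']
```

The cost is one wasted lookup per pass, which is cheap next to maintaining two code paths for two iTunes versions.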

GenericThread

David Janes 2004-10-14 14:27 UTC

GenericThread is a part of BlogMatrix Jäger's "generic" library, which provides generally useful functions. GenericThread is a wrapper around Python's "threading.Thread" that provides a few useful extras:

  • passing of arguments when starting
  • pythoncom initialization, for applications which use Win32 COM objects
  • methods which are called before and after the main body of the thread, for initialization and teardown
  • the ability to gracefully halt thread operations

You can download GenericThread here.

Here's an example of using GenericThread:

import time
import GenericThread

class MyThread(GenericThread.GenericThread):
 def __init__(self):
  # the named parameters are optional
  GenericThread.GenericThread.__init__(self, is_daemon = False, is_com = False)
  
 def CustomizeStart(self, a, b, c):
  # the arguments are arbitrarily defined -- you can have as many as you want
  print "MyThread.CustomizeStart: called -- we're in the thread, starting up"
  
 def CustomizeFinished(self, a, b, c):
  print "MyThread.CustomizeFinished: called -- we're in the thread, shutting down"
  
 def CustomizeRun(self, a, b, c):
  print "MyThread.CustomizeRun: called -- we're running: do your work here"
  
  for i in xrange(10):
   self.CheckHalt()
   time.sleep(.5)
   print i, a, b, c
  
thread = MyThread()
thread.Start("a", 4784, [ 1, 2, 3 ])
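For readers curious how such a wrapper might be put together, here is a minimal Python 3 sketch. It is not the GenericThread source: `SketchThread`, `HaltRequested` and `Halt` are hypothetical names chosen for illustration, the hook names simply mirror the example above, and the pythoncom initialization is left out.

```python
import threading
import time


class HaltRequested(Exception):
    """Raised inside the worker thread once a halt has been requested."""


class SketchThread(threading.Thread):
    """Hypothetical wrapper: start arguments, setup/teardown hooks, graceful halt."""

    def __init__(self, is_daemon=False):
        super().__init__()
        self.daemon = is_daemon
        self._halt_event = threading.Event()
        self._start_args = ()

    def Start(self, *args):
        # Remember the arguments so every hook receives them.
        self._start_args = args
        self.start()

    def Halt(self):
        # Ask the thread to stop; it notices at its next CheckHalt() call.
        self._halt_event.set()

    def CheckHalt(self):
        if self._halt_event.is_set():
            raise HaltRequested()

    def run(self):
        try:
            self.CustomizeStart(*self._start_args)
            try:
                self.CustomizeRun(*self._start_args)
            except HaltRequested:
                pass  # a requested halt is a normal way to finish
        finally:
            self.CustomizeFinished(*self._start_args)

    # Default hooks do nothing; subclasses override what they need.
    def CustomizeStart(self, *args):
        pass

    def CustomizeRun(self, *args):
        pass

    def CustomizeFinished(self, *args):
        pass


class Ticker(SketchThread):
    def __init__(self):
        super().__init__()
        self.events = []

    def CustomizeStart(self, label):
        self.events.append("start " + label)

    def CustomizeRun(self, label):
        while True:
            self.CheckHalt()
            time.sleep(0.01)

    def CustomizeFinished(self, label):
        self.events.append("finished " + label)


ticker = Ticker()
ticker.Start("demo")
time.sleep(0.05)
ticker.Halt()
ticker.join()
print(ticker.events)  # ['start demo', 'finished demo']
```

Halting by raising an exception from `CheckHalt()` keeps the work loop free of flag checks, and the `finally` clause means the teardown hook runs however the body ends.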

pytunes

David Janes 2004-10-11 16:12 UTC

'pytunes.py' provides a generalized interface for controlling media players, such as Apple's iTunes. This implementation only works with iTunes on Windows but we expect to have a Macintosh implementation later this week and hope to demonstrate a controller for Windows Media Player also (if anyone can help me with the COM interfaces here, I'd be very grateful). This version concentrates on manipulating the playlists, adding songs and so forth rather than playing music, controlling the volume, etc. as this is my immediate need for the Podcasting version of BlogMatrix Jäger.

import pprint
import pytunes
itunes = pytunes.iTunesWindows()

# list everything in the Playlist 'Recently Played'
playlist = itunes.GetPlaylistByName('Recently Played')
for track in playlist.IterAllTracks():
 pprint.pprint({
  "album" : track.GetAlbum(),
  "title" : track.GetTitle(),
  "composer" : track.GetComposer(),
  "artist" : track.GetArtist(),
 })
 
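# list songs with the word 'Love' in the Title
# (assumption: the main library playlist is named 'Library')
library_playlist = itunes.GetPlaylistByName('Library')
for track in library_playlist.SearchTitles("Love"):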
 pprint.pprint({
  "album" : track.GetAlbum(),
  "title" : track.GetTitle(),
  "composer" : track.GetComposer(),
  "artist" : track.GetArtist(),
 })

[Python / OPML] Jaeger's OPML Parser

David Janes 2004-10-07 19:58 UTC

David Janes has built some pretty cool tools in the process of making Jaeger, his Python/wxPython-based newsreader. One of these tools, an

Refactoring Synchronization

David Janes 2004-10-02 22:26 UTC

I've been doing a substantial amount of upgrading to the Synchronization code in Jäger over the last couple of days. The impetus for this is our upcoming Bloglines integration (coming in the next 3 days), but in fact the code was a squirrelly mess and needed some refactoring anyway. I've made the UI a little easier to understand also, I hope.

Synchronization is now down to a handful of modules in the source tree. A number of modules that were only partially used, or not used at all, have been removed. Here's what's left:

Modules that describe the settings, handle communications, and so forth:

  • BlogSync.py
  • BlogSyncBloglines.py
  • BlogSyncFTP.py

Modules that invoke synchronization:

  • BlogSyncDialog.py (File > Synchronization > Synchronize Now...)
  • BlogManagerMixinSync.py

Modules that let the user set up synchronization (File > Synchronization > Settings...):

  • BlogSyncPreferences.py
  • BlogSyncPreferencesBloglines.py
  • BlogSyncPreferencesFTP.py

I won't spend any time describing the code itself here. I've added a lot of documentation internally and the object-oriented nature of the code should make it pretty easy to follow.

Not until next week

David Janes 2004-09-24 16:38 UTC

I can't create an executable version of Jäger due to some mismatch between py2exe and wxPython 2.5.2.8 (it worked with the previous version of wxPython). If anyone has a clue, please let me know:

Traceback (most recent call last):
  File "BlogJaeger.py", line 61, in ?
  File "wxPython\__init__.pyc", line 10, in ?
  File "wxPython\_wx.pyc", line 3, in ?
  File "wxPython\_core.pyc", line 15, in ?
  File "wxPython\wx.pyc", line 2, in ?
  File "wxPython\wxc.pyc", line 9, in ?
  File "wxPython\wxc.pyc", line 7, in __load
ImportError: DLL load failed: The specified module could not be found.

If this can't be sorted out, I'm going to "downgrade" my version of wxPython and continue from there.

Universal Search Parser - 0.1.1

David Janes 2004-09-17 17:30 UTC

I've made some small updates to the USP:

  • The example no longer needs the 'text = ' parameter to search. It knows the right thing to do.
  • I've added an ESPN searcher: 'search:espn type:news buffalo bills flutie chargers pages:all'
  • I fixed a bug in the scraper that was doing entity decoding within attribute values.

Oops

David Janes 2004-09-17 15:04 UTC

There's a slight problem in my examples. You should use for now:

import pprint
import Search
for result in Search.search(text = 'search:ebay TRS-80 pages:all'):
 pprint.pprint(result)

The next version will handle the argument to search correctly (it's possible to pass in a highly structured object that exactly describes the search).

Update: fixed.

Jäger: the support source code

David Janes 2004-09-15 23:07 UTC

The support source code for Jäger is now available (licensed under the LGPL) at http://jaeger.blogmatrix.com/source/. This encompasses three libraries, each in its own directory:

  • generic: general helper classes and functions
  • druecken: the Drücken HTML downloading and scraping libraries
  • search: the Universal Search Parser

We will describe the files in "generic" and "druecken" at a later date. Some of this code is very useful, some less so.

The Universal Search Parser (USP) is an attempt to provide a consistent, extensible method to utilize online search resources within a Python program (and in the future, as RSS or RDF results). It's quite easy to use; for example, here's how you can get all the listings for TRS-80s available on eBay:

import pprint
import Search
for result in Search.search('search:ebay TRS-80 pages:all'):
 pprint.pprint(result)

Yielding:

{'Bidders': 0,
 'BuyItNow': u'$15.00',
 'Price': u'$10.00',
 '_link': 'http://cgi.ebay.com/ws/...1247&item=5122154379&rd=1',
 '_title': u'Sol 20, Exidy, TRS-80, NorthStar, BASIC, Assembler, etc'}
{'Bidders': 4,
 'Price': u'$15.50',
 '_link': 'http://cgi.ebay.com/ws/...=74947&item=5123221664&rd=1',
 '_title': u'RADIO SHACK TRS-80 POCKET COMPUTER MODEL PC-4 W/CASE'}
{'Bidders': 0,
 'Price': u'$0.99',
 '_link': 'http://cgi.ebay.com/ws/...=74947&item=5123225351&rd=1',
 '_title': u'Downland for TRS 80 Color Computer'}
...

What else can we search? Right now, we have the following modules implemented (there are example searches within each file):

  • SearchAmazon.py (requires pyamazon)
  • SearchBBC.py
  • SearchCBC.py
  • SearchCNN.py
  • SearchCanada411.py
  • SearchEbay.py
  • SearchGoogle.py
  • SearchIMDB.py
  • SearchSourceforge.py
  • SearchTechnorati.py (requires pytechnorati)
  • SearchWhitepages.py
  • SearchYahoo.py

To install the USP (and all the other libraries), do the following:

  • download the latest copy of "jaeger-support-*.tar.gz" file
  • unpack it
  • "cd search"
  • try "python Search.py 'some query'"

The main source code for Jäger will be along later this week or by Monday at the latest. The USP will be integrated into Jäger for both "immediate" and "persistent" searching in the very near future.

If you give this code a try and you like it (or hate it), please send along an e-mail (or even send a donation): I'd love to hear from you in any case. If there are any other Search modules you'd like implemented, send me a note and I'll see what I can do, or if you're so inclined, try it yourself; it's not too difficult.

The Universal Search Interface

David Janes 2004-09-11 05:00 UTC

About two months ago, I added the ability to search blog postings within Jäger. You first saw this in the 1.3 beta releases. As I thought about this more and more, I realized there was something much more general purpose here: why not search anything with Jäger? I quickly implemented an Amazon interface using the pyamazon module and was quite pleased with the results. Each Amazon category (books, dvd, video, and so forth) was treated as a blog and each entry was a particular search result. I decided not to put this in the official Jäger 1.4 release because I thought I had the beginning of something much more powerful here that needed to be done correctly.

Note that I'm talking about something orthogonal to what JWZ is mentioning here. I'm not talking about using RSS to return changing results from persistent search, though I think that's a great idea and can be easily implemented with the libraries I'm releasing to you. I'm talking about using RSS to return search results, i.e. something entirely ephemeral in nature; you look at the results, then discard them.

To do this, I created something I'm calling the Universal Search Interface. It's a Python library for searching ... anything. It's built on top of a very powerful and easy-to-use scraping library called "Drücken", which lets me scrape ... well, almost anything with regular output. It doesn't have to use scrapers: it uses the pyamazon and pytechnorati libraries for accessing Amazon and Technorati.

But enough babbling from me. Here's the basic code (from a user's perspective) to do a search (each result is a single search element):

import pprint
import Search
for result in Search.search(text = 'something'):
 pprint.pprint(result)

Here are a few example search strings:

  • search:Amazon buffy the vampire slayer type:dvd
  • search:Google type:images "dan rather"
  • J Janes search:Canada411 state:NL city:"St. John's" pages:2
  • Dan Smith search:Whitepages state:NY city:"New York" pages:all

The only constant element here is the 'search:'. Every element with a colon in it is a 'restriction'. The restrictions that the USI directly recognizes are 'search:', 'type:' and 'pages:'. 'search:' allows the USI to locate the searching class; 'type:' narrows the search to a particular sub-service of a search engine; and 'pages:' gives the maximum number of pages of search results that can be retrieved from the particular search service. The default is '1'; obviously the meaning of a page is highly dependent on the search engine being used. There may be other restrictions added called 'language:' and 'template:'.
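To make the restriction syntax concrete, here is a minimal Python 3 sketch of how such a query string could be split into restrictions and free text. This is not the USI's parser: `parse_query` and its regular expression are assumptions for illustration, and lower-casing the restriction names is my own choice.

```python
import re

# key:value or key:"quoted value" (hypothetical grammar, inferred from the examples)
_RESTRICTION = re.compile(r'(\w+):(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into (restrictions, free_text).

    'pages' defaults to '1', matching the default described above.
    """
    restrictions = {"pages": "1"}

    def grab(match):
        key = match.group(1).lower()
        quoted, bare = match.group(2), match.group(3)
        restrictions[key] = quoted if quoted is not None else bare
        return " "  # remove the restriction from the free text

    text = _RESTRICTION.sub(grab, query)
    return restrictions, " ".join(text.split())


restrictions, text = parse_query(
    'J Janes search:Canada411 state:NL city:"St. John\'s" pages:2')
print(restrictions)
print(text)  # J Janes
```

Restrictions the parser does not itself understand (here 'state:' and 'city:') are simply carried along, which fits the idea that each search class interprets its own extras.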

Note also that the search interface is implemented as an iterator (using generators, actually). Thus search results must be retrieved starting at the very first result! Also note that searches like 'search:Google dog' may potentially retrieve hundreds of thousands of results which is very nasty. However, results are returned as soon as they're available, which is not only handy, but essential.
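The generator behaviour described above can be sketched as follows. This is a Python 3 illustration under stated assumptions, not the USI implementation: `search_pages` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever downloads and scrapes one page of results.

```python
import itertools


def search_pages(fetch_page, max_pages=1):
    """Yield results page by page, fetching each page only when needed.

    max_pages=None plays the role of 'pages:all'.
    """
    page = 1
    while max_pages is None or page <= max_pages:
        results = fetch_page(page)
        if not results:
            return  # the service ran out of results
        for result in results:
            yield result
        page += 1


# Fake three-page service that records which pages were requested.
PAGES = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
calls = []


def fake_fetch(page):
    calls.append(page)
    return PAGES.get(page, [])


first_three = list(itertools.islice(search_pages(fake_fetch, None), 3))
print(first_three)  # ['a', 'b', 'c']
print(calls)        # [1, 2]
```

Because the consumer stopped after three results, the third page was never requested. That laziness is what keeps a query like 'search:Google dog' from fetching everything up front.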

Here's some example output from the USI (for the Canada 411 search):

{'Address': u'39 Goldeneye Pl',
 'City': u"St John's",
 'Country': 'CA',
 'FirstName': u'J',
 'LastName': u'Janes',
 'Name': u'J Janes',
 'Phone': u'(709) 747-0979',
 'PhoneURI': u'tel:+1-709-747-0979',
 'State': u'NL',
 '_link': u'http://findaperson.canada411.ca/more_info/...',
 '_title': u'Janes, J'}
{'Address': u'8 Lynch Pl',
 'City': u"St. John's",
 'Country': 'CA',
 'FirstName': u'J',
 'LastName': u'Janes',
 'Name': u'J Janes',
 'Phone': u'(709) 722-8327',
 'PhoneURI': u'tel:+1-709-722-8327',
 'PostalCode': u'A1B 4L8',
 'State': u'NL',
 '_link': u'http://findaperson.canada411.ca/more_info/...',
 '_title': u'Janes, J'}
...

The rules for the output format are quite simple:

  • the only valid values in a result are Unicode strings, integers, floats, lists and dictionaries, with the latter two being discouraged but not prohibited. Non-Unicode strings and classes or "bags" are not allowed
  • names starting with an underscore are reserved. The reserved names currently in use are '_title', '_link', '_html' and '_text'
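The two rules above are easy to check mechanically. Here is a minimal Python 3 sketch; `check_result` is a hypothetical helper, not part of the USI, and since Python 3's `str` is always Unicode, `bytes` plays the part of the disallowed non-Unicode string. Nested lists and dictionaries are not inspected.

```python
RESERVED_NAMES = {"_title", "_link", "_html", "_text"}
ALLOWED_TYPES = (str, int, float, list, dict)


def check_result(result):
    """Return a list of rule violations for one search result (empty if fine)."""
    problems = []
    for name, value in result.items():
        # Rule 2: underscore names are reserved; only the known ones may appear.
        if name.startswith("_") and name not in RESERVED_NAMES:
            problems.append("unknown reserved name: %s" % name)
        # Rule 1: only the listed value types are valid.
        if not isinstance(value, ALLOWED_TYPES):
            problems.append("bad value type for %s: %s"
                            % (name, type(value).__name__))
    return problems


print(check_result({"_title": "Janes, J", "Phone": "(709) 747-0979"}))  # []
print(check_result({"_score": 3, "Name": b"J Janes"}))
```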

The source code for this (and Jäger's "generic" library, which this depends on) will be released next week under the standard Python source code license. The rest of Jäger will be released the week after under a different license, the details of which I'm still working on.

So, what does this have to do with RSS search results?

Well, there's another layer coming called the "Pylot interface". Pylots are little Python webservices that you can plug into a Pylot Engine. Jäger will be one of these, though there's no reason these can't be a different freestanding application. The idea I have is that there'll always be a Python environment running on your desktop (which is what Jäger is) that you can access as a local webserver (maybe using twisted), a database such as MySQL if it's available, full access to the wxPython library and so forth. You want to do something? Just drop a piece of Pylot code in the correct directory and it's executing like a Windows application!

One possible idea for a Pylot is a front end to the Universal Search Engine that can return HTML, RSS 2 or even RDF results. Because it's on your own desktop and serving only 127.0.0.1, there's no worries about various terms of use that a public webserver would have. If the Pylot environment has MySQL, it's easy to implement JWZ's search engine result interface.

On the subject of RDF, perhaps one of you folks has some suggestions about how I could best return RDF results? It seems to me that this would be great for you semantic web types and could bootstrap your projects quite a bit. Does each USI class need to return a dictionary of what the terms mean, or can I just make up vocabularies ad hoc?

Anyway, I'm way ahead of myself now. You'll see the USI on Wednesday.

Patching wxMutexInternal

David Janes 2004-06-28 23:29 UTC

Last week I was having a pretty serious problem with wxPython on the Macintosh. The following block of code in wxPython is just plain wrong, and after a few hours of operation the m_waiters array would be corrupted because of multiple threads contending for the memory.

If you're running wxPython 2.5.1.5, seriously consider patching your library as per these instructions.

while ( m_owner != kNoThreadID && m_owner != current)
{
 m_waiters.Add(current);
 ::SetThreadStateEndCritical(kCurrentThreadID, kStoppedThreadState, m_owner);
 ::ThreadBeginCritical();
}

It should read (as far as I can tell)

while ( m_owner != kNoThreadID && m_owner != current)
{
 ::SetThreadStateEndCritical(kCurrentThreadID, kStoppedThreadState, m_owner);
 m_waiters.Add(current);
 ::ThreadBeginCritical();
}

I knew that rebuilding the wxPython libraries from scratch could end up being more trouble than it's worth, so I decided to take a different approach. Last Friday night I did an extended hacking session on wxPython's libraries. I downloaded Apple's Xcode developer tools and disassembled (using objdump) the broken module. Looking at the PowerPC opcodes, I figured one little tweak would do the trick. I needed to change this:

675f8:   38 a0 00 01     li      r5,1
675fc:   4b fb 17 39     bl      [wxBaseArrayLong::Add]

67600:   38 60 00 01     li      r3,1
67604:   38 80 00 01     li      r4,1
67608:   80 be 00 00     lwz     r5,0(r30)
6760c:   48 01 30 69     bl      [::SetThreadStateEndCritical]

67610:   48 01 30 a5     bl      [::ThreadBeginCritical]

to this:

67610:   48 01 30 a5     bl      [::ThreadBeginCritical]

675f8:   38 a0 00 01     li      r5,1
675fc:   4b fb 17 39     bl      [wxBaseArrayLong::Add]

67600:   38 60 00 01     li      r3,1
67604:   38 80 00 01     li      r4,1
67608:   80 be 00 00     lwz     r5,0(r30)
6760c:   48 01 30 69     bl      [::SetThreadStateEndCritical]

Note that I had to figure out those symbolic names — the objdump program didn't do that for me. Here's a Python program to make the fix:

import sys

old_data = [
 0x38, 0xa0, 0x00, 0x01, 
 0x4b, 0xfb, 0x17, 0x39,  # wxArrayLong::Add
 0x38, 0x60, 0x00, 0x01,
 0x38, 0x80, 0x00, 0x01,
 0x80, 0xbe, 0x00, 0x00,
 0x48, 0x01, 0x30, 0x69,  # ::SetThreadStateEndCritical
 0x48, 0x01, 0x30, 0xa5,  # ::ThreadBeginCritical
]

new_data = [
 0x48, 0x01, 0x30, 0xa5,  # ::ThreadBeginCritical
 0x38, 0xa0, 0x00, 0x01, 
 0x4b, 0xfb, 0x17, 0x39,  # wxArrayLong::Add
 0x38, 0x60, 0x00, 0x01,
 0x38, 0x80, 0x00, 0x01,
 0x80, 0xbe, 0x00, 0x00,
 0x48, 0x01, 0x30, 0x69,  # ::SetThreadStateEndCritical
]

fio = open('build/Jaeger.app/Contents/Frameworks/libwx_base_carbond-2.5.1.0.0.dylib', 'rb+')
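# 'start' must hold the file offset of the first byte to patch;
# its value is not given in this post
fio.seek(start)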

f_in = fio.read(len(old_data))

for i in xrange(len(old_data)):
 if old_data[i] != ord(f_in[i]):
  print >> sys.stderr, "data did not match"
  print >> sys.stderr, "%d: %02x %02x" % ( i, old_data[i], ord(f_in[i]) )
  sys.exit(1)

fio.seek(start)

for i in xrange(len(new_data)):
 fio.write(chr(new_data[i]))
 
fio.close()
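The same verify-then-write idea can be wrapped in a reusable function. This is a Python 3 sketch under my own assumptions, not the script above: `patch_bytes` is a hypothetical name, and the demo patches a throwaway temporary file rather than a real library.

```python
import os
import tempfile


def patch_bytes(path, offset, old, new):
    """Overwrite `old` with `new` at `offset`, but only if `old` is really there."""
    if len(old) != len(new):
        raise ValueError("old and new must be the same length")
    with open(path, "r+b") as fio:
        fio.seek(offset)
        found = fio.read(len(old))
        if found != old:
            # Refuse to touch a file that does not look as expected.
            raise ValueError("data did not match at offset %d" % offset)
        fio.seek(offset)
        fio.write(new)


# Demo on a temporary file: swap two 4-byte groups in place.
handle, demo_path = tempfile.mkstemp()
os.write(handle, b"....AAAABBBB....")
os.close(handle)

patch_bytes(demo_path, 4, b"AAAABBBB", b"BBBBAAAA")
with open(demo_path, "rb") as fio:
    print(fio.read())  # b'....BBBBAAAA....'
os.remove(demo_path)
```

Checking the old bytes before writing is what makes a patch like this tolerable: run against the wrong build of the library, it fails loudly instead of corrupting it.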

There was one other piece of code causing a crash in Jäger. My advice — never never ever try to modify wxPython objects from outside their thread: it ain't ever going to work.

wxMutexInternal bug?

David Janes 2004-06-18 15:29 UTC

In the Mac version of Jäger, I'm randomly getting crashes from deep within the wxPython code. The application can run for several days without a crash, but crash it will eventually.

This is what I think the problem is but I'd like your feedback:

src/mac/thread.cpp

class wxMutexInternal
{
public:
    ...
    wxArrayLong m_waiters ;
    ...
};

wxMutexError wxMutexInternal::Lock()
{
    wxMacStCritical critical ;
    if ( UMASystemIsInitialized() )
    {
        OSErr err ;
        ThreadID current = kNoThreadID;
        err = ::MacGetCurrentThread(&current);
        // if we are not the owner, add this thread to the list of waiting threads, stop this thread
        // and invoke the scheduler to continue executing the owner's thread
        while ( m_owner != kNoThreadID && m_owner != current)
        {
            m_waiters.Add(current);
            err = ::SetThreadStateEndCritical(kCurrentThreadID, kStoppedThreadState, m_owner);
            err = ::ThreadBeginCritical();
        }
        m_owner = current;
    }
    m_locked++;

    return wxMUTEX_NO_ERROR;
}

You'll notice that the first time through, m_waiters is modified outside the critical section. This same variable is modified in the Unlock code within a critical section, indicating to me that this is the problem. Thoughts?

This is really a dealbreaker for shipping a super reliable (i.e. non-Beta) version of Jäger. I had another idea of restarting Jäger whenever this happens, but there seems to be no way of trapping the EXC_BAD_ACCESS thrown by the Mach kernel like a normal UNIX bus error. Or is there?

David Janes' Python Adventures

David Janes 2004-06-10 20:01 UTC

David Janes is driving himself crazy so you don't have to: in the process of building a Mac version of his Python/wxPython-based news aggregator, Jäger, he's managed to bump into a number of annoyan...

How to do it

David Janes 2004-06-09 23:36 UTC

Here's some code from a Python module I have called "GenericPlatform.py". I'm not proud of this code – it's something of a hack – but it made porting Jäger to the Mac very easy for me without a lot of rewriting.

wxPython expects strings to be in the "native" 8 bit character set. Since I didn't want to make checks for what this character set is all over my code, I just defined my own character set called "native_8":

called = False
if not called:
 called = True

 import encodings.aliases
 from wxPython.wx import wxPlatform
 if wxPlatform == '__WXMAC__' :
  encodings.aliases.aliases['native_8'] = 'mac_roman'
 else:
  encodings.aliases.aliases['native_8'] = 'latin_1'

Encoding characters is slightly ugly so I defined a number of helper functions to do this for me:

from GenericPlatform import _N, _U, _L

When I need a string in the native format:

 button = wxButton(self, -1, _N(u'J\xE4ger'))

I have a large amount of my data in my database that's in Latin-1 format. If you're storing 8-bit data, try to stick to one character set; if you need more than one, use Unicode. Hell, if I was to do it again I'd just use Unicode strings, so keep that in mind if you're starting a new project.

Anyhoo, here's the rest of my helper code from GenericPlatform. This is great for Windows code going to the Mac. If you're going from Mac to Windows, you'll probably want to tweak what you consider the default string character encoding to be:

import types

#
# Turn into the native representation
# - this assumes strings are in Latin-1 and is really for code
#   level compatibility
#
def _N(s):
 if not s:
  return ""
 elif type(s) == types.UnicodeType:
  return s.encode('native_8', 'replace')
 else:
  return s.decode('iso-8859-1').encode('native_8', 'replace')

#
# Turn into the unicode representation
# - this assumes strings are in Native-8
#
def _U(u):
 if not u:
  return u""
 elif type(u) == types.UnicodeType:
  return u
 else:
  return u.decode('native_8')

#
# Turn into the latin-1 representation
# - this assumes strings are in Native-8
#
def _L(s):
 if not s:
  return ""
 elif type(s) == types.UnicodeType:
  return s.encode('latin-1', 'replace')
 else:
  return s.decode('native-8').encode('latin-1', 'replace')
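For anyone starting fresh on Python 3, where every `str` is already Unicode, the same "native_8" trick can be sketched with a codec search function instead of editing the alias table. This is an assumption-laden illustration, not GenericPlatform: `register_native_8` and `to_native` are hypothetical names, and the target codec is passed in explicitly here where a real program would choose it per platform.

```python
import codecs


def register_native_8(target):
    """Make the name 'native_8' resolve to an existing codec such as 'mac_roman'."""
    info = codecs.lookup(target)

    def search(name):
        # Search functions receive a normalized name; accept either spelling.
        if name.replace("-", "_") == "native_8":
            return info
        return None

    codecs.register(search)


def to_native(text):
    """Encode text for the toolkit, replacing characters it cannot represent."""
    return (text or "").encode("native_8", "replace")


register_native_8("mac_roman")  # a real program would pick this per platform
print(to_native("J\u00e4ger"))  # b'J\x8ager'
```

With only Unicode strings in play, the separate Latin-1 round trip that `_N` and `_L` perform is no longer needed.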