Importing a large blog to WordPress.com: WXR splitting tools

I am about to import a very large WordPress blog (not this one) to WordPress.com.

There are two issues:

1. The WXR (WordPress eXtended RSS) export from the site is 105MB uncompressed and 22MB compressed (with gzip -9). This is too large to upload to WordPress.com, which only accepts uploads of 15MB at most.

2. This site has 4000 media file uploads (and 6000 posts). The original host is going away: those 4000 media files (mostly images) must also be imported into WordPress.com.

The obvious solution is to split the export into multiple files. However, I have just tested this on WordPress.com: for the importer to rewrite post contents so that they refer to the imported copy of each media file, rather than the original externally hosted copy which is about to go away, the media file and the post must be uploaded in the same XML file. The scripts I’ve found that split WXR files into multiple XML files (e.g. mainSplit.py) do not attempt to keep media files and the posts that refer to them together; they just split the contents of the export file in the order they appear.
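For what it's worth, here is a rough Python sketch of the grouping step such a splitter would need. It is untested against a real export and rests on some assumptions: that each attachment is an item whose wp:post_type is "attachment" and whose wp:post_parent holds the wp:post_id of the post it was uploaded to, and that matching on parent ID is enough. Media that is referenced from a post's content but attached to nothing (parent 0) would not be caught. The function names (group_items, pack_groups) are made up for illustration, and the size estimate is only approximate.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def child_text(item, name):
    """Return the text of the first direct child with this local name, or ''."""
    for child in item:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return ""

def group_items(items):
    """Group each attachment with its parent post (assumption: wp:post_parent
    links them); every other item is keyed on its own wp:post_id."""
    groups = {}
    for item in items:
        post_id = child_text(item, "post_id")
        parent = child_text(item, "post_parent")
        is_attachment = child_text(item, "post_type") == "attachment"
        key = parent if is_attachment and parent not in ("", "0") else post_id
        groups.setdefault(key, []).append(item)
    return list(groups.values())

def pack_groups(groups, max_bytes):
    """Greedily fill chunks, never splitting a group across two chunks.
    A single group larger than max_bytes still gets a chunk of its own."""
    chunks, current, size = [], [], 0
    for group in groups:
        # Serialized size is only an estimate of the final file size.
        group_size = sum(len(ET.tostring(i)) for i in group)
        if current and size + group_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.extend(group)
        size += group_size
    if current:
        chunks.append(current)
    return chunks
```

To use it, parse the export with ET.parse(), pass root.findall("./channel/item") to group_items(), and then call pack_groups() with a limit somewhat under 15MB. Each resulting chunk would still have to be written out wrapped in a copy of the original channel header (authors, categories, tags), which this sketch does not do.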

Anyone got leads on this one?