Screenshot a URL with Python and Qt and WebKit

I’ve spent the last 4 days working on the problem of how to take a screenshot of a URL on a Linux server with no display. I also needed it to be programmatic, so that I could call it from a web app when needed. Finally, I wanted an architecture that could conceivably lend itself to threading in the future, so that we could scale the thing.

Essentially, all the solutions I found did this:

  1. Create a headless X server with something like Xvfb.
  2. Open some sort of browser on that in-memory X window.
  3. Screenshot the contents of that browser.
  4. Render that screenshot content out to some image format.
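The four steps above can be sketched roughly as follows. This is my own minimal illustration, not Roland's script: the names `xvfb_command` and `render_url`, the default sizes, and the `xvfb-run` options used are assumptions, and `xvfb-run` syntax may differ between distributions.

```python
import sys


def xvfb_command(argv, width=1024, height=768, depth=24):
    """Step 1: wrap a command so it runs inside a throwaway headless X server."""
    screen = "-screen 0 %dx%dx%d" % (width, height, depth)
    return ["xvfb-run", "--auto-servernum", "--server-args=" + screen] + list(argv)


def render_url(url, output_path, width=1024, height=768):
    """Steps 2-4: load a URL in QtWebKit, paint the page, save it as an image.

    Assumes an X display is already available, e.g. one started by Xvfb.
    """
    # Imported lazily so this module still loads where PyQt4 is not installed.
    from PyQt4.QtCore import QSize, QUrl
    from PyQt4.QtGui import QApplication, QImage, QPainter
    from PyQt4.QtWebKit import QWebPage

    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebPage()
    page.setViewportSize(QSize(width, height))

    result = {}

    def on_load_finished(ok):
        # Remember whether the load succeeded, then leave the event loop.
        result["ok"] = ok
        app.quit()

    page.loadFinished.connect(on_load_finished)
    page.mainFrame().load(QUrl(url))
    app.exec_()  # blocks until on_load_finished calls quit()
    if not result.get("ok"):
        raise RuntimeError("Failed to load %s" % url)

    # Grow the viewport to the full page so nothing is cut off.
    page.setViewportSize(page.mainFrame().contentsSize())
    image = QImage(page.viewportSize(), QImage.Format_ARGB32)
    painter = QPainter(image)
    page.mainFrame().render(painter)
    painter.end()
    if not image.save(output_path):
        raise RuntimeError("Could not write %s" % output_path)
```

A script that calls `render_url` could then be launched with the list returned by `xvfb_command(["python", "myscript.py"])`, where `myscript.py` is a placeholder name.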

There are several tools that do this, with names like html2image, khtml2png, and webkit2png, but they all had issues with either my platform or my need to not pay for them :)

That’s when I found the work of Roland Tapken. His script and explanation were the solution I needed. It made nice screenshots and had the configuration options I needed (screen size, image scaling, destruction of the X server, etc.). I thank him deeply for his work and for sharing it with the world.

The only problem off the bat was that it requires at least 2 packages that were not installed for me on Ubuntu 7.10 (Gutsy). In fact, they aren’t available in the versions I needed until the most recent Ubuntu 8.10 (Intrepid). Those packages are libqt4-webkit and python-qt4. Even on Intrepid (at least on my slicehost slice), I needed to install these and their dependencies. You’ll need to make sure to get libqt4-core as well, since aptitude didn’t seem to list it as a requirement of libqt4-webkit.
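A quick way to see whether the pieces are in place is a small check like the one below. It is only a sketch under my own assumptions: the function name `missing_prerequisites` is made up, it needs Python 3.3 or later for `shutil.which`, and it should be run with the same interpreter that will run the screenshot script.

```python
import importlib
import shutil


def missing_prerequisites():
    """Return a list describing which prerequisites could not be found."""
    missing = []
    try:
        # Provided by python-qt4, backed by libqt4-webkit and libqt4-core.
        importlib.import_module("PyQt4.QtWebKit")
    except ImportError:
        missing.append("PyQt4.QtWebKit (python-qt4, libqt4-webkit)")
    # The Xvfb binary comes from the xvfb package.
    if shutil.which("Xvfb") is None:
        missing.append("Xvfb (xvfb)")
    return missing


for item in missing_prerequisites():
    print("Missing:", item)
```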

I had a small problem with the script as far as passing arguments into the Xvfb server. For that reason, I’m sharing my version of the code. However, all kudos go to Roland. His blog post on this subject provides a nice overview of the thought process and rationale behind this code. If you’d like a little more on the how and why, check it out. He also provides a simpler version if you’d just like to test out the Qt WebKit stuff and see how it works.

In summary, to use this code:

  1. Read Roland’s blog post.
  2. Install the necessary libraries: libqt4-core, libqt4-webkit, python-qt4, xvfb.
  3. Grab my code here. Change the extension to py.
  4. To run, use a command like this: python webkit2png.py -x -o cnn.png --debug http://www.cnn.com
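To call the script from a web app, one option is to run that same command as a subprocess. The sketch below assumes the script is saved as `webkit2png.py` in the working directory; the function names, the `python` and `timeout` parameters, and the URL check are my own additions, and the flags are copied from the command above.

```python
import subprocess
from urllib.parse import urlparse


def build_screenshot_command(url, output_path, script="webkit2png.py",
                             python="python", debug=False):
    """Build the argument list for the command shown in step 4."""
    # Reject anything that is not a plain http(s) URL, so user input
    # cannot be mistaken for a command-line option.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("Expected an http(s) URL, got %r" % url)
    command = [python, script, "-x", "-o", output_path]
    if debug:
        command.append("--debug")
    command.append(url)
    return command


def take_screenshot(url, output_path, timeout=60):
    """Run the script and raise if it fails or takes too long."""
    command = build_screenshot_command(url, output_path)
    # No shell is involved: the arguments are passed as a list.
    subprocess.run(command, check=True, timeout=timeout)
    return output_path
```

A request handler could then call `take_screenshot("http://www.cnn.com", "cnn.png")` and serve the resulting file. Because each call runs in its own process, several could later be dispatched from a thread pool.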

If you have questions, comments, or code to add, please post in the comments or pingback this post. I will try to update once I have a good threading model built out around this functionality.


  1. Thank you first of all for putting this up. This made it a bit easier on me than the Roland Tapken page.

    However it seems to be really a version thing I guess.

    On Debian Etch, forget it, libqt4-webkit is missing.

    On Debian Lenny everything seems to install, however xvfb-run seems to want a different syntax than what Roland has in his script.

    Your script runs past that, however no cigar either.

    “ERROR:root:Failed to load http://www.google.de

    So I am no god in Python, but I will start reading through your script and try to find where I am getting hung up and why. Maybe you can give me a few hints.

  2. NYinker

    Howdy,

    Don’t laugh at me but... how do you call it from a web app?

  3. Thank you for this script, very useful!

  4. Thank you very much, I can’t believe it was so easy. I installed it on Debian Squeeze in about 30 seconds.

    Regards

  1. Reading ajax content programmatically » a Display of Patience

    [...] http://aezell.wordpress.com/2009/02/13/screenshot-a-url-with-python-and-qt-and-webkit/ Share: [...]



