What I would do is:
1. Download only the HTML - images and such aren't required.
2. Track only the 'n' most recent threads, to save bandwidth and time.
3. Feed the HTML for each post and comment thread into a Subversion repository, which will track changes. Optionally, run it through a script first to strip out everything except the post and comments (ads, sidebar content, etc.) and to detect 404s.
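A rough sketch of the fetch-and-strip script from step 3, in Python with only the standard library. The element ids `post` and `comments` are my assumption about the page markup, not something I've checked against the real site, and the names `fetch_html` and `extract_content` are made up for illustration. The depth counting also assumes reasonably well-formed HTML (every non-void tag is closed).

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def fetch_html(url):
    """Download only the HTML of a page; return None if the server answers 404."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # thread or post has gone missing
        raise


class ContentExtractor(HTMLParser):
    """Collect the text found inside elements whose id is in keep_ids."""

    def __init__(self, keep_ids):
        super().__init__()
        self.keep_ids = set(keep_ids)
        self.depth = 0  # greater than 0 while inside a kept element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") in self.keep_ids:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_content(html, keep_ids=("post", "comments")):
    """Return one text chunk per line, so line-based diffs stay readable."""
    parser = ContentExtractor(keep_ids)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks) + "\n"
```

Write the result of `extract_content` to one file per thread inside a Subversion working copy, then `svn add` any new files and run `svn commit`. Because ads and sidebar content are stripped first, the diffs Subversion records should show only changes to the post and comments.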
What you could also do is store posts and comments in a database, keyed by ID, so that when a comment goes missing it can be pulled from the database with a simple query. This wouldn't track changes without some more work, but it would save deleted comments and posts.
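A minimal sketch of the database idea, using Python's built-in `sqlite3` module. The table layout and the function names (`open_archive`, `save_comment`, `find_missing`) are hypothetical, and it assumes each comment carries some stable ID that can be scraped from the page.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the archive database; the default is in-memory."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        " comment_id TEXT PRIMARY KEY,"
        " thread_id TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def save_comment(conn, comment_id, thread_id, body):
    """Store a comment the first time it is seen.

    INSERT OR IGNORE keeps the first copy, so later edits are not recorded.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO comments VALUES (?, ?, ?)",
            (comment_id, thread_id, body),
        )


def find_missing(conn, thread_id, live_ids):
    """Return (comment_id, body) for archived comments no longer on the page."""
    rows = conn.execute(
        "SELECT comment_id, body FROM comments"
        " WHERE thread_id = ? ORDER BY comment_id",
        (thread_id,),
    ).fetchall()
    live = set(live_ids)
    return [(cid, body) for cid, body in rows if cid not in live]
```

On each pass, call `save_comment` for every comment found on the page, then `find_missing` with the IDs currently visible to list whatever has been deleted since it was archived.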
To rebut creationism you pretty much have to be a biologist, chemist, geologist, philosopher, lawyer and historian all rolled into one. While to advocate creationism, you just have to be an idiot. -- tommorris