Topic: The computer thread, Vile silicon machinery
Nerull



Posts: 317
Joined: June 2007

Posted: Dec. 04 2008,14:02

What I would do is:

1. Download only the HTML - images and such aren't required.

2. Track only the 'n' most recent threads, to save bandwidth and time.

3. Feed the HTML for each post and comment thread into a Subversion repository, which will track changes. You could first run it through a script that strips out everything except the post and comments (ads, sidebar stuff, etc.) and detects 404s.
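
A rough sketch of the fetch-and-strip script from step 3, in Python with only the standard library. The class names "post" and "comment" are placeholders I made up; the real blog's markup would need to be inspected first. fetch_html and strip_page are likewise just illustrative names, not part of any existing tool.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collect text found inside elements whose class is in `keep`."""

    def __init__(self, keep=("post", "comment")):  # assumed class names
        super().__init__()
        self.keep = set(keep)
        self.depth = 0    # > 0 while we are inside a kept element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.keep.intersection(classes):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def strip_page(html):
    """Return only the post and comment text, one chunk per line."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks) + "\n"


def fetch_html(url):
    """Download the HTML only; return None if the page is a 404."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # thread has gone missing
        raise
```

The output of strip_page would be written to one file per thread inside a Subversion working copy and committed on each run, so the repository history shows what changed. Stripping ads and sidebars first keeps that history free of noise.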

What you could also do is store posts and comments in a database keyed by comment ID, so that when a comment goes missing it can be pulled from the database with a simple query. This wouldn't track changes without some more work, but it would save deleted comments and posts.
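
A minimal sketch of the database idea, using Python's built-in sqlite3 module. The table layout and the function names (open_archive, save_comments, missing_comments) are my own assumptions, and it assumes comment IDs are integers. Each comment is stored the first time it is seen, so later edits are ignored, which matches the "no change tracking" caveat.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the archive; the schema here is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        " comment_id INTEGER PRIMARY KEY,"
        " thread TEXT NOT NULL,"
        " author TEXT,"
        " body TEXT NOT NULL)"
    )
    return db


def save_comments(db, thread, comments):
    """Store (comment_id, author, body) tuples, keeping the first-seen copy."""
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR IGNORE INTO comments (comment_id, thread, author, body)"
            " VALUES (?, ?, ?, ?)",
            [(cid, thread, author, body) for cid, author, body in comments],
        )


def missing_comments(db, thread, live_ids):
    """Return archived comments that no longer appear on the live page."""
    rows = db.execute(
        "SELECT comment_id, author, body FROM comments"
        " WHERE thread = ? ORDER BY comment_id",
        (thread,),
    ).fetchall()
    live = set(live_ids)
    return [row for row in rows if row[0] not in live]
```

On each run you would call save_comments with whatever is currently on the page, then missing_comments with the same IDs to list anything that has been deleted since it was first archived.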

--------------
To rebut creationism you pretty much have to be a biologist, chemist, geologist, philosopher, lawyer and historian all rolled into one. While to advocate creationism, you just have to be an idiot. -- tommorris

   
  45 replies since Nov. 28 2008,05:16
