Topic: The Blog Mirror Project, Adding memory for memory holes
Wesley R. Elsberry

Posts: 4987
Joined: May 2002

(Permalink) Posted: May 11 2006,17:26   

Given the propensity for re-writing the record on various antievolution weblogs, I'm looking at means of archiving the moment-to-moment versions of posts and comments on weblogs.

One intriguing approach is the FeedWordPress plugin for WordPress. This plugin lets you point it at an RSS feed, and it will add a post to a WordPress weblog for each item in the feed. I've already set up a WordPress blog and the plugin to try this out. It isn't a perfect solution for the particular application yet, though, as I commented on that site:


I have a slightly different interest in the project. What I want to accomplish is to completely mirror current posts and commentary at a particular weblog, without updating past items if they are modified. Having new entries made would be OK when changes occur. So far, FeedWordPress is about as close a solution as I have seen, but the big things standing between what it is and what I want are: 1) incomplete posts/comments (it appears to store what comes across in the RSS feed and doesn’t get the more extensive original text) and 2) updating of items when changes are found in the feed (I want a log of what each version was).

If I assume that I get to make modifications to get what I want, then there are some things to do. Turning off updates will be specific to the FeedWordPress plugin. Beyond that, there is getting the full page rather than just the RSS description.
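To make the "log of what each version was" idea concrete, here is a minimal sketch in Python (not FeedWordPress code, and not PHP). It assumes each feed item has a stable identifier such as its permalink; the class name `VersionLog` and the method `record` are made up for illustration. A changed item is appended as a new version instead of overwriting the old one.

```python
import hashlib


class VersionLog:
    """Append-only store: a changed item adds a new version, never overwrites."""

    def __init__(self):
        # Maps an item's identifier (hypothetically its permalink) to a
        # list of (digest, content) tuples, oldest first.
        self.versions = {}

    def record(self, guid, content):
        """Store content if it differs from the latest stored version.

        Returns True when a new version was appended.
        """
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(guid, [])
        if history and history[-1][0] == digest:
            return False  # unchanged since the last fetch
        history.append((digest, content))
        return True


log = VersionLog()
log.record("post-1", "original text")
log.record("post-1", "original text")   # same content: ignored
log.record("post-1", "rewritten text")  # changed: kept as a second version
print(len(log.versions["post-1"]))
```

Because only the latest version is compared, a post that is edited and later reverted would show up as three entries, which seems right for an audit trail.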

I'm thinking that perhaps the way to handle this is to produce a mirror of each version of a post. This may require a fair amount of programming, and possibly stepping outside of PHP. Breaking this down, there are several tasks: reading and parsing the original page, downloading each element of the page, relativizing links to elements in the page, creating an MHT archive of the collected elements of the page, and linking to the locally-stored copy of the MHT archive.
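The parsing and link-relativizing steps can be sketched with the Python standard library alone. This is only an illustration under simplifying assumptions: it looks at `img`, `script`, and `link` tags, rewrites links by plain text substitution of double-quoted attribute values, and leaves the actual downloading to the caller. The names `AssetCollector`, `relativize`, and the `assetN` file naming are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath


class AssetCollector(HTMLParser):
    """Collect the URLs of page elements (images, scripts, stylesheets)."""

    def __init__(self):
        super().__init__()
        self.assets = []  # raw attribute values, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.assets.append(attrs["href"])


def relativize(html, page_url):
    """Return (rewritten_html, {absolute_url: local_file_name})."""
    collector = AssetCollector()
    collector.feed(html)
    downloads = {}
    # dict.fromkeys drops duplicate references while keeping their order.
    for n, raw in enumerate(dict.fromkeys(collector.assets)):
        absolute = urljoin(page_url, raw)
        ext = posixpath.splitext(urlparse(absolute).path)[1]
        local = f"asset{n}{ext}"
        downloads[absolute] = local
        # Naive textual rewrite; assumes the value is double-quoted in the markup.
        html = html.replace(f'"{raw}"', f'"{local}"')
    return html, downloads


page = ('<html><head><link rel="stylesheet" href="/style.css"></head>'
        '<body><img src="pics/a.png"></body></html>')
new_html, downloads = relativize(page, "http://example.org/blog/post.html")
print(downloads)
print(new_html)
```

The returned mapping tells a downloader which absolute URL to fetch and what to call the file locally, so that every element ends up in one flat directory.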

There is a PHP solution to making an MHT archive once the pieces are collected: MHT FileMaker.

A big step of the remaining task would be to match up a mirroring tool with this code. That would imply a tool that would do the retrieval and URL fixup, while putting all the elements into one directory. The PHP MHTFileMaker class could then be called to generate the MHT file from the files in the directory. It would be best if the main html file could be renamed as "index.html" by the mirroring software, so that the first file does not have to be tracked on a case by case basis.
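For anyone wanting to experiment outside PHP, the directory-to-MHT step can be approximated in Python. An MHT file is a MIME multipart/related message in which each part carries a Content-Location header. The sketch below is a stand-in for the PHP class, not its API; `build_mht` and the `base_url` parameter are hypothetical, and it relies on the convention suggested above that the main page is named "index.html" so it can be placed first.

```python
import email
import mimetypes
import os
import tempfile
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart


def build_mht(directory, base_url):
    """Pack every file in directory into one multipart/related archive."""
    archive = MIMEMultipart("related", type="text/html")
    # index.html sorts first so viewers treat it as the root document.
    names = sorted(os.listdir(directory), key=lambda n: (n != "index.html", n))
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        ctype = mimetypes.guess_type(name)[0] or "application/octet-stream"
        maintype, subtype = ctype.split("/", 1)
        part = MIMEBase(maintype, subtype)
        with open(path, "rb") as fh:
            part.set_payload(fh.read())
        encoders.encode_base64(part)
        part["Content-Location"] = base_url + name
        archive.attach(part)
    return archive.as_bytes()


with tempfile.TemporaryDirectory() as tmp:
    files = {"index.html": b"<html><body>hi</body></html>", "a.css": b"body{}"}
    for name, data in files.items():
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(data)
    mht = build_mht(tmp, "http://example.org/post/")

# Parse the archive back to confirm the main page comes first.
first = email.message_from_bytes(mht).get_payload()[0]
print(first["Content-Location"])
```

Writing the returned bytes to a file with an .mht extension, one file per captured version, would give the locally stored copy to link to.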

So, basically, I wanted to put these problems out in front of people to see if anyone else has suggestions or is moved to actually put in some programming work on this project.

"You can't teach an old dogma new tricks." - Dorothy Parker


Posts: 380
Joined: Aug. 2005

(Permalink) Posted: May 12 2006,03:12   

Sorry, but that is waaaaay out of my league.  I do know much work goes into a message board, but I am afraid I don't know how to do it!!

And thank you for all your efforts.

If I fly the coop some time
And take nothing but a grip
With the few good books that really count
It's a necessary trip

I'll be gone with the girl in the gold silk jacket
The girl with the pearl-driller's hands


Posts: 544
Joined: Jan. 2006

(Permalink) Posted: May 15 2006,12:27   

Is there such a thing as copyright protection for blogs?

If not, I guess it won't be long before some people complain loudly that there is. If so, it might be more ethical only to publish links to copies of modified posts rather than the entire content.

Also, if not, I think in future I might link to the advert-free copy rather than providing dumsbki and scrote with the "oxygen of publicity" that they crave (shudders at thought of Mrs T.)

Where are people like BarryA when you need them? :)
