steve_h
Posts: 544 Joined: Jan. 2006
Searching for posts in the UD I & II threads is not as easy as it should be.
When Wesley added the following links to the bored mechanics thread, I thought my problems were solved. They present entire threads in two huge pages.
- Complete UD I thread
- Complete UD II thread
Remove the initial X from each link to get the real results (warning: may crash your browser).
Since every browser I tried on Windows XP struggled to display those pages, I realised that my problems were not solved after all.
So I tried to split the huge files into manageable chunks, one file per comment, named after the post number, date, and author. Surely the "find in files..." feature of so many editors, or the indexing service, would take care of things. Well, to an extent yes, but the HTML formatting got in the way somewhat, and the search results were pretty much unreadable.
I tried creating a new set of files with all of the HTML stripped out (no fonts, bold, italics, links, images etc.) and the result was a bit better but still not very readable.
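For anyone curious what that kind of stripping looks like, here is a minimal sketch in Python (not the tool I actually used, and the names `TextOnly` and `strip_html` are made up for illustration). It keeps only the text between tags, which is also why links and images disappear along with the fonts and bolds.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text nodes of an HTML fragment, dropping every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the fragment's text with tags removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_html('<b>Hello</b>,  <i>big</i> <a href="x">world</a>'))
```

Note that the link target is gone from the output entirely, which is the sort of information loss that made the stripped files hard to follow.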
Then I put the texts into an SQLITE database and wrote a simple TCL script to search it and display the results.
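Roughly, the idea looks like the sketch below. It is written in Python rather than TCL, and the table name, column names, and the `make_db` and `search` helpers are assumptions for illustration only; the real layout of atbc.zip may well differ.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database from (post_no, posted, author, body) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "post_no INTEGER PRIMARY KEY, posted TEXT, author TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def search(conn, term):
    """Return comments whose body contains term, in post order."""
    # Escape LIKE wildcards so a search for "100%" is taken literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = "%" + escaped + "%"
    sql = (
        "SELECT post_no, posted, author, body FROM comments "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY post_no"
    )
    return conn.execute(sql, (pattern,)).fetchall()


if __name__ == "__main__":
    # Made-up sample rows, not real posts from the threads.
    db = make_db([
        (1, "2006-01-05", "alice", "The flagellum again"),
        (2, "2006-01-06", "bob", "Nothing to see here"),
    ])
    for post_no, posted, author, body in search(db, "flagellum"):
        print(post_no, posted, author, body)
```

SQLite's LIKE ignores case for ASCII letters by default, so this finds "Flagellum" and "FLAGELLUM" as well.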
The result was still crap, because so much depends on knowing what is quoted and what isn't. So I arranged for <Q> and </Q> to be placed around the quoted stuff, then for the quoted stuff to be shown in different colors, then for the hyperlinked text to be highlighted, and then for images to display (thumbnails only).
Finally I added double-clicking on URLs and images to copy their URLs to the clipboard, and control-left-mouse-button to open them in a browser (IE is hardcoded, but it's a script, so you can edit it).
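To show how the <Q> and </Q> markers make the coloring possible, here is a small Python sketch (again not the actual TCL code; `split_quotes` is a made-up name, and treating nested quotes as simply "quoted" is my assumption). It cuts a comment into chunks, each flagged as quoted or not, so a viewer can pick a color per chunk.

```python
import re

Q_TAG = re.compile(r"</?Q>")


def split_quotes(text):
    """Return (is_quoted, chunk) pairs; anything inside <Q>...</Q> is quoted."""
    segments = []
    depth = 0  # how many <Q> markers are currently open
    pos = 0
    for match in Q_TAG.finditer(text):
        chunk = text[pos:match.start()]
        if chunk:
            segments.append((depth > 0, chunk))
        if match.group() == "<Q>":
            depth += 1
        else:
            depth = max(0, depth - 1)  # tolerate a stray closing marker
        pos = match.end()
    tail = text[pos:]
    if tail:
        segments.append((depth > 0, tail))
    return segments


if __name__ == "__main__":
    for quoted, chunk in split_quotes("<Q>He said X</Q>I disagree"):
        print("QUOTE" if quoted else "REPLY", chunk)
```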
It's fairly basic and it's unlikely to be improved unless I get very bored indeed. Here's a screenshot:
If you want to try it, you can download:
- the database atbc.zip (12.5MB)
- the (optional) thumbnail views of most of the images thumbnails.zip (15.5MB)
- the browser script browse.tcl
You may also need/like to download:
- A free TCL 8.5 interpreter from activestate.com
- The free open source sqlite3 database program sqlite3.exe from sqlite.org
- Free open-source ZIP-compatible archiving software from www.7-zip.org
Post bug reports, comments, improvements, missing images, copyright infringements here. If any of my HTML-stripping has completely altered the meaning of any comment, then obviously I would like to know about it. Also I've noticed some instances of characters such as ä/ö/ü/è being rather badly mangled. I don't need to know every page that contains character errors but examples of each bad character would be nice.
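I haven't tracked down where the mangling happens, but one common way characters like these get damaged is UTF-8 text being read as Latin-1 somewhere along the line. The Python sketch below assumes that is the cause here (it may not be); `mangle` and `repair` are made-up names that reproduce and undo that particular kind of damage.

```python
def mangle(text):
    """Reproduce the damage: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Undo that damage; return the input unchanged if it doesn't fit the pattern."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    original = "\u00e4/\u00f6/\u00fc/\u00e8"  # the four characters mentioned above
    damaged = mangle(original)
    print(damaged)
    print(repair(damaged))
```

If the bad characters in the database look like two-character sequences starting with "Ã", this is probably what happened; if they look different, the cause is something else and this repair won't help.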
Disclaimer: I've only tried this on Windows XP so far. I may need to choose a Unix-friendly compression algorithm and/or tweak the TCL script.
Edit: Corrected thumbnail link
Edit: b
Edit: Corrected database link, atbc.zip