From the technical side, restoring the forum was a long process full of late nights. Here is how it was done.
The general process is:
1. Obtain a copy of an old topic as a web page.
2. Extract the post text, author, post dates, etc.
3. Rebuild the topics and posts in a database as those separate components.
4. Recreate the content in new forum software.
Let's break this down step by step:
1. Getting the old data
Believe it or not, this was the easiest part. There are a couple of 'backups' of the xbox-scene forum out there that hold a large portion of the 4,797,459 posts (thanks xboxexpert!), but sadly none are complete. So where do you get more? The Internet Archive's Wayback Machine, of course! Using a tool for ripping entire websites from IA, I retrieved every capture the Wayback Machine had of forums.xbox-scene.com.
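The ripping tool used here isn't named, so the following is only a minimal sketch of how captures can be enumerated through the Wayback Machine's public CDX endpoint. The helper names `cdx_query_url` and `capture_urls` are made up for illustration, and the actual download step (fetching each URL and saving it to disk) is left out.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain):
    """Build a CDX query that lists successful captures under a domain."""
    params = {
        "url": domain,
        "matchType": "domain",       # include every path and subdomain
        "output": "json",
        "fl": "timestamp,original",  # only the fields needed to build URLs
        "filter": "statuscode:200",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def capture_urls(cdx_json_text):
    """Turn a CDX JSON response into raw-capture URLs.

    The first row of the JSON output is a header row. The "id_" suffix
    after the timestamp asks for the page as archived, without the
    toolbar the Wayback Machine normally injects.
    """
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    header, *captures = rows
    ts_i = header.index("timestamp")
    url_i = header.index("original")
    return [
        f"https://web.archive.org/web/{row[ts_i]}id_/{row[url_i]}"
        for row in captures
    ]
```

Fetching the URL returned by `cdx_query_url("forums.xbox-scene.com")` and passing the response text to `capture_urls` would give one download link per capture.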
Okay we've got 1,887,080 HTML files. Now what?
2. Extracting the content
To extract the post body, author, post date, topic title, etc., we have to commit the ultimate programming sin: parsing HTML with regular expressions.
Since the captures from IA and the various backups were from different points in time, the page styling and underlying HTML were not consistent. This meant writing ~20 different parsers to extract the necessary data. The final results were then stored in a database for easier processing.
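To give a feel for the "many parsers" approach, here is a rough sketch that tries a list of regex patterns in turn and keeps the first one that matches. The two patterns and the markup they target are invented for the example. They are not the real xbox-scene layouts, and the real parsers were certainly messier.

```python
import re

# Hypothetical patterns for two made-up page layouts. Each one captures
# the same named groups so downstream code can treat results uniformly.
PARSERS = [
    re.compile(
        r'<div class="post" id="post-(?P<post_id>\d+)">\s*'
        r'<span class="author">(?P<author>[^<]+)</span>\s*'
        r'<span class="date">(?P<date>[^<]+)</span>\s*'
        r'<div class="body">(?P<body>.*?)</div>',
        re.DOTALL,
    ),
    re.compile(
        r'<a name="entry(?P<post_id>\d+)"></a>\s*'
        r'<b>(?P<author>[^<]+)</b>\s*'
        r'<i>(?P<date>[^<]+)</i>\s*'
        r'<td class="postcolor">(?P<body>.*?)</td>',
        re.DOTALL,
    ),
]


def extract_posts(html):
    """Return the posts found by the first parser that matches anything."""
    for pattern in PARSERS:
        posts = [match.groupdict() for match in pattern.finditer(html)]
        if posts:
            return posts
    return []  # no known layout matched; treat the page as a parse failure
```

Returning an empty list for unknown layouts makes it easy to log which files still need a new parser.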
3. Cleaning, sorting, and deduplicating
Now that I had the post body, author, date, topic title, post ID (so helpful), etc., the next step was deduplicating and weeding out obvious parsing failures. This was done with a series of scripts using some really nasty regex and resulted in 1,545,492 posts in 184,019 topics. That is just 32% of the original forum content.
They say what you post on the internet is forever. Sadly, not in this case.
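The cleanup scripts themselves aren't shown, so this is only a small sketch of how deduplicating on post ID might look. The validity checks and the "keep the longest body" tie-break are assumptions made for the example, not the rules that were actually used.

```python
def looks_valid(post):
    """Reject rows with an empty author or body, or markup in the author."""
    author = post.get("author") or ""
    body = post.get("body") or ""
    if not author.strip() or not body.strip():
        return False
    return "<" not in author  # a stray tag suggests the regex over-matched


def deduplicate(posts):
    """Keep one copy per post_id, preferring the copy with the longest body.

    The same post usually shows up in several captures; the longest body
    is assumed (for this sketch) to be the least truncated one.
    """
    best = {}
    for post in posts:
        if not looks_valid(post):
            continue
        current = best.get(post["post_id"])
        if current is None or len(post["body"]) > len(current["body"]):
            best[post["post_id"]] = post
    return sorted(best.values(), key=lambda p: p["post_id"])
```

Having the original post ID in the data is what makes this a simple dictionary lookup rather than fuzzy text matching.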
4. Recreation
The final step is turning this nicely laundered content into a usable forum. I chose Simple Machines Forum (SMF) for this, as the database schema was incredibly simple to work with, and after writing a couple of functions I could easily inject user accounts, topics, and posts. The rest of the work was pretty straightforward: inject all the users, inject all the topics, inject all the posts. The most tedious part was manually creating the forum board structure and associating topics with their correct boards. (Surprise, there was more regex involved.)
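As a rough illustration of that users, then topics, then posts order, here is a sketch against an in-memory SQLite database. The three tables are simplified stand-ins invented for the example. They are not SMF's actual schema, and a real import would target SMF's own tables in its own database.

```python
import sqlite3


def inject(conn, posts):
    """Insert users, then topics, then posts, in that order."""
    # Simplified, hypothetical tables; NOT the real SMF schema.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS members (
            id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS topics (
            id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY, topic_id INTEGER, member_id INTEGER,
            posted TEXT, body TEXT);
    """)

    # 1. Users: one account per distinct author name.
    for name in sorted({p["author"] for p in posts}):
        conn.execute("INSERT OR IGNORE INTO members (name) VALUES (?)", (name,))
    member_ids = dict(conn.execute("SELECT name, id FROM members"))

    # 2. Topics: reuse the original topic IDs so old links can be mapped.
    for p in posts:
        conn.execute(
            "INSERT OR IGNORE INTO topics (id, title) VALUES (?, ?)",
            (p["topic_id"], p["topic_title"]),
        )

    # 3. Posts: each row points at its topic and its author's new account.
    conn.executemany(
        "INSERT OR REPLACE INTO messages VALUES (?, ?, ?, ?, ?)",
        [
            (p["post_id"], p["topic_id"], member_ids[p["author"]],
             p["date"], p["body"])
            for p in posts
        ],
    )
    conn.commit()
```

Inserting in this order means every post can reference a member and topic that already exist.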
The final result was a functional forum that suddenly was looking awfully familiar!
To wrap it up, I recreated the IPB theme of ~2005-era X-S as an SMF theme for maximum nostalgia.