Migration Project
Overall goal: convert as many legacy wikitext projects as possible to markdown with a proper folder structure. Migrate campaigns from the bertball wiki to the campaigns folder.
Phase 0: Preparation
- [x] Identify which projects should be moved (e.g. which campaigns within Bertball belong in /campaigns, which documents should be in /faerun, etc.)
- [x] Define the folder structure standards for systems, settings, and campaigns
- [x] Determine organizing structure for campaigns (era-based?)
- [x] Delete wikiroots that shouldn't be here and move them to an archive (see Delete List)
- [x] Rename wikiroots that should be kept but need a new name (see Rename List)
Folder Structure Standards
Campaigns
For campaigns that don't already have a folder structure, the following default should be used:
- /background (optional, used to add worldbuilding and campaign history that isn't already covered in the setting documentation; can also be a top level file called background.md if it fits)
- /characters
  - /main (PCs and important NPCs)
  - other folders as needed; can be region-based or episode-based
- /episodes
  - one folder per episode, named with the episode ID (e.g. ABC1-1, DEF2-3, etc.)
  - each episode folder has an index.md describing the episode and linking to sessions
- /sessions
  - one file per session, named with the session ID (e.g. ABC1-1a, DEF2-3b, etc.)
- /stories (optional, used for fiction taking place outside the session structure)
Naming conventions:
- Every campaign has a name with an abbreviation, preferably three letters. Examples:
  - LND: Legends Never Die
  - HFR: Heroes for Rent
- Episodes belong to a series, and both are incremented integers. For example, the first episode of the first series is 1-1, whereas the 5th episode of the 3rd series is 3-5. The episode name is separate and can be whatever you want.
- Sessions are incremented lowercase letters. The first session is a, the second b, etc.
- Examples:
  - COT1-3a: Campaign "Coming of Twilight", Series 1, Episode 3, Session a
  - SOL2-5d: Campaign "Story of a Lifetime", Series 2, Episode 5, Session d
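These IDs are regular enough to parse and validate mechanically. A minimal sketch in Node.js (the function name and regex are illustrative, not part of any existing tooling):

```javascript
// Parse an episode or session ID like "COT1-3a" into its parts.
// Format: <ABBREVIATION><series>-<episode>[<session letter>]
function parseId(id) {
  const m = /^([A-Z]{2,})(\d+)-(\d+)([a-z])?$/.exec(id);
  if (!m) return null; // not a valid campaign ID
  return {
    campaign: m[1],
    series: Number(m[2]),
    episode: Number(m[3]),
    session: m[4] ?? null, // null for episode-level IDs like "COT1-3"
  };
}
```

A validator like this is also handy later when checking that session filenames follow the convention.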
Settings
For settings that don't already have a folder structure, the following default should be used:
- /places
  - /cities
  - /realms
  - /adventuring-sites
- /people
  - /ethnicities
  - /nobility-and-royalty
  - /organizations
  - /religions
- /things
  - /coins-and-commerce
  - /magic-items-and-artifacts
  - /technology
- /time
  - /calendars
  - /history
- /campaigns
  - one folder per campaign that takes place in this setting, with the same structure as described in the Campaigns section above
Systems
For systems that don't already have a folder structure, the following default should be used:
- /character (rules for character creation and advancement)
  - /advancement
  - /ancestry-and-background
  - /attributes
  - /basics
  - /classes (when present)
  - /equipment
  - /feats
  - /powers
  - /skills
- /enemies (monster manual, rules for monster creation, etc.)
- /system (rules for gameplay)
  - /combat
  - /conditions
  - /encounters
  - /formulas
  - /exploration
  - /other (for rules that don't fit into the above categories)
Delete List
Remove all of these wikiroots and put them in a separate archive:
- /campaigns
  - /exp
  - /fr
  - /pda2
  - /uw
  - /lnf
- /settings
  - /BJH
  - /City of Splendors
  - /loretan
  - /loretan-old
  - /*horizons
  - /space drow
- /systems
  - /2025 03 01
  - /Horizons
  - /Legends
  - /Next
  - /Sphere
Rename List
Needs a new name:
- /campaigns
  - /New Space Game
  - /New Vegas
  - /Untitled Space Game 2023
Phase 1: Wikitext to Markdown
- [ ] Convert all identified wikitext documents to markdown format, keeping documents in the same location. Guidelines:
- [ ] Wikitext documents should already be in standard wikitext format (circa 2007-2010), so most conversion should be straightforward.
- [ ] Where advanced mediawiki-specific features such as templates, parser functions, etc. are found, do not attempt to duplicate Mediawiki functionality. Instead, replace any function calls in the form of key-value pairs with a simple table. Embed codes should become simple links to the embedded document. Any other wikitext features that don't have a clear markdown equivalent, or would require significant engineering effort to replicate, should be ignored and left as-is in the markdown (e.g. {{Infobox character}} can just be left as-is in the markdown, without trying to convert it to a custom React component or something like that).
- [x] Convert all document filenames to be URL-friendly (lowercase, spaces replaced with hyphens, no characters invalid in a URL, etc.).
- [x] For internal document wikilinks, use the Ursa markdown link format, i.e. `[[Foo]]` becomes `[Foo](./foo.html)`. Note: the previous requirement to convert to URL-friendly names means that the link target must also be URL-friendly, e.g. `[[Foo Bar]]` becomes `[Foo Bar](./foo-bar.html)`.
- [ ] For Talk pages, convert to regular pages with URL-friendly names (i.e. replace the colon with something else), and at the top of the original page, add a link to its talk page.
- [ ] For images, if wikitext specifies an image size (e.g. `[[Image:Foo.jpg|250px]]`), use the Pandoc/R Markdown syntax for image sizing, i.e. `![Foo](./foo.jpg){width=250px}`. If no size is specified, just use `![Foo](./foo.jpg)`. TODO: implement this extension in Ursa
- [ ] For REDIRECT pages, just render directly, i.e. `#REDIRECT [[Foo]]` becomes "Redirect: [Foo](./foo.html)" in markdown.
- [ ] Mediawiki discourages the use of H1 headers, so most wikitext documents will have only h2/h3/etc. If no h1 header is present, promote all headers in the document. Thus "==Foo==" becomes "# Foo", "===Bar===" becomes "## Bar", etc. If an h1 header is already present, keep all headers as-is (besides converting them to MD).
- [ ] Wikitext tables: Convert `{| ... |- ... || ... |}` syntax to GFM pipe tables. Drop HTML attributes (`class=`, `style=`, `bgcolor=`, `cellspacing=`, etc.) that have no markdown equivalent. For simple key-value infobox tables, a two-column markdown table is fine.
- [ ] Bold/italic: `'''bold'''` → `**bold**`, `''italic''` → `*italic*`.
- [ ] Headers: `=H1=` → `# H1`, `==H2==` → `## H2`, `===H3===` → `### H3`, etc. Strip the trailing `=` signs as well.
- [ ] Bullet lists: `*` → `-`, `**` → `-`, `***` → `-`. Preserve nesting depth (via indentation).
- [ ] Categories: `[[Category:SomeCategory]]` lines at the bottom of pages should not be rendered in md. Add frontmatter attributes instead, e.g. `categories: Foo`, or a YAML list (`categories:` with `- Foo`, `- Bar` entries) if there are multiple categories.
- [ ] External links: `[http://example.com/ Link Text]` → `[Link Text](http://example.com/)`. Bare bracketed URLs: `[http://example.com/]` → `<http://example.com/>`.
- [ ] Piped wikilinks: `[[Page Name (Qualifier)|Display Text]]` → `[Display Text](./page-name-qualifier.html)`. The link target should be URL-friendlified as usual.
- [ ] Indented text / "Main Article" pattern: Lines starting with `:` (wikitext indent) should have the `:` stripped. The common pattern `''Main Article: [[Foo]]''` → `*Main Article: [Foo](./foo.html)*`.
- [ ] Inline HTML: Strip wiki-specific tags (`<includeonly>`, `<noinclude>`, `<nowiki>`, `<div class="...">` with wiki-specific classes). Preserve simple HTML that markdown supports (`<br>`, `<img>`) or convert where possible. Raw `<img src="...">` tags can be left as-is.
- [ ] Namespace-prefixed files: Handle files with MediaWiki namespace prefixes:
  - `Template:*` files → delete (template definitions won't be replicated)
  - `Category:*` files → delete (category pages are typically empty)
  - `Image:*` files → delete (these are empty placeholder pages, not actual images)
  - `User:*` and `User talk:*` files → delete
  - `Fict:*` files → rename to strip the `Fict:` prefix
  - `Special:*` files → evaluate case-by-case; most can be deleted
- [ ] Empty files: Delete any files that are completely empty or contain only whitespace.
- [ ] Encoding artifacts in filenames: Fix filenames containing mojibake patterns like `รยขรขโยฌรขโยข` (mangled UTF-8 for `'`), `รฦ` sequences, etc. Rename to the correct Unicode characters or simplified ASCII equivalents.
- [ ] Template parameter syntax: Files containing `{{{parameter}}}` (triple-brace template variable definitions) are template definition files and should be deleted along with other `Template:*` files.
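Several of the rules above (URL-friendly link targets, bold/italic, header promotion) are simple enough to sketch as regex passes. A rough Node.js illustration of the idea, not the actual converter:

```javascript
// Slugify a wiki page name into a URL-friendly file name.
function slugify(name) {
  return name
    .toLowerCase()
    .replace(/['"]/g, "")        // drop quotes entirely
    .replace(/[^a-z0-9]+/g, "-") // every other non-alphanumeric run -> hyphen
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
}

// [[Page Name|Display]] and [[Page Name]] -> markdown links.
function convertWikilinks(text) {
  return text.replace(/\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g,
    (_, page, display) => `[${display ?? page}](./${slugify(page)}.html)`);
}

// '''bold''' -> **bold**, ''italic'' -> *italic* (bold first, order matters).
function convertEmphasis(text) {
  return text
    .replace(/'''([^']+)'''/g, "**$1**")
    .replace(/''([^']+)''/g, "*$1*");
}

// ==Foo== -> markdown headers; promote one level when the page has no =H1=.
function convertHeaders(text) {
  const hasH1 = /^=[^=].*=$/m.test(text);
  const promote = hasH1 ? 0 : 1;
  return text.replace(/^(={1,6})\s*(.*?)\s*\1\s*$/gm, (_, eqs, title) => {
    const level = Math.max(1, eqs.length - promote);
    return `${"#".repeat(level)} ${title}`;
  });
}
```

Real pages will have edge cases (nested markup, tables containing links) that a line-by-line regex pass won't catch, so treat this as a starting point.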
- [ ] Build a test bot:
- Run 'ursa serve docs'
- Run a bot script that uses curl (or similar) to fetch URLs and analyze the content.
- For each iteration, start at the root (localhost:8080 by default).
- Crawl all links on the page (if they are on localhost:8080), and for each link:
- Check that the link URL is URL-friendly (all lowercase, dashes instead of spaces, no URL-unfriendly characters). If not, log it.
- Load the link's URL. If it returns a 200 status code, add it to the queue to be crawled as well; otherwise log the broken link.
- The bot should have a queue of URLs to crawl, which is added to as new links are found. The list should be a Set (implementation of your choice) so duplicates cannot exist.
- Status of a URL: pending, in progress, complete, failed
- The bot should also keep track of which URLs have already been crawled, and which are in progress. In either such state, the bot should not attempt to crawl that URL again.
- The queue and the list of crawled URLs should be tolerant to parallel processing. When changing either, lock it while changing. When attempting to change either, check for a lock, and wait a reasonable time for the lock to be released if it is locked. If it takes way too long, abort with a warning. This is all local FS locking, no need for a distributed solution.
- After each successful queue/log change and lock release, persist to disk. Keep it in memory for normal processing, but persist to disk so that if the bot crashes or is stopped, progress is not lost.
- At any given time, if the queue has no pending URLs, but some in progress, the central thread will wait. If there are no pending or in progress URLs, the bot will exit and print a report of all broken links and non-url-friendly links.
- [ ] Find all extensionless images, determine their correct extension by looking at the file (hopefully this can be determined by analyzing the first few bytes, not using LLM or computer vision), add the correct extension, and update all links to those images to use the correct extension. Image filename and links should also be updated to be URL-friendly in the same manner as documents.
- Use the `file` linux command or equivalent. Example: `file --brief --mime-type "$f"` will return the mime type of the file, which can be used to determine the extension. Fallback: write a script that reads the first few bytes of the file and matches against known magic numbers for common image formats (e.g. JPEG files start with `FF D8 FF`, PNG files start with `89 50 4E 47`, etc.).
Phase 2: Content-Aware Migration
Implementation Plan
Phase 2a: Heuristic Pre-Classification (no LLM): COMPLETE
Built classify-wiki.js (~1,060 lines) + classify-yaml-lite.js (~70 lines). Ran on all 18 wikiroots.
Results: 7,222 files classified across 18 wikiroots.
| Wikiroot | Files | Delete | Move | Keep | Need LLM |
|---|---|---|---|---|---|
| legacy/bertball | 4,165 | 1,309 | 2,041 | 815 | 605 |
| settings/starwars | 878 | 103 | 548 | 227 | 126 |
| systems/system8 | 480 | 6 | 57 | 417 | 416 |
| systems/system7 | 313 | 9 | 12 | 292 | 267 |
| settings/oathkeep | 220 | 5 | 169 | 46 | 27 |
| systems/system5 | 208 | 0 | 3 | 205 | 205 |
| settings/bdh | 194 | 4 | 135 | 55 | 44 |
| systems/system6 | 165 | 0 | 10 | 155 | 144 |
| settings/faerun | 163 | 4 | 134 | 25 | 10 |
| legacy/inactive-systems | 155 | 0 | 10 | 145 | 143 |
| systems/5e | 118 | 1 | 11 | 106 | 106 |
| settings/homeworlds | 86 | 0 | 75 | 11 | 7 |
| legacy/colewiki | 34 | 3 | 2 | 29 | 28 |
| settings/dark-sun | 17 | 1 | 15 | 1 | 1 |
| settings/dieselpunk | 13 | 0 | 10 | 3 | 3 |
| settings/torvalt | 6 | 0 | 5 | 1 | 1 |
| settings/new | 4 | 0 | 0 | 4 | 4 |
| settings/eberron | 3 | 0 | 1 | 2 | 1 |
| TOTAL | 7,222 | 1,445 | 3,238 | 2,539 | 2,138 |
By content type: other 2,138 · lore 1,270 · redirect 674 · spell 654 · monster 619 · character 513 · index 401 · item 345 · location 339 · feat 56 · campaign-session 55 · rule 54 · class-feature 39 · campaign-landing 22 · spam 15 · organization 14 · empty 14
Delete breakdown: SRD-duplicate 742 · redirect 674 · spam 15 · empty 14
2,138 files need LLM/review (29.6% of total). Most are in systems/ wikiroots where content is homebrew rules without standard D&D patterns (system5-8, 5e) and legacy/bertball remainders.
- [x] 2a-1: Built heuristic classifier engine. `classify-wiki.js` with CLI: `node classify-wiki.js --wikiroot <path> [--resume] [--llm] [--model name] [--report]`. Output to `classify-results/{wikiroot}.json`.
- [x] 2a-2: Heuristic rules. 19 campaigns detected via link graph. Detects: empty, redirect, spam, SRD content, campaign membership (link graph + "Part of Saga:" markers), character infobox, spell stat-block, feat stat-block, monster stat-block, domain, class features, session logs, character stat blocks, location (geo markers + building type patterns + ward/sabban patterns), organization, item (equipment/artifact keywords), index pages (link-heavy), category inference, generic lore (substantial prose + known setting).
- [x] 2a-3: Destination routing. Computed `proposed_dest` for legacy/bertball files. Other wikiroots deferred to Phase 2e (in-place restructure).
Phase 2b: LLM Classification (Ollama)
For files not confidently classified by heuristics (~2,500-3,700 files). Requires Ollama installed locally.
- [x] 2b-1: Install Ollama and pull model. Installed Ollama v0.13.5 on Ubuntu WSL. Pulled `llama3.1:8b` (4.9GB, Q4_K_M). No GPU detected in WSL, so running CPU-only at ~2-3s/file.
- [x] 2b-2: Add LLM pass to classify-wiki.js. Added ~560 lines to classify-wiki.js (lines 1002-1564):
  - `ollamaChat()`: calls Ollama `/api/chat` with JSON mode, 2-min timeout, 2 retries
  - `buildClassifyPrompt()`: sends filename, existing classification, first 2000 chars, valid enum values
  - `buildSRDHombrewPrompt()`: targeted SRD homebrew detection prompt
  - `mergeLLMResult()`: merges LLM output with heuristic, respecting confidence levels
  - `runLLMPass()`: iterates candidates, progress bar with ETA, persists every 5 files (resumable)
  - `runSRDHomebrewCheck()`: re-checks SRD-tagged files, reclassifies homebrew-modified ones
  - `markLowConfidenceForReview()`: marks confidence < 0.5 as `proposed_action: "review"`
  - CLI flags: `--llm`, `--srd-homebrew`, `--model <name>`, `--ollama-url <url>`, `--resume`
- [x] 2b-3: Handle low-confidence results. Files where LLM confidence < 0.5 get `proposed_action: "review"` and `tags: ["needs-review"]`.
- [x] 2b-4: Handle SRD-tagged files with potential homebrew. Implemented `runSRDHomebrewCheck()`. Tested on 24/742 bertball SRD files: 8 reclassified as homebrew-modified, 12 confirmed pure SRD. Full run in progress.
runSRDHomebrewCheck(). Tested on 24/742 bertball SRD files: 8 reclassified as homebrew-modified, 12 confirmed pure SRD. Full run in progress. - [ ] 2b-5: Run classification on all target wikiroots. Created
run-llm-classification.shto run all 16 wikiroots sequentially. Running in background (bash run-llm-classification.sh --srd-homebrew). Tested successfully on dark-sun (19 files, 0 errors, 73s) and eberron (3 files, 0 errors, 5s). Full run estimated ~2-3 hours on CPU.- [x]
node classify-wiki.js --wikiroot settings/dark-sun --llm(20 files, โ complete โ all high confidence) - [x]
node classify-wiki.js --wikiroot settings/eberron --llm(3 files, โ complete โ all high confidence) - [*]
node classify-wiki.js --wikiroot legacy/bertball --llm --srd-homebrew(~4,165 files, running...) - [*]
node classify-wiki.js --wikiroot settings/starwars --llm(~878 files) - [*]
node classify-wiki.js --wikiroot systems/system7 --llm(~313 files) - [*]
node classify-wiki.js --wikiroot settings/oathkeep --llm(~220 files) - [*]
node classify-wiki.js --wikiroot settings/bdh --llm(194 files) - [*]
node classify-wiki.js --wikiroot systems/system6 --llm(~165 files) - [*]
node classify-wiki.js --wikiroot settings/faerun --llm(163 files) - [*]
node classify-wiki.js --wikiroot settings/homeworlds --llm(~86 files) - [*] Remaining: legacy/colewiki, legacy/inactive-systems, systems/system8, systems/system5, systems/5e, settings/dieselpunk, settings/torvalt, settings/new
- [x]
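For reference, the Ollama `/api/chat` call with JSON mode is a single fetch. A stripped-down sketch of the idea behind `ollamaChat()` and `mergeLLMResult()` (retry/timeout handling omitted; `mergeResult` here is a simplified hypothetical, not the actual merge logic):

```javascript
// Ask Ollama (default port 11434) to classify one file.
// format: "json" constrains the model's reply to valid JSON.
async function ollamaChat(model, systemPrompt, userPrompt) {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      format: "json",  // force a parseable JSON reply
      stream: false,   // one complete response instead of chunks
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: userPrompt },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = await res.json();
  return JSON.parse(data.message.content);
}

// Simplified merge: keep whichever classification is more confident.
function mergeResult(heuristic, llm) {
  return (llm.confidence ?? 0) > (heuristic.confidence ?? 0)
    ? { ...heuristic, ...llm }
    : heuristic;
}
```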
Phase 2c: Report Generation & Human Review
- [x] 2c-1: Build `classify-report.js`. Reads `classify-results/{wikiroot}.json` and generates an HTML report with:
  - Section 1: All files alphabetically, with content type, tags, categories, setting, campaign, confidence, proposed action, and a link to the .md file
  - Section 2: Same list grouped by content_type
  - Section 3: Proposed changes as an execution plan: "Delete N files (list), Move N files (source → dest list)"
  - Section 4: Low-confidence files needing manual review
  - Filterable/sortable table (use a lightweight JS table library or an inline `<script>` with sort/filter)
  - Serve via `npx serve classify-results/` or similar, so links to .md files work
- [x] 2c-2: Generate reports for each wikiroot.
- [x] 2c-3: Human review. Walk through the report. Edit the JSON directly to override any misclassifications. Re-run report to verify.
Phase 2d: Execute Migration
- [ ] 2d-1: Build `execute-migration.js`. Reads the approved `classify-results/{wikiroot}.json` and:
  - Deletes files marked `delete: true`
  - Moves files to `proposed_dest`, creating directories as needed
  - Updates all internal links in moved files (relative path recalculation)
  - Updates all links in OTHER files that pointed to moved files (search all .md files for references)
  - Handles macOS case-insensitive filesystem (two-step rename via temp file)
  - Dry run by default; `--write` to execute
  - Summary report: N deleted, N moved, N links updated
- [ ] 2d-2: Run on legacy/bertball first (biggest, most impactful).
- [ ] 2d-3: Run on remaining wikiroots (starwars, system7, oathkeep, etc.).
- [ ] 2d-4: Verify with check-broken-links.js: re-run broken link analysis to confirm no regressions.
Phase 2e: Non-Bertball Folder Restructuring
For flat wikiroots that stay in place but need internal folder structure (starwars, oathkeep, system7, system6, etc.).
- [ ] 2e-1: Extend execute-migration.js for in-place restructuring. Same move logic, but the destination is within the same wikiroot (e.g., `settings/starwars/darth-maul.md` → `settings/starwars/people/darth-maul.md`).
- [ ] 2e-2: Setting wikis: create /people, /places, /things, /time, /campaigns subfolders per the Folder Structure Standards.
- [ ] 2e-3: System wikis โ create /character, /enemies, /system subfolders per the Folder Structure Standards.
- [ ] 2e-4: Verify all restructured wikiroots with broken link analysis.
Hardware & Time Estimates
| Hardware | Model | Speed (per file) | LLM pass (~2,500 files) | Total with all wikiroots |
|---|---|---|---|---|
| Apple Silicon (M-series) | llama3.1:8b | ~3-5 sec | ~2.5-3.5 hours | ~4-5 hours |
| Core 7 Ultra + 5070 Ti | llama3.1:8b | ~1-1.5 sec | ~45-60 min | ~1.5-2 hours |
| Core 7 Ultra + 5070 Ti | qwen2.5:14b | ~2-3 sec | ~1.5-2 hours | ~2.5-3 hours |
The 5070 Ti's 16GB VRAM and CUDA acceleration cut per-file inference time to roughly a third of Apple Silicon Metal's (per the table above), and also allow running a 14B model at the speed Apple Silicon manages for an 8B model, for better classification accuracy. The heuristic pass is instant on any hardware. Recommendation: use the 5070 Ti with qwen2.5:14b for the best accuracy/speed balance.
Bertball Wiki
- [ ] For each document in legacy/bertball, analyze the content and build a map (in a file) as follows:
  - Determine the document's content type, such as:
    - A game mechanic copied from a system, such as a page describing the feat Power Attack, or a page describing the spell Magic Missile
    - A character page for a single character
    - A campaign landing page
    - A page describing a campaign session, episode, or story arc / season
    - A page describing a location
    - A page describing an organization
    - A nonsense page or spam post from back when the wiki was public and open to editing
    - Any other content tags you can identify
  - Determine the document's category, such as:
    - If it's part of a campaign, which campaign it belongs to
    - If it's part of a setting, which setting it belongs to (cheat code: for legacy/bertball, the answer is almost always 'faerun' or 'bdh')
    - If it's part of a system, which system it belongs to (likeliest answers: 3.5e, System 6, and System 7)
  - For each file, mark clearly whether it is flagged for deletion (see below for criteria)
  - Output an HTML page as follows:
    - Section 1: list of all documents alphabetically, with their content type, tags, categories, and a link to the document
    - Section 2: same list, but grouped by content type
    - Section 3: a list of proposed changes, following the criteria for the actual execution described below (for example: "delete these files, move these files here, etc.")
  - When complete, serve the HTML page from a document root where the document links resolve. It doesn't have to be Ursa; it's fine to link to the .md files directly.
- [ ] Human Review of the output html
- [ ] When approved, execute the migration:
  - [ ] Delete all nonsense and spam posts
  - [ ] Delete all system information from non-homebrew systems (such as 3rd and 5th edition D&D)
  - [ ] Move setting documents to the appropriate setting folder if it exists
  - [ ] Move campaign documents to the appropriate campaign folder (which should be in {setting name}/{campaign id}, created as needed), updating links as necessary
  - [ ] Move character documents to the appropriate campaign's characters folder (if the character's campaign was detected), updating links as necessary
  - [ ] Move system information to the appropriate system folder. Put these files in a newly created "incoming" subfolder with subfolders as needed; human intervention will be needed to sort through these
Non-Bertball
- [ ] For each folderless wiki besides legacy/bertball (where all documents are in the root, as with Mediawiki), analyze the content the same way as described above and build a similar map. The main difference is that nothing is being moved out of each of these wikiroots; the map is used to build a folder structure. Thus, the execution plan:
  - [ ] Setting wiki:
    - [ ] Create /people, /places, /things, and /time folders. Move documents into the appropriate folder based on their content (e.g. a document describing a city goes in /places/cities, a document describing a religion goes in /people/religions, etc.). Update links as necessary.
    - [ ] Create a /campaigns folder. For each campaign that takes place in this setting, create a folder for that campaign (named with the campaign ID), move all documents related to that campaign into that folder, and update links as necessary.
  - [ ] System wiki:
    - [ ] Create /character, /enemies, and /system folders with appropriate subfolders (see the Folder Structure Standards section of this document). Move documents accordingly, and update links as necessary.
  - [ ] Campaign wiki (not sure there are any matching this criterion, but just in case):
    - [ ] After finishing restructuring all setting wikis, find all campaigns.
    - [ ] For each campaign, create a folder for that campaign (named with the campaign ID) in the /campaigns folder of the appropriate setting, move all documents related to that campaign into that folder, and update links as necessary.
Phase 3: Folder Restructuring
- [ ] Within each project, restructure folders to match the new standards (e.g. move character documents into a "characters" folder, etc.), and update links as necessary
Bonus Phase: Image Extension Repair
- [ ] A lot of Aurora wikis have extensionless images. For each extensionless image, read the image to determine its correct extension, rename the image, and update all links to that image accordingly.