The MediaWiki wikitext parser is not a “parser” as such; it’s a pile of regular expressions, using PCRE as found in PHP. There are preprocessing and postprocessing steps. No formal definition of wikitext exists; the definition is literally “whatever the parser does.” Lots of features of wikitext that people use in practice are actually quirks of the implementation.
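To make the "pile of regular expressions" idea concrete, here is a toy sketch in Python of markup rendered by ordered substitution passes. It is emphatically not MediaWiki's code: the `PASSES` table, the patterns and the `render` function are all invented for illustration, and the real thing is PHP with far more passes. The point is only that in this style of design, the language is defined by the order in which the substitutions happen to run.

```python
import re

# Toy substitution passes, applied strictly in order.
# Illustrative assumptions only: these are NOT MediaWiki's actual patterns.
PASSES = [
    # '''bold'''
    (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),
    # ''italic''
    (re.compile(r"''(.+?)''"), r"<i>\1</i>"),
    # [[Target|label]]
    (re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]"), r'<a href="/wiki/\1">\2</a>'),
    # [[Target]]
    (re.compile(r"\[\[([^\]]+)\]\]"), r'<a href="/wiki/\1">\1</a>'),
]


def render(text: str) -> str:
    """Run every pass over the whole text, one after another."""
    for pattern, replacement in PASSES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(render("'''bold''' and ''italic''"))
    print(render("[[Main Page|home]]"))
    # Five apostrophes on each side: the result depends on pass order.
    print(render("'''''both'''''"))
```

The first two inputs come out as you would hope. The third comes out as `<b><i>both</b></i>`, with the tags mis-nested, because the bold pass grabs the first three apostrophes before the italic pass ever sees the text. Nothing in the code says that is the right answer; it is simply what the passes do. That is the toy version of "whatever the parser does."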
This is a serious problem. Rendering a complex page on en:wp can take several seconds on the reasonably fast WMF servers. Third-party processing of wikitext into XML, HTML or other formats is not reliably possible. You can’t drop in a faster parser if you happen to have access to gcc on your server. Solid WYSIWYG editing, as opposed to the many approximations over the years (some very good, but still very approximate), could really do with a formally-described language to work to. (That’s not all it needs, but it’s pretty much needed to make it solid.)
Many people have attempted to actually describe wikitext, and ended up dashing their brains against the rocks. The hard stuff is the last 5%, and almost all of the horrible stuff needs to work because it’s used in the vast existing body of wikitext. Wikitext is provably impossible to describe as EBNF. Steve Bennett tried ANTLR and that effort failed too.
If you’ve ever spat and cursed at the MediaWiki parser, you may care to glance at this month’s wikitext-l archives. (That’s the list Domas Mituzas created to keep us from clogging wikitech-l with gibbering insanity.) Andreas Jonsson has been having a good hack at it, and he thinks he’s cracked it.
This won’t become the parser without some serious compatibility testing … and being faster than the existing one. But this even existing will mean third parties can use a compiled C parser instead of PHP, third parties can process wikitext with blithe abandon without a magic black box MediaWiki installation, dogs and cats can live together in Californian gay marriage and the world will be just that little bit more beautiful. Andreas’ mortal shell, mind destroyed by contemplation of insanity beyond the power of the fragile human frame to take, would be in line for the Nobel Prize for Wikipedia. Could be good. Should be in the WMF Subversion within a few days.