Last Thursday at London.PM, I got asked a lot why MediaWiki wikitext doesn’t have a WYSIWYG editor. The answer is that a WYSIWYG editor would need to know wikitext grammar, and there is no defined grammar. The MediaWiki “parser” is not actually a parser — it’s a twisty series of regular expressions (PHP’s version of PCREs).
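To give a feel for what "a series of regular expressions" means in practice, here is a minimal sketch in Python (rather than PHP, purely for brevity). The patterns, the `PASSES` list and the `render` function are all made up for illustration and are not MediaWiki's actual rules. The point is only that the output depends on the order the passes run in, which is exactly what a grammar would have to capture.

```python
import re

# Toy substitution passes, applied strictly in order. These patterns are
# illustrative assumptions and are NOT MediaWiki's actual rules.
PASSES = [
    (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),  # bold has to run before italic
    (re.compile(r"''(.+?)''"), r"<i>\1</i>"),
    (re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]"), r'<a href="/wiki/\1">\2</a>'),
    (re.compile(r"\[\[([^\]|]+)\]\]"), r'<a href="/wiki/\1">\1</a>'),
]


def render(text, passes=PASSES):
    """Run every (pattern, replacement) pass over the text, one after another."""
    for pattern, replacement in passes:
        text = pattern.sub(replacement, text)
    return text


# Swapping the first two passes changes the result for the same input:
italic_first = list(reversed(PASSES[:2]))
print(render("'''bold''' and ''italic''"))
print(render("'''bold'''", italic_first))
```

Nothing in that list says what bold *is*; the meaning lives in the ordering of the passes, so there is no single rule to lift out and write down.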
So any grammar effort requires reverse-engineering that pile of regexps, and so do several of the What You See Is All You Get editors (others just forget wikitext and write HTML). Lots of people have tried and gotten 90% of the way before stalling. It doesn’t help that wikitext is (I’m told) provably impossible to just put into a single lump of EBNF.
The goal is to replace the twisty series of regexps with something generated from a grammar. Tim Starling has said, more or less: “We can’t change wikitext. Go away and write something that (a) covers almost all of it (b) is comparably fast in PHP.” Harsh, but fair.
It occurred to me that there must exist tools to convert regexps into EBNF. And that if we can get it into even a few disparate lumps of hideous EBNF, there should be tools to take those and simplify them somewhat. (Presumably with steps to say what given bits mean.) Or possibly things other than EBNF, just as long as the result is parseable.
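For a single, genuinely regular regexp the translation is at least mechanical. The sketch below is a hypothetical helper, `regex_to_ebnf`, that handles only literals, grouping, `|`, `*`, `+` and `?`. It assumes nothing about any existing tool and ignores everything PCRE adds on top (character classes, anchors, backreferences, lookaround), so treat it as an illustration of the idea rather than something that would cope with the real code.

```python
def regex_to_ebnf(name, pattern):
    """Translate a tiny regex subset into one EBNF rule (as a string).

    Supported: literal characters, (...) grouping, | alternation, and the
    postfix operators *, + and ?. Everything else is out of scope.
    """
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        branches = [sequence()]
        while peek() == "|":
            pos += 1
            branches.append(sequence())
        return " | ".join(branches)

    def sequence():
        items = []
        while peek() is not None and peek() not in "|)":
            items.append(postfix())
        return " , ".join(items) if items else '""'

    def postfix():
        nonlocal pos
        term = atom()
        while peek() is not None and peek() in "*+?":
            op = pattern[pos]
            pos += 1
            if op == "*":
                term = "{ " + term + " }"                      # zero or more
            elif op == "+":
                term = "( " + term + " , { " + term + " } )"   # one or more
            else:
                term = "[ " + term + " ]"                      # optional
        return term

    def atom():
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        if ch == "(":
            inner = alternation()
            if peek() != ")":
                raise ValueError("unbalanced parenthesis")
            pos += 1
            return "( " + inner + " )"
        if ch in "*+?":
            raise ValueError("operator with nothing to repeat")
        # EBNF terminals may use either quote style; avoid clashing with ch.
        quote = "'" if ch == '"' else '"'
        return quote + ch + quote

    body = alternation()
    if pos != len(pattern):
        raise ValueError("unexpected ')'")
    return name + " = " + body + " ;"


print(regex_to_ebnf("example", "a(b|c)*d"))
```

The hard part is not this step. It is that the regexps are applied one after another to each other's output, so the lumps of EBNF that fall out of each one do not simply add up to a grammar for wikitext.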
I am not (even slightly) a computer scientist, but many of you are. Does anyone have any ideas on this? Or pointers to anyone having done anything even remotely similar? Or knowledgeable friends they could point this query at?