Even the Free Software Foundation doesn’t understand the GFDL.

Has anyone ever gotten a straight answer from licensing@fsf.org about GFDL queries? I have never even heard of an answer from them that isn’t their Magic 8-Ball imitation. “Reply hazy, read the license text and ask your own lawyer.” Our lawyer is Mike Godwin and he says it makes his head hurt. YOU WROTE THE DAMN THING. WHAT DID YOU MEAN? WHAT WERE YOU THINKING? ANSWER ME!

In fairness, the FSF contact page says licensing@fsf.org will help with “questions about the GPL and free software licensing.” Even the FSF has given up trying to make sense of the GFDL. The new version can’t happen soon enough.

(Provoked by asking for help with the reuse FAQ and by the likely utter infeasibility of audio versions of GFDL text. The latter is one of the best arguments I can think of for running screaming to CC-by-sa absolutely as soon as possible and throwing the GFDL into a fire.)

Regular expressions to EBNF?

Last Thursday at London.PM, I was asked repeatedly why MediaWiki wikitext doesn’t have a WYSIWYG editor. The answer is that a WYSIWYG editor would need to know wikitext grammar, and there is no defined grammar. The MediaWiki “parser” is not actually a parser; it’s a twisty series of regular expressions (PCRE, as exposed through PHP’s preg functions).
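For anyone who hasn’t seen this style of “parsing”, here is a minimal sketch of the idea in Python rather than PHP. The patterns, the PASSES list and the render function are all made up for illustration; they are not MediaWiki’s actual rules, only the general shape of a pipeline of substitutions.

```python
import re

# Hypothetical rules for illustration only; NOT MediaWiki's real patterns.
# Each pass rewrites the whole text, and later passes see the output of
# earlier ones, so the order of this list is effectively the "grammar".
PASSES = [
    (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),  # bold has to run before italic
    (re.compile(r"''(.+?)''"), r"<i>\1</i>"),
    (re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]"), r'<a href="/wiki/\1">\2</a>'),
    (re.compile(r"\[\[([^\]|]+)\]\]"), r'<a href="/wiki/\1">\1</a>'),
]


def render(wikitext: str) -> str:
    """Apply every substitution pass in order and return the result."""
    out = wikitext
    for pattern, replacement in PASSES:
        out = pattern.sub(replacement, out)
    return out


print(render("'''bold''' and ''italic'' and [[Main Page|home]]"))
```

Nothing in there says what a valid document is. The behaviour is whatever falls out of the passes and their ordering, which is why an editor has nothing to work from.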

So any grammar effort requires reverse-engineering that, as do several of the What You See Is All You Get editors (others just forget wikitext and write HTML). Lots of people have tried and gotten 90% of the way before stalling. It doesn’t help that wikitext is (I’m told) provably impossible to express as a single lump of EBNF.

The goal is to replace the twisty series of regexps with something generated from a grammar. Tim Starling has said, more or less: “We can’t change wikitext. Go away and write something that (a) covers almost all of it (b) is comparably fast in PHP.” Harsh, but fair.

It occurred to me that there must exist tools to convert regexps into EBNF. And that if we can get it into even a few disparate lumps of hideous EBNF, there should be tools to take those and simplify them somewhat. (Presumably with steps to say what given bits mean.) Or possibly things other than EBNF, just as long as the result is parseable.
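To show the sort of thing I mean, here is a rough sketch in Python of a translator from a tiny regexp subset (single-character literals, |, parentheses, and the *, + and ? operators) into EBNF text. The function name regex_to_ebnf and the output notation are my own assumptions, not an existing tool. It knows nothing about character classes, anchors, lookahead or backreferences, and it does nothing about the real difficulty, which is how successive passes interact.

```python
def regex_to_ebnf(pattern: str) -> str:
    """Translate a tiny regex subset (literals, |, (), *, +, ?) into EBNF text.

    Illustrative sketch only: '?' is always read as "optional", never as a
    lazy-quantifier modifier, and every other character is a literal.
    """
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        branches = [sequence()]
        while peek() == "|":
            pos += 1
            branches.append(sequence())
        return " | ".join(branches)

    def sequence():
        items = []
        while peek() is not None and peek() not in "|)":
            items.append(repeated())
        return ", ".join(items)

    def repeated():
        nonlocal pos
        item = atom()
        while peek() is not None and peek() in "*+?":
            op = pattern[pos]
            pos += 1
            if op == "*":
                item = "{ " + item + " }"            # zero or more
            elif op == "+":
                item = item + ", { " + item + " }"   # one, then zero or more
            else:
                item = "[ " + item + " ]"            # optional
        return item

    def atom():
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        if ch in "*+?":
            raise ValueError(f"nothing to repeat at {pos - 1}")
        if ch == "(":
            inner = alternation()
            if peek() != ")":
                raise ValueError("unbalanced parenthesis")
            pos += 1
            return "( " + inner + " )"
        quote = "'" if ch == '"' else '"'
        return quote + ch + quote

    result = alternation()
    if pos != len(pattern):
        raise ValueError(f"unexpected character at {pos}")
    return result


print("rule = " + regex_to_ebnf("(ab|c)*d") + " ;")
```

A regexp in the strict textbook sense describes a regular language, so it can always be written out like this. The trouble starts with the PCRE extras and with one pass feeding another, which is where I’d hope real tools exist.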

I am not (even slightly) a computer scientist, but many of you are. Does anyone have any ideas on this? Or pointers to anyone having done anything even remotely similar? Or knowledgeable friends they could point this query at?

The other approach is parserTests.php: see the docs on running maintenance scripts, the scripts themselves (look for parserTests) and the list of tests. A “parser” will be anything that passes the unit tests.
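In other words, the tests are the specification. A minimal sketch of that idea in Python follows. The cases, the run_cases harness and toy_parser are all hypothetical and bear no relation to the real parserTests file format or its expected output.

```python
import re
from typing import Callable, List, Tuple

Case = Tuple[str, str, str]  # (test name, wikitext input, expected HTML)

# Hypothetical cases for illustration; not taken from MediaWiki's test suite.
CASES: List[Case] = [
    ("plain text", "hello", "<p>hello</p>"),
    ("italic", "''hi''", "<p><i>hi</i></p>"),
]


def run_cases(parse: Callable[[str], str], cases: List[Case]) -> List[str]:
    """Return the names of the cases that the candidate parser fails."""
    failures = []
    for name, source, expected in cases:
        try:
            actual = parse(source)
        except Exception:
            actual = None  # a crash counts as a failure
        if actual != expected:
            failures.append(name)
    return failures


def toy_parser(text: str) -> str:
    """A stand-in candidate: wraps text in <p> and handles ''italic'' only."""
    return "<p>" + re.sub(r"''(.+?)''", r"<i>\1</i>", text) + "</p>"


print(run_cases(toy_parser, CASES))     # candidate under test
print(run_cases(lambda s: s, CASES))    # a do-nothing "parser" for comparison
```

The harness doesn’t care how the candidate works inside: regexps, a generated grammar or anything else. All it cares about is whether the list of failures comes back empty.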