In which I remember that writing on Wikipedia is fun.

Casual meet with some Wikimedia UK people last night. As is standard whenever two or three Wikimedians meet, we conspired to fix everything that’s wrong with Wikipedia and Wikimedia! Perhaps 1% will ever be attempted. One project that desperately needs participants is Wikisource, particularly so that it can handle incoming documents from museums and archives — Commons is doing okay with the images.

Last week I took The Wolfgang Press from two paragraphs to a proper article. Yesterday it was in the front page’s Did you know? section for six hours and got 3065 hits, compared to its usual 25–50. Not bad for an article about an obscure band approximately no-one cares about. (“Kansas”, from Bird Wood Cage, is a lost goth rock classic. DJs, please play. Thank you.) I’d forgotten how much quiet nerdy fun it is writing and researching a Wikipedia article. Writing about anything artistic on Wikipedia is arse, though, unless you can find critics to quote. Printed ones by preference. Wikipedia’s epistemology is severely broken at the edges, and knowing how it got that way doesn’t actually help.

The FSF is still pushing the GFDL. Ignore them.

The Free Software Foundation is recommending which licences to use for your software and documentation projects. They have recommended the GNU Free Documentation License as a dandy thing that is just marvellous for your project. They are dangerously and stupidly wrong.

tl;dr: Reusing GFDL content is unworkably unclear. When you ask the FSF about the tricky bits, they say “better get a lawyer, son.” This egregiously misses the entire damn point. Freedom only for those with copyright lawyers on tap is not freedom.

The GFDL is possibly one of the worst free content licences ever. The only reason it has not been justifiably buried at the bottom of a swamp with “MISERABLE FAILURE” burnt into its forehead with a soldering iron is that Wikipedia used to use it. Wikipedia used it because Nupedia used it. Nupedia used it only because the CC licences hadn’t been invented yet.

Literally no-one understands how to reuse GFDL content safely, including the FSF. I sent a query about how it applies to aggregates; three months later, the FSF cut’n’pasted their “we have no idea either” response, suggesting you read the licence text and consult your attorney. Given that Wikimedia’s attorney at the time was Mike Godwin and it made his head hurt too, this strongly suggests no-one left at the FSF wants to think about this thing. (They finally clarified this particular issue in GFDL 1.3.)

In the context of mirroring a widely-edited wiki or a page thereof, its terms are difficult to follow, legally unclear and may be technically impossible to comply with to a degree of legal safety comparable to the GPL or CC by-sa. (Every copy must have the full 23 kilobytes of licence text attached, about seven pages of single-spaced 12 point. This is not so good for single articles or photographs, and Internet video is likely impossible to reproduce under the GFDL in legal safety. And this is, of course, the easiest term of the GFDL to obey. CC by-sa, by contrast, allows the licence to be named by reference.) And don’t even ask about images.

Before Wikimedia went Creative Commons, tedious nerds would frequently claim that any given reuser of Wikipedia content was technically violating the GFDL no matter what shrubberies they obtained (thus putting off quite a lot of reusers). And no-one, particularly not the FSF, could provide meaningful advice that could actually be relied upon, because the licence is deeply and fundamentally not fit for purpose.

The GFDL is so unremittingly awful, hard to trust legally and onerous in practice that the Debian project went so far as to throw its hands in the air and declare that it failed the Debian Free Software Guidelines. (The DFSG threat model is “What if an insane or malicious copyright holder comes after a reuser?” This is the question the FSF answers with “Better get a lawyer, son.” THIS IS NOT HOW TO BE REUSABLE. The FSF does not answer this way concerning the GPL, for instance.) So if you want your open source package in Debian, you probably don’t want the documentation under the GFDL.

Use CC by-sa, CC-by or Public Domain, like approximately everyone else in the whole goddamn free content world does. If it’s a software manual, licence or dual-licence it under the same licence as the software itself.

You know how some people use shitty handrolled software licenses that are technically free or open but aren’t actually compatible with anything else, and everyone else thinks they’re dicks for it and avoids using their stuff? That’s why you shouldn’t use the GFDL either.

The New York Times hands the first draft of history over to the BBC and Guardian.

Not that the paywall hasn’t been a while in the cooking. Noam Cohen from July 2009:

So, in essence, many Wikipedia articles are another way that the work of news publications is quickly condensed and reused without compensation.

I’m sure he’s most pleased that sort of thing will stop and the New York Times will be cited about as often as The Times is these days.

Every journalist I’ve spoken to since 2006 uses Wikipedia as their handy universal backgrounder. Funnily enough, there’s a distinct lack of donations to the Wikimedia Foundation from newspapers and media organisations. How much did the New York Times donate in the fundraiser?

We do this stuff for everyone to use and reuse. Journalists taking full advantage of this is absolutely fine. But claiming we should then pay the papers for the privilege is just a little odious.

I wish the New York Times the full fruits of this bold move.

The secret of motivating volunteers.

Motivating volunteers is like herding cats. “Herding cats is easy if you know the local value of tuna.” — me, some years ago. An observation I know of no-one else having made before me, so I’m taking this as my law of volunteer motivation. Lure them with something compelling.

My experience of volunteer motivation:

  1. Running an indie rock fanzine in the 1980s and 1990s. Your writers are writing for free; how do you keep them pumping out good stuff? How do you gently let down the really shit ones?
  2. Employed as paperwork administrator for a private school fundraising drive. That was easy because the volunteers had a very specific task: get the people on the donation level below them to donate, knowing the volunteer asking them had donated more than they were asking for. (Work that Pareto principle! This method is so effective it’s pretty much standard.)
  3. Editor and then supervisor on a student newspaper. Student papers pretty much exist to be written rather than read; the idea is to get fresh new university students doing something they can look at and say “I made this!” when they’ve possibly never made anything before. You need to bring a certain amount of discipline to the task.
  4. Rocknerd, which is pretty much an indie rock fanzine again, but on the web. In this case, you give opinionated people somewhere to write stuff. I’ve almost no-one else bothering, because everyone’s got a blog of their own.
  5. Wikipedia. This is very different, because your volunteers are taking up internet space rather than physical space. Also, on Wikipedia the volunteers are … sometimes a bit socially odd. The idea of organising all the facts in the world attracts people from further up the autistic spectrum than you would normally encounter, so some people’s social skills can be brittle. The authority thing is strictly meritocratic — being an admin (which really means “janitor”) or even an arbitrator (which really means “bouncer”) doesn’t cut it; you have to show consistent cluefulness to convince anyone; there is no actual hierarchy. So to get stuff done, you have to get very good at convincing people to work on what you want them to.

I’m also slightly amazed at projects like the Puffing Billy tourist railway in Melbourne. Even all the shitwork is done by volunteers, the sort of shitty jobs that would turn a paid employee’s thoughts to socialist revolution — yet someone loves the railway so much they get up at three every morning to start the boiler heating! A lot of the tour guides are young folk doing it as work experience (tourism degrees and so forth) or retirees who like to do something for the general good that gets them out and meeting people.

Volunteers will work ten times as hard as any employee, but only because they want to be there and only on things they want to. But that motivation is so fragile, and volunteer effort is not fungible.

BBC “5 Live Investigates” on Books LLC, Sunday night 9pm UTC.

BBC 5 Live Investigates is running a piece on Sunday 9pm (this item likely to go out 9:45pm or so) on Books LLC and similar operations, which sell reprints of Wikipedia articles as print-on-demand books on Amazon.

The researcher called a few UK people for the Wikipedian viewpoint. As it turns out, we have one — more than a few Wikipedians have bought these things, thinking they were hitherto-unknown new printed sources to use, only to discover their own words on the topic! At prices like $50 for a 10,000 word pamphlet, this is a most unpleasant surprise. It’s also caught a number of slightly famous people who were surprised to find someone had “written” a book about them.

The casual reader encountering these things may not be aware of the business model. These are print-on-demand books, compiled by computer from a list of keywords. No copies exist until someone orders one, at which point a single copy is printed and sent. People aren’t generally aware that POD is very good quality these days — you can send a PDF to a machine and have it spit out an absolutely beautiful perfect-bound book for you, of a standard which previously would have been quite pricey. So Books LLC and Alphascript and whoever manage to eke out a tiny profit on single copies, having worked out a way to spam Amazon.
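
To give a sense of how mechanical the pipeline is, here’s a rough sketch of a keyword-to-“book” compiler. The API endpoint and query parameters are MediaWiki’s real public interface; the rest (the compileBook helper, the keyword list) is my own invention for illustration, not any publisher’s actual code.

```typescript
// Sketch of a keyword-to-"book" compiler, for illustration only.
// Uses the real MediaWiki API (action=query with prop=extracts, from
// the TextExtracts extension); compileBook() itself is hypothetical.

const API = "https://en.wikipedia.org/w/api.php";

async function fetchExtract(title: string): Promise<string> {
  const params = new URLSearchParams({
    action: "query",
    prop: "extracts",
    explaintext: "1", // plain text rather than HTML
    redirects: "1",
    titles: title,
    format: "json",
  });
  const res = await fetch(`${API}?${params}`);
  const data = await res.json();
  const pages = data.query.pages;
  const page = pages[Object.keys(pages)[0]];
  return page.extract ?? "";
}

async function compileBook(keywords: string[]): Promise<string> {
  const chapters = await Promise.all(keywords.map(fetchExtract));
  return keywords
    .map((kw, i) => `\n\n== ${kw} ==\n\n${chapters[i]}`)
    .join("");
}

// "Compile" a pamphlet. (The CC by-sa attribution and licence notice a
// real reuser must include are omitted here for brevity.)
compileBook(["The Wolfgang Press", "Gothic rock"]).then((book) =>
  console.log(`${book.length} characters of encyclopedia, ready to sell`)
);
```

Add a cover generator and Amazon’s self-service listings, and that’s more or less the whole business.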

The books are entirely legal — you can use our stuff without permission, even commercially, and “Please, use our stuff!” is why quite a lot of us do this at all. And we have a link on each wiki to make your own PDF book, and various projects have partnerships with printers like PediaPress. So the main issue is that it’s not being made clear that these books are just Wikipedia reprints.

I tried to stay strictly descriptive of consensus, but I think I could clearly say that we would very much like the publishers and Amazon to make it clearer just what these things are. Thanks.

The problem is there’s no direct action we can really take without hampering the good reasons for reuse of our material. Or scaring people off entirely — it’s hard enough getting across the idea of freely reusable content as it is. We can use publicity about this to spread awareness that we’re all about reusing our stuff, as we introduce civilisation to the notion of reusability as being the normal order of things.

thewub points out on foundation-l:

From March 1st it might be worth contacting the UK Advertising Standards Authority, as their remit is being extended then: http://asa.org.uk/Regulation-Explained/Online-remit.aspx Amazon product descriptions almost certainly fall under “non-paid-for space online under [the marketer’s] control”. So a misleading description ought to lead to action. But the issue here is the misleading *lack* of any description. It could be an interesting conundrum for the ASA!

The problem will largely solve itself: if physical copies of Wikipedia articles ever gained any actual popularity, competition would kick in very fast. Competition on quality would follow, as people sought to design beautiful editions just because they could.

But they won’t. Is the essence of a book the informational content? Is the essence of a book a lump of tree pulp? Is the essence of a book the ideal synergy of the two, creating an object of beauty, wonder and love? The answer people seem to be picking is the first. The Kindle may be a hideously locked-down proprietary money trap, but it’s really quite lovely as a book reader. I read books as PDFs on my netbook, hardly ever picking up my paper copies. A printed general encyclopedia is now a ludicrous idea. “It’s from a printed book!” will soon be as relevant a criterion to sourcing as “It’s on a website!”

Anything made of atoms is a white elephant. I have a four hundred kilogram vinyl record albatross. I will never rip these things, having had a turntable for four years and ripped none. Music is digital. Books are digital. Stuff is a curse.

(In that essay, Paul Graham explicitly excludes books from being counted as mere “stuff.” He is wrong.)

Update: Books, LLC responds. They claim Amazon removed Wikipedia links from the listings.

Single point of failure.

Monopoly wasn’t a goal for Wikipedia; it’s something that just happened.

There’s basically no way at this stage for someone to be a better Wikipedia than Wikipedia. Anyone else wanting to do a wiki of educational information has to either (a) vary from Wikipedia in coverage (e.g., be strongly specialised — a good Wikia does this superlatively) (b) vary from Wikipedia in rules (e.g., not neutral, or allow original research, like WikInfo) and/or (c) have a small bunch of people who want to do a general neutral encyclopedia that isn’t Wikipedia and who will happily persist because they want to (e.g., Knowino, Citizendium).

Competition would be good, and monopoly as the encyclopedia is not intrinsically a good thing. It’s actually quite a bad thing. It’s mostly a headache for us. Wikipedia wasn’t started with the aim of running a hugely popular website, whose popularity has gone beyond merely “famous”, beyond merely “mainstream”, to being part of the assumed background. We’re an institution now — part of the plumbing. This has made every day for the last eight years a very special “wtf” moment technically. It means we can’t run an encyclopedia out of Jimbo’s spare change any more and need to run fundraisers, to remind the world that this institution is actually a rather small-to-medium-sized charity.

(I think reaching this state was predictable. I said in 2005 that in ten years, the only encyclopedia would be Wikipedia or something directly derived from Wikipedia. I think this is the case, and I don’t think it’s necessarily a good thing.)

The next question is what to do about this. Deliberately crippling Wikipedia would be silly, of course. The only way Wikipedia will get itself any sort of viable competitor is by allowing itself to be blindsided. Fortunately, a proper blindsiding requires something that addresses structural defects of Wikipedia in such a way that others can use them.

(One idea that was mooted on the Citizendium forums: a general, neutral encyclopedia that is heavy on the data, using Semantic MediaWiki or similar. Some of the dreams of Wikidata would cover this — “infoboxes on steroids” at a minimum. Have we made any progress on a coherent wishlist for Wikidata?)

But encouraging the propagation of proper free content licences — a somewhat more restrictive set than our most excellent friends at Creative Commons promote, though they’re an ideal organisation to work with on it — directly helps our mission. The big win would be to make proper free content licences — preferably public domain, CC-by or CC by-sa, as they’re the most common — the normal way to distribute educational and academic materials. Because that would fulfil the Foundation mission statement:

“Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.”

— without us having to do every bit of it. And really, that mission statement cannot be attained unless we make free content normal and expected, and everyone else joins in.

We need to encourage everyone else to take on the goal of our mission with their own educational, scientific and academic materials. We can’t change the world all on our own.

So. How would you compete with Wikipedia? Answers should account for the failings of previous attempts. Proposals involving new technical functionality should include a link to the code, not a suggestion that someone else should write it.

Anyone who advocates advertising on Wikipedia is a drooling moron.

I used to be a big fan of ads on Wikipedia. I changed my mind a while ago, and had my opinion confirmed by what Google did to TVTropes just a couple of months ago.

I wonder what gay and lesbian employees of Google think of this. I haven’t heard one breathe a peep over the fact that any TVTropes page with the slightest gayness is behind a filter. Censorship, it’s insidious.

But! That’s okay! You’ve all got mortgages.

TVTROPES IS THE UNIVERSAL COUNTEREXAMPLE. YOU CANNOT ADVOCATE ADS ON WIKIPEDIA WITHOUT A KILLER CASE FOR WHY THIS WOULD NOT HAPPEN TO US.

Wikipedia having ads would be the worst possible move for the mission: “Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.”

No advertiser in existence would stand for that in practice.

What you see is FOR THE WIN.

I posted to foundation-l concerning the other way to get more people editing Wikipedia: the perennial wish for good WYSIWYG in MediaWiki.

This is a much bigger potential win than many people think. From mediawiki-l in May, a Canadian government official posted how adding a (locally patched) instance of FCKeditor to their intranet got eight times the participation:

In one government department where MediaWiki was installed we saw the active user base spike from about 1000 users to about 8000 users within a month of having enabled FCKeditor. FCKeditor definitely has its warts, but it very closely matches the experience non-technical people have gotten used to while using Word or WordPerfect. Leveraging skills people already have cuts down on training costs and allows them to be productive almost immediately.

The geeks refused to believe that not requiring people to wade through computer guacamole had worked, and insisted that everyone new must be an idiot. The poster disabused them of this conceit:

Since a plethora of intelligent people with no desire to learn WikiCode can now add content, the quality of posts has been in line with the adoption of wiki use by these people. Thus one would say it has gone up.

In the beginning there were some hard core users that learned WikiCode, for the most part they have indicated that when the WYSIWYG fails, they are able to switch to WikiCode mode to address the problem. This usually occurs with complex table nesting which is something that few of the users do anyways. Most document layouts are kept simple.

Eight times the number of smart and knowledgeable people who just happen to be bad with computers suddenly being able to even fix typos on material they care about. Would that be good or bad for the encyclopedia?

Now, WYSIWYG has been on the wishlist approximately forever. Developer brilliance applied to the problem has dashed hopes on the rocks every single time. Brilliance is not enough: we’re going to need to apply money.

  • We need good WYSIWYG. The government example suggests that a simple word-processor-like interface would be enough to give tremendous results. So that’s an achievable target.
  • It’s going to cost money in programming the WYSIWYG.
  • It’s going to cost money in rationalising existing wikitext so that the most unfeasible formations can be shunted off to legacy for chewing on.
  • It’s going to cost money in usability testing. Engineers and developers are perpetually shocked at what ordinary people make of their creations.
  • It’s going to cost money for all sorts of things I haven’t even thought of yet.

This is a problem that would pay off hugely to solve, and that will take actual money thrown at it.

How would you attack this problem, given actual resources for grunt work? What else could do with money spent on it?

Magnus Manske, in his usual manner, has coded up a quick editor whose name I’ve stolen for this post. It’s rough, but it’s a nice working example of some of the way there. WYSIFTW. Screenshot.

How does a project bite only the proper number of newbies?

I found a year-old draft of this post, but I think it’s apposite again — given that in a recent discussion of “how to attract more editors?”, Tim Starling seriously posited that we need to repel newcomers because the community (specifically the admin culture) is too toxic to throw them at.

Angie Byron did a presentation on getting women into open source which David Eaves spoke further on.

Summary: With open source software, there are people who think “that’s dumb,” there are people who think “I want to see it fixed” and there are people who think “I can do something about it.” The people at the intersection of all three power open source.

On Wikipedia, the intersection point – marked “These people power open source” – is pretty much anyone who knows something and can write it in a coherent sentence. But social structures have evolved to keep it from turning into complete rubbish. Sometimes way too much structure. This means there are all manner of social mechanisms to repel clueless n00bs, since there aren’t the technical or thinking-style barriers there are to coding. And many of these are (IMO) inappropriately strong.

How not to bite the n00bs is a perennial topic on Wikipedia, and I’m currently trying to get some old hands to edit as IPs so they can see how n00b-biting we actually are in practice. Results are disheartening so far.

Some open source projects do similar things when they attract large volumes of people who can at least write code to a minimal degree. A high quality requirement on code is only the start — there’s getting attention to your code, jumping through the right hoops, dealing with obnoxious-nerd-stereotype personalities (there’s a reason it’s a stereotype), etc.

But then the problem is what to do when you have more of them than you can deal with — the “Linus doesn’t scale” problem. It’d be nice if the only thing keeping them out was code quality, but it’s silly to claim that’s all there is.

Wikipedia has much the same problem — that intersection, which is tiny for software, is for a wiki pretty much everyone who knows something and can type. So hideously complex social structures have evolved to deal with the distributed crapflood. Many of which are way less than ideal and n00b-bite way too much.

That is: our problem is not getting people into that intersection point — it’s what happens to people in that intersection point, how to keep them from flooding you and how to make sure those mechanisms aren’t in fact damaging your project.

(It’s amazing how much time community nuts’n’bolts uses and how little sense one can have of things actually pushing forward. Wikipedia is the size of a small city. You know how hard it is herding five volunteers? Try getting ten thousand to do any particular thing.)

Jimmy Wales facts.

  • Jimmy Wales does not sleep. He waits … for your money.
  • Jimmy Wales’ tears cure lack of funding. Too bad he only cries in November and December.
  • There is no chin behind Jimmy Wales’ beard. There is only another “donate” button.
  • Jimmy Wales doesn’t do pushups — he gets your donation.
  • When the Bogeyman goes to sleep every night, he clutches the printout of his Wikipedia donation receipt as he checks his closet for Jimmy Wales.
  • Jimmy Wales does not go hunting donations, because the word “hunting” implies the possibility of failure. Jimmy Wales goes collecting donations.
  • Jimmy Wales sold his soul to the devil for his rugged good looks and unparalleled fundraising ability. When the deal had been done, Jimmy looked the devil lovingly in the eye and got his soul donated back.
  • Jimmy Wales not only gave Objectivists a nice reputation, he got them to donate to his charity out of enlightened self-interest.
  • The Wikimedia Foundation can build a funding drive out of paper clips, rubber bands and soda cans. And does so, every year.
  • If you can see Jimmy Wales, he can see you. If you can’t see Jimmy Wales, you may be only seconds away from donating.

We’re going to need more sharks.

When an open source project regards talk of forking as “treason”, rather than as the defining characteristic of freedom, that’s a sign that it’s a dead project walking.

The last big example I can think of was XFree86 versus Xorg. XFree86 was all but stalled, with Linux vendors having to maintain huge patches themselves because the main project was so slow to accept changes. When Keith Packard, who’s personally driven X for twenty years, finally said “enough” and started organising Xorg, they expelled him.

Compare with Wayland, the new display server intended to replace X in Ubuntu and Fedora. (Not a code fork, but in practical terms a developer-effort fork.) The three people pushing for Wayland to replace X in Fedora are all Xorg lead developers. “Traitors”? No, people who have the actual aim in mind: making good open source display software.

Xorg remains alive and well even as several lead devs work on its replacement. XFree86 appears to have been abandoned, with no release in two years and no commits since February 2009.

Wikipedia has had any number of forks. Fred Bauder has been with Wikimedia since it was wikipedia.com — his Wikinfo fork has not led to him being regarded as a “traitor” in any way; he’s as highly respected as ever. Wikipedians have always had great interest in Wikipedia’s forks and wished them well, including Citizendium. The community regards the forks as family, not enemies. We’re all on the same side: free educational content.

Not to mention that the project crying “treason!” at the word “fork” … started as a fork.

You can’t keep your project together with paranoia. There is no Iron Curtain around an open source project.

(There’s much wackiness around Citizendium at present. I haven’t edited there in three years, but Matt Innis has taken care to block me anyway for writing about them on RationalWiki. Gosh, that’ll sure show me! The point being, of course, internal signalling rather than anything that would affect me at all. “I will not answer any more questions and will ask the Constabulary to delete all discussions that in my view require open debate which is being suppressed here.” You can get running updates and discussion at RationalWiki. Bring your own popcorn.)

Dead institutions walking.

This reads like a radical anti-establishment manifesto by some young, smart but inexperienced Internet-based firebrand. Wikipedia is way cool! Universities are dead institutions walking! We’ll all learn off the web! Social networks will replace campuses! You know the sort of thing.

Then I got to the end, and my jaw dropped when I saw what the author does for a living. Try to read the article without skipping to the end, scrolling down carefully without jumping ahead; it’s worth it.

So. What do we do to distinguish experts from non-experts when we no longer even have credentials as a marker of expertise? (e.g. there’s not a vast reserve of commercial positions for pure philosophers.)

I wonder if Lord Browne has heard about this yet.

What to do about the UK museum cuts?

The UK government is flat broke, so the axe is out and the Department for Culture, Media and Sport is a huge sitting duck.

Museums live on the ragged edge already. As Wikimedians, we need to do what we can to mitigate the disaster coming their way.

First thing off the top of my head: get our GLAM contacts in order and ask them:

“We can’t do political lobbying. But what can we do to help?”

It’s reasonably clear that this is an ambit claim to see who kicks up a fuss — the BBC discovered people love 6 Music, and that their desired Asian Network demographic preferred 6 Music.

It strikes me as feasible that a fuss in the general name of the arts is reasonably within WMUK’s charitable objectives and won’t violate anyone’s expectations of neutrality. It’ll also be powerful signalling that Wikimedia is one of the few native Internet groups to actively work for the preservation of culture. (Much as, as Geni has noted, we’re the only web 2.0 site to give a hoot about copyright.)

It will be worth remaining cognisant, of course, of the Iron Law of Institutions: we care about the collections themselves; the museum boards care about their power over the money per se, and secondarily about what it’s spent on.

But the first step remains: offer them our help and ask what we, as huge fans of museums and what they do, can do to help.

What would you suggest as an effective course of action?

(See also discussion on wikimediauk-l.)

The wrung dry corpses of words.

I have been unduly cruel to Michel Houellebecq for cheap lulz. Appropriating dry technical texts is an entirely valid and often highly entertaining literary technique. And the publicity has actually made me want to read the book.

It was picked up on immediately because he picked a dry technical text that people actually read. I’m presuming here that words on French Wikipedia are subject to the same horrors they’re put through on English Wikipedia, and that the pattern of traumatised textual flesh is just as distinctive and obvious.

Spotting and marking for death anything interesting, well-written or showing signs of coherent authorship is, in practice, a reliable heuristic for eliminating puff pieces. English Wikipedia has a house style, and it’s really obvious when someone’s quoting a chunk of it. It’s what happens to text when too many people edit it and all nuance is iteratively wrung out. There are also lots of dangling subclauses, as successive writers argue in the article and try to get their favourite nuance or contingency covered. It’s most visible on articles that were made featured a few years ago and have since sunk into dilapidation.

If someone hasn’t studied and written about the Wikipedia house style academically, they damn well should. It would be an interesting exercise for an individual to try to write as badly as an overedited Wikipedia article, so as to make their own fake Wikipedia text for fiction. This may even be amenable to computerisation.
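
As a gesture at the “amenable to computerisation” claim, here’s a throwaway sketch that fakes two of the most recognisable tics: the mid-sentence hedge and the drive-by {{citation needed}}. The tic list is entirely my own caricature, not anything derived from an actual corpus study.

```typescript
// Throwaway sketch: fake a couple of tics of overedited Wikipedia prose.
// The tics are my own caricature, not the product of a corpus analysis.

const hedges = ["however,", "according to some sources,", "it has been argued that"];

function overEdit(sentence: string, seed = 42): string {
  // Cheap deterministic pseudo-randomness (Park-Miller), so the damage
  // is reproducible.
  const rand = () => (seed = (seed * 16807) % 2147483647) / 2147483647;
  const words = sentence.split(" ");
  // A succession of "editors" wedge a hedge into the sentence...
  const at = 1 + Math.floor(rand() * (words.length - 1));
  words.splice(at, 0, hedges[Math.floor(rand() * hedges.length)]);
  // ...and someone always tags the interesting claim.
  return words.join(" ").replace(/\.$/, "{{citation needed}}.");
}

console.log(overEdit("The Wolfgang Press were the finest band on 4AD."));
// => "The according to some sources, Wolfgang Press were the finest
//    band on 4AD{{citation needed}}."
```

Compose a few passes of that over a paragraph and the dilapidation compounds nicely.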

Staring into the eye of Cthulhu.

The MediaWiki wikitext parser is not a “parser” as such; it’s a pile of regular expressions, using PCRE as found in PHP. There are preprocessing and postprocessing steps. No formal definition of wikitext exists; the definition is literally “whatever the parser does.” Lots of features of wikitext that people use in practice are actually quirks of the implementation.

This is a serious problem. Rendering a complex page on en:wp can take several seconds on the reasonably fast WMF servers. Third-party processing of wikitext into XML, HTML or other formats is not reliably possible. You can’t drop in a faster parser if you happen to have access to gcc on your server. Solid WYSIWYG editing, as opposed to the many approximations over the years (some very good, but still very approximate), could really do with a formally-described language to work to. (That’s not all it needs, but it’s pretty much needed to make it solid.)
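
To give a feel for the genre (this is a toy of my own, nothing like MediaWiki’s actual code), here’s a “parser” in the regex-pile style. The behaviour on messy input is whatever the substitution order happens to produce, and that accidental behaviour is what reusers and WYSIWYG editors have to reproduce:

```typescript
// Toy wikitext "parser" in the regex-pile style. Illustrative only;
// these are not MediaWiki's actual rules.
function toyParse(wikitext: string): string {
  let html = wikitext;
  // Order matters: ''''' (bold italic) must fire before ''' and ''.
  html = html.replace(/'''''(.+?)'''''/g, "<b><i>$1</i></b>");
  html = html.replace(/'''(.+?)'''/g, "<b>$1</b>");
  html = html.replace(/''(.+?)''/g, "<i>$1</i>");
  // [[Page|label]] must fire before [[Page]].
  html = html.replace(/\[\[([^\]|]+)\|([^\]]+)\]\]/g, '<a href="/wiki/$1">$2</a>');
  html = html.replace(/\[\[([^\]]+)\]\]/g, '<a href="/wiki/$1">$1</a>');
  return html;
}

// Well-formed input works fine:
console.log(toyParse("'''bold''' and [[Main Page|a link]]"));
// => <b>bold</b> and <a href="/wiki/Main Page">a link</a>

// But what should ''italic '''bolditalic''''' produce? Whatever the
// pile happens to do. There is no grammar to appeal to; the behaviour
// of the pile IS the definition, and that is exactly the problem.
console.log(toyParse("''italic '''bolditalic'''''"));
// => <i>italic <b>bolditalic</b></i> ... with this rule order, anyway.
```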

Actually describing wikitext is something many people have attempted and ended up dashing their brains against the rocks of. The hard stuff is the last 5%, and almost all of the horrible stuff needs to work because it’s used in the vast existing body of wikitext. Wikitext is provably impossible to describe as EBNF. Steve Bennett tried ANTLR and that effort failed too.

If you’ve ever spat and cursed at the MediaWiki parser, you may care to glance at this month’s wikitext-l archives. (That’s the list Domas Mituzas created to keep us from clogging wikitech-l with gibbering insanity.) Andreas Jonsson has been having a good hack at it, and he thinks he’s cracked it.

This won’t become the parser without some serious compatibility testing … and being faster than the existing one. But this even existing will mean third parties can use a compiled C parser instead of PHP, third parties can process wikitext with blithe abandon without a magic black box MediaWiki installation, dogs and cats can live together in Californian gay marriage and the world will be just that little bit more beautiful. Andreas’ mortal shell, mind destroyed by contemplation of insanity beyond the power of the fragile human frame to take, would be in line for the Nobel Prize for Wikipedia. Could be good. Should be in the WMF Subversion within a few days.

Update: Svn, explanation. Performance is actually comparable to the present parser. Not perfect as yet, but not bad.

We will add your activist distinctiveness to our own.

Despite the media attention, I don’t think this is any threat to the integrity of the encyclopedias’ content.

The Wikipedias get waves of activists and are used to dealing with them. The ones who don’t take the time to understand Neutral Point Of View, their stuff gets removed. The ones who do, their stuff stays and their cause gets accurately described and represented. Best case, we get more good new Wikipedians.

This applies to any activist for any cause whatsoever and has applied at least since I started on en:wp in 2004.

The advice I have for activists is: strict neutrality with excellent citations will do your cause justice. Everything else will be removed.

The broader advice is: there is no plausible attack on the integrity of the encyclopedias themselves that is not already something we are quite used to dealing with on a daily basis for many years.

I wonder if the presently prominent group of activists have taken this one in, in their quest to have their stuff stick.

There’s a hole in my bucket.

Only a few calls since the Afghan War Diary release from people who think Wikileaks is part of Wikimedia. I must stress again that the two are utterly unconnected, though I remain a big fan of Wikileaks.

What happens if the Pentagon manages to nail Julian Assange? Maybe, just maybe, Wikileaks posts the key to the file tagged “INSURANCE”.

In the meantime, US military are banned from looking at Wikileaks. I’m sure that’ll seal all leaks just fine. The Taliban can still read it, of course.

The old media aren’t happy either. I bet the RIAA wishes it had thought of calling in military strikes on Napster.

And to be on-topic: Wikileaks reveals US Army Intelligence cribs from Wikipedia, too. (Cache.)

Nazi Goatse, part 94.

Wikimedia has set up an investigation into the question of contentious content on the projects. Sexual content, violent content, pictures of Muhammad. The stuff that’s legal, but whose very existence offends people.

My sympathy goes out to the poor sods charged with the study. I’d be hard put to think of a more poisoned chalice. No matter what they come up with, they will be called Nazis and worse. And whatever they come up with will change no minds whatsoever and be hideously distorted — if they said “the best thing for Wikimedia is a goatse at the top of all pages,” someone would say “yes, and this is why anyone advocating images purporting to be Muhammad should be beheaded.”

The meta talk page has already been swooped upon by the usual participants and reduced to somewhat worse than uselessness.

I can reiterate my basic argument, as father of a three-year-old and stepfather of two teenagers.

The Wikimedia communities are sufficiently painstaking in making sure everything is educational and in context that I’d happily let my daughter in front of Wikimedia unrestricted. Anything sexual or horrifying would be informative and in context.

The community works incredibly hard to make the contentious stuff good. Any kid who looks up “fuck” on English Wikipedia will come away considerably educated, for example!

The last shock I got from Wikipedia was when I followed a link on another site to Cock ring, and was confronted with a large, shiny, erect penis. With, of course, a cock ring on it. Not something I’d care to have pop up on the screen at work … on the other hand, I have no reason to be going to an article on cock rings at work. I think the article was entirely reasonable and the use of the picture was entirely reasonable.

Then there is the issue of important photos of war and so on that are absolutely horrifying. They should be in the encyclopedia, even if merely describing some of them makes my stomach do flip-flops.

I think experience shows that the Wikimedia communities take their responsibility to educate seriously enough that “Wikipedia is not censored” is sufficient in practice. I have seen no cases that would lead me to think otherwise.

As noted in the most recent foundation-l reiteration of the Muhammad image discussion, Wikimedia has a firm bias to more information rather than less. It’s right there in the mission statement. Increasing, not decreasing, knowledge is why the community is here at all. If you go against the statement and expectation that more information is better than less information — even if the information is horrible and shocking — the community will not accept it. If the Foundation forces filtering on the community, the community will get up and leave. As Milos Rancic noted, implementing any of the recommendations on that meta talk page will promptly lead to a fork. As it should — insulting your community in such a manner is an excellent way to get rid of them.

Filtering should be left to third parties. The SOS Children Wikipedia for Schools is an excellent example, and it’s quite popular and won’t get a teacher fired. Other than that, I’ve seen no evidence of actual demand for a filtered Wikimedia from end users — only from people who want to filter the projects themselves at the source.

One perennial proposal is for images in given categories to be hideable at the logged-in user’s option. This is an idea I like, as it puts control in the hands of the viewer rather than of third parties. All it requires is someone to code something that passes muster with Tim and Domas as unlikely to melt the servers.
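
The viewer-side half of that could be very small indeed. Here is a sketch of my guess at the shape of it; nothing in it exists today. It assumes the server has already tagged each image with a per-category CSS class (that tagging is the real work, and the part that has to not melt the servers), and that a stored preference supplies the user’s category list.

```typescript
// Sketch of viewer-side image hiding. Hypothetical, not an existing
// MediaWiki feature. Assumes the server has tagged images with a
// per-category class like "img-cat-nudity"; that server-side tagging
// is the hard part this sketch takes for granted.

const hiddenCategories = ["nudity", "violence"]; // from user preferences

function applyImagePreferences(): void {
  for (const cat of hiddenCategories) {
    document.querySelectorAll<HTMLImageElement>(`img.img-cat-${cat}`)
      .forEach((img) => {
        img.style.display = "none";
        // Replace with a click-to-reveal placeholder, so the choice
        // stays with the reader rather than with anyone upstream.
        const placeholder = document.createElement("button");
        placeholder.textContent = `Hidden image (${cat}) -- click to show`;
        placeholder.addEventListener("click", () => {
          img.style.display = "";
          placeholder.remove();
        });
        img.before(placeholder);
      });
  }
}

document.addEventListener("DOMContentLoaded", applyImagePreferences);
```

The design point is just that the hiding happens in the reader’s browser, under the reader’s control.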