
This brings up a very interesting point - can anyone think of topics that Wikipedia has inadvertently omitted thanks to the narrow section of society that contributes significant amounts of new content to the site?

There's been a lot of talk in recent years about how the "initial work" of adding information to Wikipedia is mostly done, and that from here on out, it's going to be mainly about adding new content as it's created (new events, people, companies, etc.). But it seems possible that myopia on the part of editors could be having inadvertent effects.



> the "initial work" of adding information to Wikipedia is mostly done,

I don’t buy this at all. Almost every topic that I’m deeply familiar with has inadequate coverage on Wikipedia. (Examples: color science, Latin American history and politics, cities in southern Mexico, various mathematical topics (here the articles tend to be overwhelmingly technical and entirely lacking in context, motivation, or history), the history and function of many common household appliances, particular bits of human anatomy, computer user interface design and its history, typography, 17th–19th century political philosophers, various US Supreme Court cases.)

Every once in a while I try to tackle one of these, but writing a good encyclopedia article is still a ton of work.

Most topics have some kind of Wikipedia page, but only a tiny fraction are anywhere close to as comprehensive as they should be. Just consider: there are more than 20 million books in the Library of Congress, whereas the English Wikipedia has about 2000 articles with “good article” or “featured article” status.


I agree with that, although I believe that prediction (I can't seem to dig up where it was) was more about how many new articles need to be created entirely, i.e. how many subjects Wikipedia should cover don't even have a 1-sentence stub yet. I think there are probably still several million of those, but new-article growth is inevitably slowing as low-hanging fruit has been quite well harvested.

I agree that there are a ton of the current 4 million articles that are nowhere near complete, in many cases not even 10% complete. That's an area that could use considerably more work than new-article creation imo.


It used to be even worse -- when I started using Wikipedia (dinosaurs walked the earth) they had substantially more written about several comic book characters than on obscure religious cults like e.g. Catholicism.

That said, long-term I bet on Wikipedia converging more on the desires of Wikipedians (who are so screamingly not representative of the population that it almost pains me to have to mention that) than on any objectively awesome target for Wikipedia. Happily, Wikipedians' consensus target for Wikipedia is, even if far from perfect, pretty close to one of the most useful tools on the Internet.


> long-term I bet on Wikipedia converging more on the desires of Wikipedians

The fundamental issue here seems to be that there's a select group of "Wikipedians" who are the primary editors. As long as the site doesn't make editing/adding articles a more user friendly/advertised feature, things aren't going to change.

Case in point: I went just yesterday to add some new content to an article, along with which I had a citation to add. Imagine my surprise when I found out that citations are still added in MediaWiki syntax! I can say without a doubt that every single acquaintance of mine who doesn't have an interest in computers (and quite a few who do) would have been utterly discouraged at this point from editing the article.

How difficult can it be to add a GUI editor? It would be a small, purely technical, step that could have a significant impact on the userbase, if advertised properly.


> The fundamental issue here seems to be that there's a select group of "Wikipedians" who are the primary editors.

It's actually not clear that's true when it comes to primary writers. There is a group of active editors (about 3000 at any given time) who do much of the maintenance work, e.g. putting articles into categories, formatting references, editing for style, moving stuff around, adding infoboxes, etc. But at least one study (I'll see if I can dig it up) found that a surprising percentage of the total content on Wikipedia is written by random anonymous IP addresses that show up to write a few paragraphs on a subject and are then never seen again.


I'd love to see that study. For some reason the idea of anonymous people writing bits and pieces makes me like Wikipedia more.


Adding a good GUI editor properly integrated into Mediawiki would take at least a man year of effort by skilled programmers, I would guess. It would be a huge, extremely difficult technical step, with a large amount of resistance from the existing user base.

Most GUI text editors around the web are horrible buggy piles of crap (to take one example I was recently frustrated by, the editor at Adobe’s forums – and Adobe is a gigantic company with thousands of highly paid, experienced developers). The result of any such effort could easily end up making the software more complex and harder to use rather than easier.


> Adding a good GUI editor properly integrated into Mediawiki would take at least a man year of effort by skilled programmers, I would guess. It would be a huge, extremely difficult technical step, with a large amount of resistance from the existing user base.

As one of the most popular websites on the net, it's not like they don't have the resources to do it.

I think there are powerful members of the community who want to keep it somewhat insular and closed off as a defense mechanism against newbies. This probably does some good, but it also definitely does a lot of bad (keeping out worthy contributors just because they don't feel like learning the markup language).

I'm sure they could achieve their goal some other way without pushing away talented contributors.


> it's not like they don't have the resources to do it.

Have you ever looked at the Mediawiki code base? Last time I skimmed a bit, a few years ago, it was an utterly horrible mess of spaghetti PHP. Just finding developers capable of doing this would be a big challenge. The Wikimedia Foundation has a lot less resources than you might imagine, considering the reach of its projects, and there are an awful lot of other funding priorities (not to mention other code priorities). They’re doing the best they can, but making sweeping changes to the code base is a lot harder than it looks.

> I think there are powerful members of the community who want to keep it somewhat insular and closed off as a defense mechanism.

Do you have any evidence for that? As conspiracy theories go, this one seems pretty weak to me. I can’t think of another organization of comparable size which is as welcoming and open to involvement and contribution from ordinary community members, at all levels of decision making. Almost the entire management and operation of Wikipedia and the Wikimedia Foundation is carried out in public, nearly all of the parties involved are volunteers (with a quite small number of full time staff keeping infrastructure running, handling legal issues, and so on), and many many decisions are made collectively, by consensus.

It’s easy to hate on anything, as a disinterested outsider, without making any real attempt to understand the internal processes involved, but organizing millions of people is a highly non-trivial job, and I think the Wikimedia community has done pretty admirably, all things considered.


> Have you ever looked at the Mediawiki code base? Last time I skimmed a bit, a few years ago, it was an utterly horrible mess of spaghetti PHP.

That's basically the problem. They are devoting considerable resources to a GUI editor, and I believe have several paid staff, both programmers and UI/UX experts, dedicated to the project. But as step #1 it needed a rewrite of the horrible-mess-of-regexes parser into some kind of actual semantic parser, which due to the feature-creep of wikitext syntax (which looks nothing at all like a well-behaved programming language syntax) turned out not to be easy going.

They commissioned a full usability study in 2010: http://usability.wikimedia.org/wiki/Wikipedia_Usability_Init...

Here is some information on the parser-rewrite project: http://www.mediawiki.org/wiki/Parsoid

And on the visual-editor project: http://www.mediawiki.org/wiki/VisualEditor


I just downloaded Mediawiki and took a look at the source code. Some of the later added code doesn't seem as bad, but there's an overwhelming amount of mingling between the data logic, including the classes and actions, and the HTML markup. It makes the code very hard to read and understand for someone wanting to contribute. The codebase isn't light either, so (correct me if I'm wrong) I don't think a complete rewrite of the application ever occurred after the first version.

Now if only some of us stopped creating social networks for kitties, and actually contributed our time and hackery into something that actually benefited the world.


You might be interested in the Parsoid project, which Wikimedia funded precisely to try to build a more maintainable pipeline for parsing syntax into some kind of AST or document model, disentangled from the other cruft: http://www.mediawiki.org/wiki/Parsoid

I believe it's currently being maintained separately, so the logic in MediaWiki itself is still the crufty version.


> Have you ever looked at the Mediawiki code base? Last time I skimmed a bit, a few years ago, it was an utterly horrible mess of spaghetti PHP. Just finding developers capable of doing this would be a big challenge. The Wikimedia Foundation has a lot less resources than you might imagine, considering the reach of its projects, and there are an awful lot of other funding priorities (not to mention other code priorities). They’re doing the best they can, but making sweeping changes to the code base is a lot harder than it looks.

If they lack the resources to do it, they could easily raise the money via a targeted donation drive or a kickstarter project. Telling the web: "Hey, here's what we want to do to make Wikipedia editing better, but we need $X to do it" would probably raise the target many times over. If they can't find good programmers right now, offer triple the salary or whatever. Incentives work.

> It’s easy to hate on anything, as a disinterested outsider, without making any real attempt to understand the internal processes involved, but organizing millions of people is a highly non-trivial job, and I think the Wikimedia community has done pretty admirably, all things considered.

How do you know whether I'm an insider or outsider?

I've contributed thousands of edits and was very active on Wikipedia maybe 6-7 years ago, but mostly stopped because of all the internal politics. Now I just fix typos or formatting errors when I notice them.

The fact is, if creating an easier way for newbies to contribute was a priority for the organization, I'm pretty sure they could have done it over the past many years. In that time whole companies were created from scratch. I can't believe that an organization as powerful (with such support and mindshare worldwide) as Wikipedia couldn't create a GUI editor. I could be wrong, but it just doesn't feel like they want it to happen that much, and the challenges in their way are used as excuses for not going full steam ahead...


"Just finding developers capable of doing this would be a big challenge."

I'm pretty sure one of the very highest profile websites on the entire internet could find developers capable of handling it.

Actually, I'm 100% certain of it.


Send them over. I’m sure the Wikimedia foundation would be glad to have them: http://wikimediafoundation.org/wiki/Job_openings

Edit: From the Wikimedia Foundation’s “2011–2012 Annual Plan”:

> 2011-12 Risks:

> 1) Editor decline is an intractable problem. Declining participation is by far the most serious problem facing the Wikimedia projects: the success of the projects is entirely dependent upon a thriving, healthy editing community. We are responding with a multi-faceted approach that blends big obvious fixes (e.g., Visual Editor) with more experimental approaches (e.g., the -1 to 100 retention projects and editor recruitment initiatives in India and Brazil). The WMF is also putting resources towards expanding community awareness and understanding of the problem, and putting in place mechanisms for decentralized community innovation so that community initiatives can help to solve it. We will be tracking progress throughout the year, and if necessary will sacrifice other activities to increase resources dedicated to this. [...]

> 9) A shortage of Silicon Valley technical talent hurts our ability to recruit and retain technical staff. The Bay Area is currently facing a major shortage of talented engineers, and tech companies are finding it difficult to hire and retain good tech staff. This could impair our ability to grow our tech staff as planned from 28 to 50. Mitigation: Like the rest of the tech sector, there is not much we can do to mitigate this serious problem. However, we will dedicate more resources towards technical recruitment in 2011-12 compared with 2010-11, and we'll be very clear about our value proposition: The Wikimedia Foundation is not about monetizing eyeballs, our direction isn't set by VCs, and we're financially stable. We offer a fair, friendly, fun environment for talented engineers who want their individual contribution to result in making the world a better place for hundreds of millions of people. [...]

> THE 2011-12 PLAN [...]

> Key activities supporting Priority #2, Diversifying the Editor Community are:

> 1) Visual Editor: A new default editing environment for Wikimedia projects which does not require markup. [...]

> 2011-12 Plan Targets [...]

> 5) Develop Visual Editor. First opt-in user-facing production usage by December 2011, and first small wiki default deployment by June 2012. [...]

> The 2011-12 plan reflects our continued desire to grow the organization's programmatic capacity by growing its staff, with an emphasis on thoughtful recruitment and integration of new people. In 2011-12, we plan to grow staff 50% from 78 to 117.


Atlassian recently stripped the wikisyntax editor from their wiki, forcing the editor to be a newbie-friendly WYSIWYG GUI that's not really WYSIWYG (it uses a lot of incorrect size placeholders). Thing is, it is easier for newbie users... but the data structures are insanely more generic and it's much harder to make the data on your page fit in a way that's consumable for the reader. Worst of all, it doesn't handle old wikisyntax pages at all well, so if you need to update an old table, it can cause problems.

A good GUI editor is one that produces good output - and that is really hard to do, particularly in the constrained space of a webpage.


If there's one site that needs a GUI editor, it's Wikipedia. Like MikeCapone said, it's not like they don't have the resources. It's something that should've been started on years ago.

> to take one example I was recently frustrated by, the editor at Adobe’s forums – and Adobe is a gigantic company with thousands of highly paid, experienced developers

I don't personally know any Adobe employees, but my (albeit limited) experience with Adobe software, such as Flash and Acrobat, certainly triggers descriptions such as "horrible buggy piece of crap."


If you think adding a GUI editor to Wikipedia is a top priority, you should try to either do some community organizing to push for it, or try diving into the code to get some idea of how hard it would be to add.

More than most institutions, Wikipedia is quite open and democratic and it’s possible for regular contributors to have a big impact, if they’re willing to put the work in.

Making a change like this will be a huge amount of work for someone, which is why it hasn’t happened yet. But if you can gather a team of like-minded folks, with enough dedicated organizing/coding effort I’m sure you could make some good progress.


The GUI editor is under (active?) development. You can try it here in a sandbox : http://www.mediawiki.org/wiki/Special:VisualEditorSandbox

More info here : http://www.mediawiki.org/wiki/Visual_editor


I agree with your underlying point, but due to a weird quirk of Wikipedia history, Catholicism in particular was for quite some time way over-represented; in the initial land-rush to populate Wikipedia with content, they grabbed a number of out-of-copyright encyclopaedias whose text was already in digital form, and one of these was an old edition of the Catholic Encyclopedia from 1913. The strange result was that the very most obscure topics in Catholicism were actually pretty well-covered, though often with information that was a century out of date! Even now you can find articles that have been barely tweaked from their 1913 form.


And why shouldn't it? People contributing to something for free are likely doing so in some part to shape it into something they themselves would find useful. Seems very similar to open source software.

If there are major parts missing that people deem important maybe they need to step up and provide some funding to get them done.


I think the problem isn't that the "Wikipedians" aren't creating this content, but that they are advocating its deletion when "outsiders" do add it.


World history.

The article for the XBOX 360: http://en.wikipedia.org/wiki/XBOX_360

The article for the Republic of Ragusa: http://en.wikipedia.org/wiki/Republic_of_Ragusa

Wikipedia's editors (or, perhaps more accurately, humanity) have a bias in thinking details in the present are more important than details in the past.


While I'm sure a bias towards the present is part of the problem, there's also the simple fact that it's easy to write about things plenty of other people are writing about, and about which details are plentiful. Many of the things Wikipedia is missing, but really should have if it wants to call itself an encyclopaedia with any pretense at seriousness, are obscure, and their details are locked away in obscurity. (As a consequence of this, many of those things Wikipedia does have articles about are couched in the language of someone who already understands the topic, mitigating their utility to those who don't.)

All that said, the article on the Republic of Ragusa was far better and longer than I'd have expected, given its use as your counter-point.


That's very amusing, because the exact opposite bias can be found in traditional encyclopedias:

> The other day I read a dozen thousand words about Assyrian archeology in my DVD copy of Encyclopedia Britannica, but when I wanted to read about the Xbox 360, there wasn't even a single entry, so I gave up.

http://www.theverge.com/2012/7/12/3154537/paul-miller-offlin...

> Wikipedia's editors (or, perhaps more accurately, humanity) have a bias in thinking details in the present are more important than details in the past.

I don't know if that's entirely true. I think it significantly depends on what those "details" are: http://en.wikipedia.org/wiki/History_of_video_games


The test of time takes time...


More important? No, but certainly far more accessible, and the low-hanging fruit gets picked the most, so to speak.


You make a great point, and I think that ties into how one defines 'important information'. Specs, prices, etc. about the XBOX 360 are all but ubiquitous on the internet; does this mean that, because they're more common, they're more important and thus should be added to Wikipedia? Or do stricter filters need to be placed on more common knowledge/data?


Information is fractal in nature. The work is continuous.



