Laserlike

The wisdom of crowds and the importance of diversity.

March 13, 2009 · 8 Comments

I have met with several startups over the past few months that have developed “wisdom of crowds” products.  After explaining how the masses will come together to bring clarity to something (the future price of a stock, the likely winner of a college hoops game, the quality of a product or service), they move on to their “viral distribution” model — that is, they won’t need to do any marketing because the same people providing the wisdom will invite all of their friends to the site.

Oh, Wise is Wise, and Viral is Viral, and never the twain shall meet.

To understand why mixing knowledge aggregation products with viral distribution can be counterproductive, let’s review the mechanics of a wisdom of crowds system using a hypothetical product.

Where In The World is Usama Bin Laden.com (aka UBL.com).

The purpose of UBL.com is to leverage the wisdom of the masses to help the good guys find public enemy #1.  A user will get a random section of a global map and will then be asked to put a pin in the spot where she believes UBL is hiding out.  That’s it.  That’s the product.

If the starting point is truly random, then people who have no idea about the location of UBL will produce random guesses which will have minimal overlap and cancel out.  But those individuals with bits and pieces of knowledge will together paint a picture of his location, even if none of them individually actually knows with any level of certainty.  For example, someone who grew up in Afghanistan may know that a particular area of the Afghanistan-Pakistan border has an extensive cave system that is hard to see from the sky and hard to reach by foot.  Special Forces commandos who have explored a number of areas over the past few years may know where UBL is *not* located, increasing the signal of where he might be hiding.  You get the idea.
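The cancelling-out effect described above can be sketched with a small simulation. This is only an illustration under stated assumptions: the 100 x 100 grid, the function names, the user counts, and the idea that an informed user lands within one cell of the truth are all hypothetical and not part of any real product.

```python
import random
from collections import Counter

GRID = 100  # hypothetical map resolution: 100 x 100 cells


def clamp(value, low, high):
    """Keep a coordinate inside the map."""
    return max(low, min(high, value))


def simulate_pins(n_uninformed, n_informed, true_cell, spread=1, seed=0):
    """Return a list of (x, y) pins from uninformed and informed users."""
    rng = random.Random(seed)
    pins = []
    # Uninformed users guess uniformly at random across the whole map.
    for _ in range(n_uninformed):
        pins.append((rng.randrange(GRID), rng.randrange(GRID)))
    # Informed users (an assumption) land within `spread` cells of the truth.
    for _ in range(n_informed):
        x = clamp(true_cell[0] + rng.randint(-spread, spread), 0, GRID - 1)
        y = clamp(true_cell[1] + rng.randint(-spread, spread), 0, GRID - 1)
        pins.append((x, y))
    return pins


def most_pinned_cell(pins):
    """Aggregate the crowd by taking the single most-pinned cell."""
    return Counter(pins).most_common(1)[0][0]


true_cell = (37, 62)
pins = simulate_pins(n_uninformed=5000, n_informed=200, true_cell=true_cell)
best = most_pinned_cell(pins)
print("most pinned cell:", best, "true cell:", true_cell)
```

Even though uninformed users outnumber informed ones 25 to 1 in this sketch, their pins are spread so thinly that the most-pinned cell still falls in the small neighborhood where the partial knowledge overlaps.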

Now let’s say that Conan O’Brien does a review of UBL.com just after launch, and that review accounts for 90% of UBL.com’s traffic.  Conan tells everyone that he’s pretty sure UBL is hiding out at the U.S. Embassy in Stockholm.  There is a good chance that UBL.com will see a non-random cluster of pins on the U.S. Embassy, distorting our guess about the whereabouts of UBL.

Similarly, viral marketing schemes risk launching information cascades: if a large chunk of the user base was introduced to UBL.com by friends, the diversity of input is almost certainly diminished.  Whether this is explicit (a user invites a friend with the note “I think UBL is at the U.S. Embassy in Stockholm”) or implicit (offline discussion of UBL’s whereabouts, or even shared biases within certain social groups), it is clearly counterproductive to the goal of accuracy.
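A rough way to see how invitations shrink diversity is to model each new user as sometimes copying the pin of someone already on the site. This is a deliberately crude sketch, not a model of any real viral loop: the `copy_prob` parameter, the grid size, and the choice of "number of distinct pins" as a diversity measure are all assumptions made for illustration.

```python
import random

GRID = 100  # hypothetical map resolution: 100 x 100 cells


def distinct_pins(n_users, copy_prob, seed=0):
    """Count distinct pins when some users copy an earlier user's pin."""
    rng = random.Random(seed)
    pins = []
    for _ in range(n_users):
        if pins and rng.random() < copy_prob:
            # Invited user repeats the guess of a randomly chosen earlier user.
            pins.append(rng.choice(pins))
        else:
            # Independent user makes a fresh guess somewhere on the map.
            pins.append((rng.randrange(GRID), rng.randrange(GRID)))
    return len(set(pins))


organic = distinct_pins(n_users=500, copy_prob=0.0)
viral = distinct_pins(n_users=500, copy_prob=0.8)
print("distinct pins, no copying:", organic)
print("distinct pins, 80% copying:", viral)
```

With the same number of users, the copying scenario yields far fewer independent guesses, which is the sense in which the diversity of input is diminished.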

Knowledge aggregation systems benefit greatly from diverse inputs.  If accuracy is your goal, you are better off developing antiviral features than viral ones.

The importance of diversity.

Diversity increases the fidelity of inputs, at least with respect to knowledge aggregation.  One of the key strengths of the American system is that we have both distributed governance systems (capitalism, democracy) AND a diverse population.  In addition to the many benefits of importing hungry, bright talent, immigration offers a continuous injection of diverse points of view.  Diversity is just as critical to the efficacy of wisdom of crowds businesses online.  So how can we get diverse inputs?

Enter Google.

You have heard the stories about how a few thousand contributors account for a large percentage of edits on Wikipedia.  But the diversity of the inputs needn’t be measured on a byte-weighted basis.  The vast majority of Wikipedia’s 78.2MM U.S. monthly unique visitors arrive via search (Google in particular, I would guess).  The semi-random arrival of users [with respect to the user] at a particular Wikipedia page would tend to limit the risk of a particular bias [if a reader disagrees with the content, he can simply make the change].  In fact, many of the best wisdom of crowds businesses are continuously reviewed by a stream of semi-randomly distributed “reviewers” thanks to Google.

Even if you have a small community of creators, if the vast majority of users find your site through third-party search [and there are mechanisms for them to improve or challenge the content], you will go a long way toward increasing the diversity of your inputs.

You may also want to track the inputs for a particular product by type to see how certain biases may impact the system [and potentially remove them].  In our UBL example, you might review the data across all inputs, by the best guesses of military personnel, by country based on IP address, etc.
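As a minimal sketch of that kind of breakdown, the snippet below groups pins by an attribute and compares the average location per group. The field names (`role`, `country`), the sample records, and the use of a simple centroid are all hypothetical choices for illustration; a real system would need its own way of labeling inputs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pin records; in practice "country" might come from IP lookup.
sample_pins = [
    {"x": 10, "y": 20, "role": "military", "country": "US"},
    {"x": 12, "y": 22, "role": "military", "country": "US"},
    {"x": 50, "y": 60, "role": "civilian", "country": "SE"},
    {"x": 54, "y": 64, "role": "civilian", "country": "SE"},
    {"x": 11, "y": 21, "role": "civilian", "country": "PK"},
]


def centroid_by(pins, key):
    """Return {segment label: (mean x, mean y)} for the given attribute."""
    groups = defaultdict(list)
    for pin in pins:
        groups[pin[key]].append(pin)
    return {
        label: (mean(p["x"] for p in members), mean(p["y"] for p in members))
        for label, members in groups.items()
    }


by_role = centroid_by(sample_pins, "role")
by_country = centroid_by(sample_pins, "country")
print("by role:", by_role)
print("by country:", by_country)
```

If one segment's centroid sits far from the others, that is a hint (not proof) that the segment shares a bias, and you can decide whether to down-weight or exclude it.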

There is a fundamental tension between building great community products where people invite friends and building a product that aims to assemble diverse inputs to paint a collective picture.  Understanding that tension will go a long way to helping you build a better product.

Categories: ideas

8 responses so far ↓

  • Bindu Reddy // March 13, 2009 at 11:57 pm

    Thanks to Facebook, I found this blog entry :) IMO start-up growth will come from different sources based on which phase the start-up is in.

    For the early phases, start-ups tend to grow via word of mouth and maybe some coverage in a few blogs. To your point, I would also recommend that start-ups approach non-tech blogs and communities to get a good/diverse set of initial beta users. Early adopters tend to know other early adopters and so invites are a good way to get started.

    If the start-up succeeds in getting good engagement, later stage growth comes from Google/YouTube and Facebook. Of course, word of mouth will continue to be a strong driver of growth esp. if the core product is good and engaging.

  • Yumio // March 14, 2009 at 12:56 pm

    One thing about Wikipedia – the oft repeated stat about a few thousand people accounting for 80% of their “edits” is misleading. It’s true that there are super editors who do the majority of the work, but that’s EDITING. They do the categorizing, the reviewing, etc. The majority of articles are actually WRITTEN by 1-timers, or singletons – which only proves your point that Wikipedia provides the diversity of opinions that makes crowd-sourcing work.

  • Robert // March 14, 2009 at 3:18 pm

    I came to your blog following a link to your “scientific product development” post (from startuplessonslearned), and it was a brilliant post. I thought you had taken a long holiday as your last post was from 2008, so thanks for coming back!

    Is there a chance Sean Macnew will post the 2nd part of the revenue formula as he said?

  • Sydney Don // March 18, 2009 at 5:29 pm

    Mate I love your stuff.

    If I could be so bold as to ask a favour: is there a chance you could enable email distribution in your FeedBurner account?

    I still read my subs the old school way via email…lol

  • Jonathan Wolf // March 18, 2009 at 8:37 pm

    Stimulating as always Mike. Does your example really hold though – although the starting point may be skewed, as long as new estimates keep coming in, and there is not a permanent reason why these new estimates only come from a particular group then I think the answer ought to slowly move away from the starting point. This reminds me of the chaos theory (the “butterfly effect”) I did many years ago at university but with a twist – there, to start with the path is tight and predictable but after sufficient time you cannot estimate where it will be, other than within the constraints of an overall envelope. Here, you know the starting point is skewed, it will slowly diverge from that point in some sort of random walk, but you are hoping that it will eventually converge on the correct actual answer.
    I guess this is a long way of saying I don’t think virality is a problem, as long as over time the group that is assembled is diverse/knowledgeable enough to give an answer. So a community that will only ever have male geeks may not come to a good answer about the best depilatory cream, but should be fine for estimating the likely speed improvement of Google’s next release of Chrome? And a community may start being only Conan O’Brien fans (this is a US reference that us foreigners don’t understand by the way) – but I don’t see why it wouldn’t evolve via friends of friends to a group that is entirely different in make-up to those fans.

  • Michael F. Martin // March 19, 2009 at 1:39 am

    Nice post. One small comment:

    “Diversity increases the fidelity of inputs, at least with respect to knowledge aggregation.”

    I don’t know if it increases the fidelity of inputs — it probably increases both signal and noise, no? — but if you have a good way to pick out the signal, it certainly improves the information content in it.

    http://en.wikipedia.org/wiki/Central_limit_theorem

  • Mike Speiser // March 19, 2009 at 4:04 am

    Excellent clarification Michael. Technically, the inputs are not better. In fact, on average they may be worse. But the aggregate output is better because the crap cancels itself out. Thanks for the correction.

  • Mike Speiser // March 19, 2009 at 4:07 am

    Email subscription through FeedBurner is now enabled.

    Subscribe to Laserlike by Email
