Laserlike

Ontologies for everything else.

August 22, 2008 · 7 Comments

Ontology, taxonomy, folksonomy.

An ontology is concerned with the categorization of things.  A “thing” can be just about anything: a person, a company, a particular news story, a video, an application, or a web page.  Ontology is a broad concept and subsumes taxonomy and folksonomy.  Google Search delivers relevant results because it leverages a massive corpus of ontological information about web sites and entities.  PageRank is but one device in the search arsenal that infers ontological information about both the classification and quality of a page.

A taxonomy is a more formal construct concerned with the ordering of things, usually in a hierarchical structure.  Yahoo’s Directory is a taxonomic organization of the web, for example.  The Open Directory Project (DMOZ) is also a taxonomic organization of the web, but rather than relying on a group of internal “surfers” as Yahoo! did, DMOZ is an open source directory maintained by volunteer editors.

In the early days of tagging, the term folksonomy emerged to describe a non-hierarchical ordering of things through collaborative input.  For example, when a user tags a photo in Flickr, she is creating an informal taxonomy.  These tags are aggregated to help consumers find photos — either by “tag surfing” or by making search more precise as a result of better meta-data.
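
As a rough illustration of how individual tags might be aggregated, the sketch below builds an inverted index from tags to photos and uses it to narrow a search.  The photo IDs, tags, and function names are all made up for the example, and this is not a description of how Flickr actually does it.

```python
from collections import defaultdict

# Hypothetical user-submitted tags as (photo_id, tag) pairs.
TAGGINGS = [
    ("photo1", "sunset"),
    ("photo1", "beach"),
    ("photo2", "Sunset"),
    ("photo3", "beach"),
    ("photo3", "dog"),
]


def build_tag_index(taggings):
    """Aggregate individual tags into a tag -> set of photo IDs index."""
    index = defaultdict(set)
    for photo_id, tag in taggings:
        # Normalize case and whitespace so "Sunset" and "sunset" collapse.
        index[tag.strip().lower()].add(photo_id)
    return index


def find_photos(index, *tags):
    """Return the photos that carry every requested tag."""
    sets = [index.get(t.strip().lower(), set()) for t in tags]
    return set.intersection(*sets) if sets else set()


index = build_tag_index(TAGGINGS)
print(sorted(find_photos(index, "sunset")))
print(sorted(find_photos(index, "beach", "dog")))
```

Looking up a single tag corresponds to “tag surfing”; passing several tags narrows the result set, which is the sense in which better meta-data makes search more precise.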

Who needs philosophy?

Let’s review some of the big wins over the past decade on the consumer internet.

Yahoo!  

Started with a directory built by web surfers.  Employed a chief taxonomist to oversee the organization of the directory.

Google 

As the amount of data on the web grew exponentially, taxonomies were too structured to keep up, and a more flexible ontology was required.  Google replaced the directory (a taxonomy) with search (a less formal ontology).

Facebook

Social networks are ontologies.  When you search for a name (e.g., Joe Smith) on Google, you get the NBA player.  When you search for Joe Smith on Facebook, you are far more likely to find the Joe you are looking for.  When you search for Joe Smith on Flickr, you get the photo of this man.  The Joe Smith tag on Flickr is text.  When you tag a photo on Facebook, you are encouraged to highlight objects (usually a face) and then select a name from your friends list.  The “Joe Smith” tag on Facebook is far more valuable than the “Joe Smith” tag on Flickr because it has more information embedded in it: it points to a specific person in your friends list rather than to a string of text.

And don’t forget that the Internet is the ultimate taxonomy — DNS is a system that maps human-readable names (e.g., http://laserlike.com/) to an IP address (http://76.74.254.123/, which is the IP address for WordPress).

Opportunities.

If you can invent something that improves the information about a big class of things on the web, you can have a massive impact.  

For example, there really isn’t a dominant ontology for organizations.  We have tickers for the ~10,000 public companies in the United States.  What about the other 8MM businesses in the US?  Nonprofits and clubs?  And the rest of the world?  There are firms that have tried to create some of this information (e.g., Hoovers, Crunchbase, VentureWire).  Once you have a solid ontology for this type of thing (ideally with some sort of namespace), you can pivot around it to create massive value.  You could munge together the social graph with the organizational namespace for a definitive guide to who belongs to which organizations.  How about the automated inclusion of hypertext links (à la Yahoo! Shortcuts) in every bit of text on the web, linking to that organizational namespace?  And clustering groups of organizations for the purposes of discovery and analysis?

How about a location-based namespace?  You have some of this today with addresses and zip codes.  But what if you could take the GPS coordinates of every fixed object on the planet and append a human-readable name (like DNS does with IP addresses)?  And not only based on horizontal location, but also including vertical location (e.g., the 30th floor of the Bank of America building in SF)?  Perhaps you append a location code to a URL (e.g., bankofamerica.com/sf/555-30)?
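
To make the idea a little more concrete, here is a minimal sketch of such a lookup, loosely modeled on the way DNS resolves a name to an address.  The registry, the `Location` type, the owner/city/building-floor reading of the code, and the coordinates are all assumptions for illustration; the coordinates are rough placeholders, not surveyed values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    latitude: float
    longitude: float
    floor: int  # vertical position, in addition to the horizontal coordinates


# Hypothetical registry mapping human-readable codes to physical locations.
# The coordinates below are rough placeholders, not measured values.
REGISTRY = {
    "bankofamerica.com/sf/555-30": Location(37.79, -122.40, 30),
}


def parse_location_code(code: str) -> dict:
    """Split a code like 'bankofamerica.com/sf/555-30' into its parts.

    Assumes the layout owner/city/building-floor, which is only one
    possible reading of the example code.
    """
    owner, city, spot = code.split("/")
    building, floor = spot.rsplit("-", 1)
    return {"owner": owner, "city": city, "building": building, "floor": int(floor)}


def resolve(code: str) -> Optional[Location]:
    """Look up a human-readable code; return None if it is not registered."""
    return REGISTRY.get(code.lower())


print(parse_location_code("bankofamerica.com/sf/555-30"))
print(resolve("bankofamerica.com/sf/555-30"))
```

A flat dictionary stands in for the registry here.  A real system would also need some way to delegate authority over parts of the namespace, which this sketch does not attempt.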

In shopping, certain categories like music and books have formal ontologies (e.g., ISBN codes for books).  But most categories lack universal codes (or SKUs), requiring every retailer and online site to create its own ontology.  A pair of Levi’s 501 jeans may have the code Levi501_2008 at one retailer and LS12345 at another.  eBay has employed a very loose ontology because its sellers are continuously adding and removing products from inventory.  This makes finding a product and comparing prices far more laborious than is ideal.
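
The sketch below shows what a shared product code would buy you: once each retailer’s code maps to a common identifier, price comparison becomes a simple grouping step.  The two retailer codes come from the example above; the retailer names, the shared ID `levis-501`, and the prices are invented for illustration.

```python
# Hypothetical crosswalk from (retailer, retailer-specific code) to a shared ID.
# Retailer names, the shared ID, and all prices are invented for this example.
CROSSWALK = {
    ("retailer_a", "Levi501_2008"): "levis-501",
    ("retailer_b", "LS12345"): "levis-501",
}

LISTINGS = [
    {"retailer": "retailer_a", "code": "Levi501_2008", "price": 49.99},
    {"retailer": "retailer_b", "code": "LS12345", "price": 44.50},
    {"retailer": "retailer_b", "code": "XX00001", "price": 19.00},  # not mapped
]


def compare_prices(listings, crosswalk):
    """Group listings by shared product ID and sort each group by price."""
    by_product = {}
    for listing in listings:
        product_id = crosswalk.get((listing["retailer"], listing["code"]))
        if product_id is None:
            continue  # without a shared code, the listing cannot be compared
        by_product.setdefault(product_id, []).append(listing)
    for offers in by_product.values():
        offers.sort(key=lambda offer: offer["price"])
    return by_product


offers = compare_prices(LISTINGS, CROSSWALK)
print(offers["levis-501"][0]["retailer"])
```

The hard part is building and maintaining the crosswalk, not the grouping; the unmapped listing is simply dropped here, which is roughly what happens to products no one has classified.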

These are simply examples that come to mind as I type.  Any other ideas about categories that would benefit from improved classification?

Categories: ideas

7 responses so far ↓

  • Bob Ngu // August 22, 2008 at 9:49 pm

    Mike, oftentimes when I try to recall an article I read on, say, TechCrunch, I wish there were an ontology of sorts, likely using tags or keywords, that let me narrow down the list of articles and make my search easier.

  • Ben Orenstein // August 23, 2008 at 3:15 am

    I think the ‘awesome bar’ in Firefox 3 is a nice example of a taxonomy that was begging to be created. I’ve been using it for only a month and it already feels like I couldn’t live without it.

    -Ben

  • Ian Kennedy // August 23, 2008 at 5:12 pm

    D & B has been selling its DUNS number listing for all businesses and non-profits for years.

    Companies like Plazes (now part of Nokia) have been building up a database of wifi access points that could make up your geographic database.

    Amazon is best positioned to index physical products and with the recent feature that allows you to add “anything” to your Amazon Wishlist – they’ve positioned their users to help them index the world of physical things.

    And don’t forget MyBlogLog which I have always positioned as a DNS for People ;-)

    http://everwas.com/2007/07/mybloglog_dns_for_people.html

  • Mike Speiser // August 23, 2008 at 8:30 pm

    Excellent points Ian. A few brief thoughts:

    Designing a namespace is trivial (e.g., all entities will be identified by 3 sets of 3 digits and each entity will have a unique identifier that is human readable).

    The challenges are:

    1. Mapping entities to your namespace. With companies, I would imagine this to be somewhat easy as every company files documents with the government (taxes, incorporation certificates, etc.). So it’s possible that munging that data together with your namespace would be simple [although I would really like to see it get more and more atomic -- like Yahoo = 123.456.789, but Yahoo! MyBlogLog = 123.456.789/001 or the Math Club at Stanford = 987.654.321/001]. In other cases (the names of people), key data like social security numbers are not in the public domain. So centralized mapping is very difficult. In those cases, decentralizing the mapping exercise allows the system to benefit from the wisdom of the masses (e.g., Facebook with people, DMOZ with URLs).

    2. The biggest challenge, of course, is becoming the de facto standard. There is only one DNS, but there are numerous social graphs [although the market is consolidating, which will lead to de facto standards]. And while D&B and others may have their take at an ontology for businesses, if it doesn’t enjoy a government or de facto monopoly (or oligopoly) it’s probably not that interesting. The number one feature of an ontology is how many people are using it rather than its technical merits…

  • Roger // August 24, 2008 at 1:13 am

    Is it not true that D&B is the agreed upon standard? I always thought that it was, but I am just asking here.

  • Scott Fitchet // August 24, 2008 at 3:00 pm

    I’d like to have a delicious-like tagging mechanism available for my bank transactions. Even Mint still makes you stick to a static one-to-one list of traditional personal expense categories.

  • Par B // September 28, 2008 at 3:36 am

    hmmm…

    isn’t there an inherent tradeoff between centralization/decentralization and accuracy?

    i’ve been observing that accuracy is hard to gauge in de-centralized systems. for self-organizing ontologies like facebook it seems that this is less of an issue but i’m struck by the inherent tradeoff we are making in accuracy in these newer systems.

    while for some applications this doesn’t make a whole lot of difference (e.g. social networks) the relatively low uptake of these types of technologies in corporations (where accuracy is paramount) seems to be an indication that these types of technologies aren’t for every type of problem (or at least not yet). enterprise search is largely a problem that hasn’t been solved yet (yeah, i know what the marketeers say in the search companies, but i don’t think anyone would argue with me when i suggest that their success in that market is limited at best).

    so the point i’m trying to get to is that for the corporate side there seems to be a case for some form of hybrid centralized/decentralized approach. something where you can tap into the breadth of an organization for scale, but also keep some amount of structure (dare i say schema?) using the more traditional centralized approaches (which also would help transcend the intra-corporate departmental boundaries that tend to prevent these types of ontologies from working properly).

    seems like there is opportunity for innovation right in that intersection of technology. Or perhaps i’m just full of it? ;)
