Hacker News

I never saw a web directory that took a serious approach to using automation to curate the directory. It's probably more feasible than people think because it's a matter of classifying links up or down (relevant or not) which is a much more ontologically and mathematically tractable problem than learning a ranking function or trying to classify things into one of N>2 categories.
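To make the up/down framing concrete, here is a minimal sketch of a two-class relevance classifier for one directory topic. It is a plain naive Bayes over link descriptions, which is my choice for illustration and not something any real directory is known to have used; the class name, the training sentences, and the topic are all made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would do much more."""
    return text.lower().split()


class LinkRelevanceClassifier:
    """Hypothetical two-class (relevant / not relevant) naive Bayes."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, relevant):
        self.doc_counts[relevant] += 1
        self.word_counts[relevant].update(tokenize(text))

    def score(self, text):
        """Log-odds that the link is relevant; positive means vote it up."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_odds = 0.0
        for label, sign in ((True, 1), (False, -1)):
            # Laplace-smoothed class prior.
            prior = (self.doc_counts[label] + 1) / (total_docs + 2)
            log_odds += sign * math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Laplace-smoothed word likelihood, one extra slot for unseen words.
                count = self.word_counts[label][word]
                p = (count + 1) / (total_words + len(vocab) + 1)
                log_odds += sign * math.log(p)
        return log_odds

    def is_relevant(self, text):
        return self.score(text) > 0


# Toy training set for an imaginary "birdwatching" topic.
clf = LinkRelevanceClassifier()
clf.train("field guide to warblers and sparrows", True)
clf.train("birding checklist for spring migration", True)
clf.train("cheap pills buy now discount", False)
clf.train("casino bonus buy now", False)

print(clf.is_relevant("warblers migration guide"))
print(clf.is_relevant("buy discount pills now"))
```

The point is only that each topic needs a single yes/no decision per link, so one small model per topic is enough and there is no ranking function to learn.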

I think the problem may have been more the lack of a sustainable revenue model. Getting volunteers to curate the directory is particularly destructive because the people who most want to volunteer either (1) want to promote something or (2) want to get paid so somebody else can promote something.



> I never saw a web directory that took a serious approach to using automation to curate the directory.

That's what search indexes are. Automatically curated directories.

It's just that the UI to the index is natural language queries instead of clicks through a graph, because the latter doesn't scale to large corpora.


No, I think a modern directory would be a set of topics in an ontology with links. If I had to seed one I would suck all the external links out of Wikipedia, impose some organizing structures (probably overlapping trees or DAGs) and then build a set of classifiers for nested topic relevance, spamminess, etc.
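Roughly what I mean by overlapping trees or DAGs, as a sketch: a topic can have more than one parent, and a link filed under a topic counts as relevant to every ancestor of that topic. The class, topic names, and URL below are all hypothetical, and the "nested relevance" rule is just my reading of the idea.

```python
from collections import defaultdict


class TopicDag:
    """Hypothetical topic graph where a topic may have several parents."""

    def __init__(self):
        self.parents = defaultdict(set)  # topic -> set of parent topics
        self.links = defaultdict(set)    # topic -> set of URLs filed there

    def add_topic(self, topic, parents=()):
        self.parents[topic].update(parents)
        for parent in parents:
            self.parents.setdefault(parent, set())

    def ancestors(self, topic):
        """All topics reachable by following parent edges upward."""
        seen = set()
        stack = list(self.parents[topic])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen

    def add_link(self, topic, url):
        self.links[topic].add(url)

    def links_under(self, topic):
        """Links filed at this topic or at any of its descendants."""
        found = set()
        for filed_at, urls in self.links.items():
            if filed_at == topic or topic in self.ancestors(filed_at):
                found |= urls
        return found


dag = TopicDag()
dag.add_topic("Science")
dag.add_topic("Hobbies")
dag.add_topic("Ornithology", parents=["Science"])
dag.add_topic("Birdwatching", parents=["Ornithology", "Hobbies"])
dag.add_link("Birdwatching", "https://example.org/birding")

print(sorted(dag.ancestors("Birdwatching")))
print(dag.links_under("Science") == dag.links_under("Hobbies"))
```

Here "Birdwatching" sits under both "Ornithology" and "Hobbies", so the same link shows up when browsing down from either side.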

There are certain sites which have a landing page for every topic in some set of topics, so you could add a lot of links quickly if you built rules for importing links from particular sites. Adding 10 sites a day with 1000 links each would be very possible; in 100 days you could build out a million links.
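A sketch of what one of those per-site import rules might look like, plus the arithmetic. The rule format, the `example.org` URL pattern, and the slug convention are assumptions for illustration; real sites would each need their own rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportRule:
    """Hypothetical rule: how one site names its per-topic landing pages."""
    site: str
    url_template: str  # must contain "{slug}"

    def link_for(self, topic):
        # Assumed slug convention: lowercase, spaces become hyphens.
        slug = topic.strip().lower().replace(" ", "-")
        return self.url_template.format(slug=slug)


def import_links(rule, topics):
    """Map each topic to its landing page on the rule's site."""
    return {topic: rule.link_for(topic) for topic in topics}


def days_to_reach(target_links, sites_per_day, links_per_site):
    """Days needed at a steady import rate, rounded up."""
    per_day = sites_per_day * links_per_site
    return -(-target_links // per_day)  # ceiling division


rule = ImportRule(site="example.org",
                  url_template="https://example.org/topics/{slug}")
links = import_links(rule, ["Bird Watching", "Amateur Radio"])

print(links["Bird Watching"])
print(days_to_reach(1_000_000, sites_per_day=10, links_per_site=1000))
```

At 10 sites a day and 1000 links per site that is 10,000 links a day, so a million links takes 100 days.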


Exactly how do you think Google (and Bing, and ...) work? They do start from known indexes, in particular Wikipedia. Hell, Google even has internal papers where they claim search quality improvements measured in "Wikipedias" as a unit.


Actually Microsoft bought a company called Powerset, which extracted information from Wikipedia to build a "semantic as in semantic web" index; that technology became the heart of Bing.

Google was caught flat-footed and wound up buying and killing Freebase in order to catch up. They lied and said they rejected "semantic web" approaches despite hiring one of the leaders of the Cyc project as their head of research.

Still, there is a difference between exposing that kind of database through a full-text index and exposing it through a browsing interface.


I consider Reddit to be a form of web directory in a way.


I can't use it though because I've got this disability that memes cause me extreme distress.


That sounds like a meme.


Every idea is a meme in the classical sense.


Panik!! Kalm. Panik!!



