
Dude, it's the second sentence of the first paragraph:

> For my purposes, the Majestic Million dataset felt like the perfect fit as it is ranked by the number of links that point to that domain (as well as taking into account diversity of the origin domains as well).



And moreover, the author’s conclusion is that the dataset is bad.

> While I had expected some cleanliness issues, I wasn’t expecting to see this level of quality problems from a dataset that I’ve seen referenced pretty extensively across the web


Yeah, but they're still providing a dataset that's just plain bad. It's hardly relevant how many sites link to some other site, if it's dead.


It's only bad data if it does not include what it claims to include.

If the dataset is defined as inlinks, and it is inlinks, then the data is good.


Part of the problem is that it's not the number of links, it's referring subnets. Fairly certain this includes script tags.
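
To make the links-versus-subnets distinction concrete, here is a minimal sketch. The /24 grouping, the function name `count_referring_subnets`, and the sample addresses are all assumptions for illustration, not anything the dataset's documentation is quoted as saying in this thread.

```python
import ipaddress

def count_referring_subnets(referrer_ips, prefix_len=24):
    """Count distinct IPv4 subnets among referring hosts.

    prefix_len=24 is an assumed grouping, chosen only for illustration.
    """
    subnets = {
        # strict=False masks off the host bits, e.g. 203.0.113.10/24 -> 203.0.113.0/24
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in referrer_ips
    }
    return len(subnets)

# Hypothetical referrers (documentation-range addresses):
# three linking hosts, two of which share a /24.
referrers = ["203.0.113.10", "203.0.113.77", "198.51.100.5"]
link_count = len(referrers)
subnet_count = count_referring_subnets(referrers)
```

With these made-up inputs the two counts differ, which is why a ranking by referring subnets can diverge from a ranking by raw link count.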



