
I'll make a non-mathematical observation about data mining. Real-world data sets are messy. Humans are just not the most reliable data-entry machines, and data sets are not always gathered for the purpose you intend to use them for. You have to validate, scrub, and realign your data set before you waste time on an analysis that these issues could render meaningless.

It's the old "Garbage In, Garbage Out" principle, but it's easy to forget until the real world hammers it into you.
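
To make that concrete, here is a minimal Python sketch of a validate/scrub pass. The field names (`name`, `age`, `signup`), the 0 to 120 age range, and the date format are all made up for illustration, not taken from any real data set. The only point is that bad rows get set aside with a reason instead of silently flowing into the analysis.

```python
from datetime import datetime


def clean_record(raw):
    """Return a normalized copy of one row, or raise ValueError if it can't be trusted.

    The fields and limits here are hypothetical; adapt them to your own data.
    """
    name = str(raw.get("name") or "").strip()
    if not name:
        raise ValueError("missing name")

    # int() raises ValueError on empty or non-numeric input such as "" or "abc".
    age = int(str(raw.get("age") or "").strip())
    if not 0 <= age <= 120:
        raise ValueError(f"implausible age: {age}")

    # strptime() raises ValueError when the text doesn't match the expected format.
    signup = datetime.strptime(str(raw.get("signup") or "").strip(), "%Y-%m-%d").date()

    return {"name": name, "age": age, "signup": signup}


def scrub(rows):
    """Split rows into (clean, rejected); rejected holds (row_index, reason) pairs."""
    clean, rejected = [], []
    for i, raw in enumerate(rows):
        try:
            clean.append(clean_record(raw))
        except ValueError as err:
            rejected.append((i, str(err)))
    return clean, rejected


rows = [
    {"name": " Alice ", "age": "34", "signup": "2011-03-02"},
    {"name": "Bob", "age": "340", "signup": "2011-03-05"},    # likely a typo
    {"name": "", "age": "28", "signup": "2011-03-07"},        # blank name
    {"name": "Carol", "age": "41", "signup": "03/09/2011"},   # wrong date format
]
clean, rejected = scrub(rows)
```

Keeping the rejected rows around, rather than just dropping them, lets you see how much of the data set is garbage and why before deciding whether the analysis is worth running at all.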


