Okay, what if we flip the problem on its head? Try to make the chatbot seem rude and unhelpful but then it turns out it has a heart of gold?


The article discusses this. The problem is that the chatbot is much less likely to veer in that direction (seeming initially hostile but secretly good) than the opposite (seeming initially good but secretly hostile):

> I claim that this explains the asymmetry — if the chatbot responds rudely, then that permanently vanishes the polite luigi simulacrum from the superposition; but if the chatbot responds politely, then that doesn't permanently vanish the rude waluigi simulacrum. Polite people are always polite; rude people are sometimes rude and sometimes polite.


Yeah, let's create Wednesday chatbot from the Addams family.



