I've decided that the best way to validate email addresses is to not validate them, but to require that any signup be finalized by the individual following a link emailed to them.
This allows a person to use any damn thing they want as their email address, provided it works and they can get the email.
Even if sending emails is 100% free, you still have to worry about your sender reputation. [1] Sending a large amount of mail to invalid addresses will start getting your emails put in people's spam folders. That's the reason email validation services exist: to prevent sending to invalid addresses. [2]
Also, humans make mistakes. You should detect spelling errors and typos, then suggest corrections. [3]
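One way to suggest corrections without rejecting anything is to fuzzy-match the domain against a short list of popular providers. This is only a rough sketch: the COMMON_DOMAINS list, the suggest_correction name, and the 0.8 cutoff are all assumptions made for illustration, and the result is meant to be shown as a "did you mean ...?" hint, never enforced.

```python
import difflib

# Hypothetical list of frequently used domains; extend it for your audience.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com"]

def suggest_correction(address, cutoff=0.8):
    """Return a suggested address if the domain looks like a typo, else None."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None  # not enough structure to guess anything
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None  # already a known domain, nothing to suggest
    # get_close_matches ranks candidates by difflib's similarity ratio
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None
```

The user stays free to ignore the hint, so an unusual but real domain is never blocked.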
Mickey@mouse.com is a perfectly valid address but it isn’t my address. If that matters for your application you need to spend the capital to send an email. No way around it.
Even worse, I have commonfirstnamecommonlastname@gmail.com and get several emails a day that I didn't sign up for. Now the person who did sign up isn't getting them and I have to figure out how to opt out of them. Sometimes these website accounts already have payment/personal details associated with them, which I now have access to (and indeed, sometimes have to view) in order to find the "stop sending me email" button.
Always send the confirmation "did you sign up?" email. Always.
Oh don't worry about this at all because spammers are going to sign up with legitimate e-mail addresses that are going to get your reputation lowered. Very common tactic and you won't be saved by some dumb regex that would just probably hurt a few real users.
This is really just a problem for spammers going out and either buying mailing lists that haven't been validated or scraping the web for email addresses. In the case of the spammer, they would probably care a lot more about their bounce rate than their false negative rate (i.e. valid addresses that fail some sort of validation regex). In fact, they would probably tune their validation to actually throw away addresses that didn't look correct just to be safe.
Obviously, this is a different scenario than your bank not accepting your valid (per RFC) email address. Which is why any sort of blanket advice is pretty dumb. Not that I care to aid spammers...
The other scenario might be a site that puts up a "paywall" type thing, where you are forced to enter an email address to gain quick access to something, but doesn't want to bother you with going and verifying an email (e.g. instant discounts, downloading a PDF, etc.). Or in-person email address collection when you buy something in a store. It's never a good idea to collect email addresses of people that have no desire to subscribe to your marketing.
Yeah, buying a dataset of a couple of million email addresses and then using them to email people who didn't sign up or request your email isn't really something I care to optimize. It wouldn't shock me at all if the services that charge to validate emails are just doing a half-ass regex and leeching off spammers anyway.
100% agreed here. Accept a text field; maybe validate that it has an @ in it and a . after the @.
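For anyone who wants that check spelled out, here is a minimal sketch. The function name is made up, and requiring a non-empty part before the @ is my own small addition on top of "an @ and a . after it".

```python
def looks_like_email(address: str) -> bool:
    """Lax check: something before an '@', and a '.' somewhere after it."""
    local, sep, domain = address.strip().rpartition("@")
    # rpartition returns ("", "", address) when there is no "@" at all
    if not sep or not local:
        return False
    # The first dot in the domain must not be its first or last character
    dot = domain.find(".")
    return 0 < dot < len(domain) - 1
```

Note that it makes no assumption about TLD length, so something like user@example.mobi passes, while user@localhost does not.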
Send that address a confirmation email. Now you've got consensual opt-in and you've somewhat protected yourself from adding a wrong address to your recurring mailing list.
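The confirmation step could look roughly like this. Everything here is a placeholder sketch: the in-memory PENDING dict stands in for a real database table, the example.com URL and the 24-hour lifetime are assumptions, and actually sending the email is left out.

```python
import secrets
import time

PENDING = {}  # token -> (address, expiry timestamp); hypothetical in-memory store
TOKEN_TTL_SECONDS = 24 * 60 * 60  # assumed link lifetime

def start_signup(address, now=None):
    """Record a pending signup and return the link to email to the address."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # unguessable, URL-safe random token
    PENDING[token] = (address, now + TOKEN_TTL_SECONDS)
    return f"https://example.com/confirm?token={token}"

def confirm_signup(token, now=None):
    """Return the confirmed address, or None if the token is unknown or expired."""
    now = time.time() if now is None else now
    entry = PENDING.pop(token, None)  # pop makes each token single-use
    if entry is None:
        return None
    address, expires_at = entry
    return address if now <= expires_at else None
```

Only addresses that come back through confirm_signup would go onto the recurring list.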
Prevent abuse with long (seconds) delays between submissions from the client. If the user thinks they did it right, they're waiting on their email inbox anyway; if they immediately realize they made a typo, it'll take 2-3s to fix.
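A sketch of that delay, assuming it is enforced server-side and keyed on some client identifier (IP, session id, whatever you have). The class name and the 3-second default are assumptions, and a real deployment would need shared storage rather than a dict in one process.

```python
import time

class SubmissionThrottle:
    """Allow at most one accepted submission per client every few seconds."""

    def __init__(self, min_interval_seconds=3.0, clock=time.monotonic):
        self.min_interval = min_interval_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._last_accepted = {}  # client_id -> time of last accepted submission

    def allow(self, client_id):
        now = self.clock()
        last = self._last_accepted.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous accepted submission
        self._last_accepted[client_id] = now
        return True
```

Rejected attempts do not reset the timer here, so someone fixing a typo waits at most the configured interval.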
The RFCs were written when manually (not from cron) sending email to another user on your local system was a thing that actually happened. I'm certain you actively want to avoid that now.
Yup, I've been working in email marketing for a long time and this is what I do if I need a regex. I remember when the .mobi TLD came out and people with those addresses had a terrible time signing up for things because a bunch of developers got too cute and assumed a TLD could only be 2 or 3 characters. You want to be really lax in what you validate.
If I can send you an email and you can verify that you have access to that email, your email is "valid enough" for me.
Then, the validation is basically "is there an @ and a dot after it?". I find that after that, every hour spent on improving the validation will just cause more emails to be falsely flagged as invalid and more support requests from people who couldn't sign up with valid emails. It's also code we need to maintain, and any time someone edits the validation logic, they risk breaking signups completely.
So with more "improvements" to the validation, you just cause more problems. Then why do it?
I hear the reputation arguments, but in practice, it never happened to any of the organizations I worked for.
What does happen very often, though, is naive engineers trying to solve problems the business doesn't have with knowledge they lack...
My cheap-o approach to this is: Check there’s an @, and that there is a dot afterwards. This excludes local domains obviously, but I don’t want those anyway.