Ruby regular expression fun
Monday, April 16, 2007 at 5:55PM
I found several regular expressions for validating all sorts of things: URLs, names, email addresses, et cetera. Here is one I found for validating an email address:
/^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i
How do you like the following email address, which passes this filter without complaint?
hre32443_d.@ter.com%0A<script>alert('hello')</script>
%0A is a URL-encoded line break.
In Ruby, ^ and $ match the beginning and end of each LINE, not of the whole string; \A and \z do that job. The same JavaScript also gets through in the part before the @. This is a first step toward disallowing HTML and line breaks:
/\A([^@\s<>'"]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
A whitelist approach is always better (are there other characters in a name?):
/\A([\w\.\-\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
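A minimal sketch of how the whitelist expression might be wrapped for use, assuming plain Ruby 2.4 or later (for match?). The constant EMAIL_WHITELIST and the helper plausible_email? are hypothetical names chosen for this example, not part of any library.

```ruby
# The whitelist expression from the post, unchanged.
EMAIL_WHITELIST = /\A([\w\.\-\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i

# Hypothetical helper: true only when the whole string matches the whitelist.
def plausible_email?(value)
  value.is_a?(String) && EMAIL_WHITELIST.match?(value)
end

ok_plain    = plausible_email?("jane.doe+lists@example.org")  # => true
ok_script   = plausible_email?("<script>@ter.com")            # => false
# \z (unlike \Z) does not tolerate a trailing line break.
ok_newline  = plausible_email?("jane@example.org\n")          # => false
# The price of a whitelist: an apostrophe is rejected too.
ok_apostrophe = plausible_email?("o'brien@example.org")       # => false
```

The last case speaks to the question in the post about other characters in a name: a whitelist fails closed, so characters that were not anticipated are refused rather than let through.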
Edit: This will match most of today's email addresses, as long as they contain no comments. For email addresses compliant with RFC 822, you can use this regular expression.




Reader Comments (363)
Should definitely allow '_' and '+' in the account name, too.
_ is in \w
I agree on +, thanks, I changed it.
This regex won't validate many forms of email addresses such as those with embedded comments or valid special characters in the local part. I'd prefer to use by Tim Fletcher, which is much closer to what the RFC allows.
The plugin uses this same code internally when checking an email's syntax.
Thanks. Yes, I went for the email address, only, without comments.
Your regex also allows emails in this form:
.aasdf@domain.tv
Is an email address with a dot as its first character really valid?
I guess the regex is better like this:
hello,
i'm looking for some help at the moment. i implemented a validation, but i don't fully understand the reg ex:
/\A([\w\.\-\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
is my understanding right?:
\A :beginning of string
\z :end of string
[\w\.\-\+] :word characters, dots, minuses and pluses are allowed in the first range?
concatenated with @ then second range
(?:[-a-z0-9]+\.) : within lowercase characters and numbers concatenated with .
+[a-z]{2,}) : concatenated with 2 up to many characters
i'm confused-need help thx
yes, kind of:
\w stands for [A-Za-z0-9_]
(?: ... ) is a non-capturing group, so you can ignore the ?: here
(?:[-a-z0-9]+\.)+ => one or more blocks, each followed by a dot. Each block may contain letters, digits and hyphens
[a-z]{2,} stands for the top-level domain, at least two letters
But I recommend the RFC 822 expression.
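The breakdown above can also be written into the expression itself. The sketch below uses Ruby's extended mode (the x flag), which ignores whitespace and allows # comments inside the pattern; the pattern is the same one discussed in this thread, and the constant name and sample address are made up for illustration.

```ruby
# Same expression as above, spread out with the x flag so each part is labelled.
ANNOTATED_EMAIL = /
  \A                    # start of the whole string, not of a line
  ([\w\.\-\+]+)         # group 1: word characters, dots, hyphens, pluses
  @                     # the literal at sign
  (                     # group 2: the whole domain
    (?:[-a-z0-9]+\.)+   #   one or more blocks, each followed by a dot
    [a-z]{2,}           #   top-level domain, two or more letters
  )
  \z                    # end of the whole string
/ix

m = ANNOTATED_EMAIL.match("john.doe+news@mail.example.org")

local_part = m[1]  # => "john.doe+news"
domain     = m[2]  # => "mail.example.org"
```

Because (?: ... ) does not capture, the repeated blocks do not get a group number of their own, and the domain arrives as a single second group.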