Ruby on Rails Security Project

Exploring the Security of Rails and friends.

Ruby regular expression fun

April 16th, 2007 · 7 Comments

I have come across regular expressions for validating all sorts of things: URLs, names, email addresses, et cetera. Here is one I found for validating email addresses:
 
/^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i 
 
How do you like the following email address, which this filter happily accepts?
 
hre32443_d.@ter.com%0A<script>alert('hello')</script>
 
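%0A is a URL-encoded line break.
In Ruby, ^ and $ match the beginning and end of each LINE, not of the whole string, so the first line alone satisfies the expression. \A and \z do the job instead. Even with those anchors, the same JavaScript still works in the part before the @, because [^@\s] accepts it. This is a first step to disallow HTML and line breaks: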
 
/\A([^@\s<>'"]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i 
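
To see the difference between the two kinds of anchors, both expressions can be run against the decoded input. This is only a small sketch: it assumes the %0A has already been URL-decoded into a real line break (as normally happens to request parameters before they reach your code), and the variable names are made up for illustration.

```ruby
# Variable names are hypothetical; the two patterns are the ones from the post.
line_anchored   = /^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i
string_anchored = /\A([^@\s<>'"]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i

# The attack string after URL decoding: %0A has become a real line break.
attack = "hre32443_d.@ter.com\n<script>alert('hello')</script>"

# ^ and $ match at line boundaries, so the first line alone satisfies the pattern.
line_result = !!(attack =~ line_anchored)

# \A and \z match only at the very start and very end of the string.
string_result = !!(attack =~ string_anchored)

# Markup placed before the @ contains no whitespace, so anchors alone do not
# stop it; the extra excluded characters in the local part do.
prefix_attack = "<script>alert('hello')</script>@ter.com"
prefix_result = !!(prefix_attack =~ string_anchored)

puts "line anchors accept the attack:    #{line_result}"
puts "string anchors accept the attack:  #{string_result}"
puts "string anchors accept the prefix:  #{prefix_result}"
```

Only the first check should come back true, which is exactly the problem with ^ and $.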
 
A whitelist approach is better still (are there other characters that can appear in a name?):
 
/\A([\w\.\-\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
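
As a rough illustration of what the whitelist lets through and what it turns away, here is a minimal sketch. The helper name plausible_email? and the sample addresses are invented for this example; only the pattern itself comes from above.

```ruby
# Hypothetical helper; the pattern is the whitelist expression from the post.
def plausible_email?(address)
  # Word characters, dot, hyphen and plus before the @, then one or more
  # dot-separated labels and a top-level domain of at least two letters.
  pattern = /\A([\w\.\-\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
  pattern.match?(address.to_s)
end

# Sample addresses are made up for illustration.
samples = [
  "john.doe+rails@example.com",               # ordinary address with a plus tag
  "hre32443_d.@ter.com\n<script>x</script>",  # line break followed by markup
  "<script>alert('hello')</script>@ter.com",  # markup in the local part
  "o'brien@example.com",                      # apostrophe is not on the whitelist
  ".aasdf@domain.tv"                          # leading dot passes this pattern
]

samples.each do |address|
  puts "#{address.inspect} => #{plausible_email?(address)}"
end
```

The apostrophe case shows the price of a whitelist: it also rejects some addresses that are legitimate, so the list of allowed characters is a trade-off rather than a final answer.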
 
Edit: This will match most of today's email addresses, but not those containing comments. For email addresses compliant with RFC 822, you can use this regular expression.

Tags: Rails

7 responses so far ↓

  • 1 hannibal // Apr 16, 2007 at 8:36

    Should definitely allow ‘_’ and ‘+’ in the account name, too.

  • 2 Heiko // Apr 16, 2007 at 9:09

    _ is in \w
    I agree on +, thanks, I changed it.

  • 3 Dan Kubb // Apr 16, 2007 at 10:37

    This regex won’t validate many forms of email addresses such as those with embedded comments or valid special characters in the local part. I’d prefer to use this one by Tim Fletcher, which is much closer to what the RFC allows.

    The plugin validates_as_email uses this same code internally when checking an email’s syntax.

  • 4 Heiko // Apr 16, 2007 at 11:22

    Thanks. Yes, I went for the email address, only, without comments.

  • 5 dieter // May 23, 2007 at 8:48

    your regex also allows emails in this form:

    .aasdf@domain.tv

    is a email address with a dot as a first letter really valid?

    I guess the regex is better like this:

    /\A([-a-z0-9]+[\w\.\-\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i

  • 6 tobias // Jun 25, 2007 at 5:47

    hello,
    i’m looking for some help at the moment. i implemented a validation, but i don’t fully understand the reg ex:
    /\A([\w\.\-\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
    is my understanding right?:
    \A :beginning of string
    \z :end of string
    [\w\.\-\+] :word characters, dots, minuses and pluses are allowed in the first range?
    concatenated with @ then second range
    (?:[-a-z0-9]+\.) : within lowercase characters and numbers concatenated with .
    +[a-z]{2,}) : concatenated with 2 up to many characters

    i’m confused-need help thx

  • 7 Heiko // Jun 25, 2007 at 6:32

    yes, kind of:
    \w stands for [A-Za-z0-9_]

    ignore ?: here
    (?:[-a-z0-9]+\.)+ => several blocks separated by dots. The blocks can be alphanumerical and a -

    [a-z]{2,} stands for the TL domain, at least two characters

    But I recommend the RFC 822 expression.
