Validate Phone Numbers: A Detailed Guide

Following are a couple of recipes I wrote for Regular Expressions Cookbook, composing a fairly comprehensive guide to validating and formatting North American and international phone numbers using regular expressions. The regexes in these recipes are all pretty straightforward, but hopefully this gives an example of the depth you can expect from the book.

For more than 100 detailed regular expression recipes that include equal coverage for eight programming languages (C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET), get your very own copy of Regular Expressions Cookbook. Also available in Russian, German, Japanese, Czech, Chinese, Korean, and Brazilian Portuguese.

Following is an excerpt from Regular Expressions Cookbook (O'Reilly, 2009) by Jan Goyvaerts and Steven Levithan. Reprinted with permission.

Validate and Format North American Phone Numbers

Problem

You want to determine whether a user entered a North American phone number in a common format, including the local area code. These formats include 1234567890, 123-456-7890, 123.456.7890, 123 456 7890, (123) 456 7890, and all related combinations. If the phone number is valid, you want to convert it to your standard format, (123) 456-7890, so that your phone number records are consistent.

Solution

A regular expression can easily check whether a user entered something that looks like a valid phone number. By using capturing groups to remember each set of digits, the same regular expression can be used to replace the subject text with precisely the format you want.

Regular expression

^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Replacement

($1) $2-$3
Replacement text flavors: .NET, Java, JavaScript, Perl, PHP

(\1) \2-\3
Replacement text flavors: Python, Ruby

C#
Regex regexObj =
    new Regex(@"^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$");

if (regexObj.IsMatch(subjectString)) {
    string formattedPhoneNumber =
        regexObj.Replace(subjectString, "($1) $2-$3");
} else {
    // Invalid phone number
}

JavaScript
var regexObj = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;

if (regexObj.test(subjectString)) {
    var formattedPhoneNumber =
        subjectString.replace(regexObj, "($1) $2-$3");
} else {
    // Invalid phone number
}

Other programming languages

See Recipes 3.5 and 3.15 for help implementing this regular expression with other programming languages.

Discussion

This regular expression matches three groups of digits. The first group can optionally be enclosed in parentheses, and the first two groups can optionally be followed by one of three separators (a hyphen, dot, or space). The following layout breaks the regular expression into its individual parts, omitting the redundant groups of digits:

^        # Assert position at the beginning of the string.
\(       # Match a literal "("...
  ?      #   between zero and one time.
(        # Capture the enclosed match to backreference 1...
  [0-9]  #   Match a digit...
    {3}  #     exactly three times.
)        # End capturing group 1.
\)       # Match a literal ")"...
  ?      #   between zero and one time.
[-. ]    # Match one character from the set "-. "...
  ?      #   between zero and one time.
⋯        # [Match the remaining digits and separator.]
$        # Assert position at the end of the string.

Let’s look at each of these parts more closely. The ^ and $ at the beginning and end of the regular expression are a special kind of metacharacter called an anchor or assertion. Instead of matching text, assertions match a position within the text. Specifically, ^ matches at the beginning of the text, and $ at the end. This ensures that the phone number regex does not match within longer text, such as 123-456-78901.
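
To see the effect of the anchors, here is a minimal sketch that compares the recipe's regex with a copy that has ^ and $ stripped out. The unanchored variant and the variable names are illustrative assumptions, not part of the recipe.

```javascript
// The recipe's regex, anchored at both ends.
const anchored = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;
// The same pattern with ^ and $ removed, for comparison only.
const unanchored = /\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})/;

const tooLong = "123-456-78901";

console.log(anchored.test(tooLong));        // false: the extra digit breaks the match
console.log(unanchored.test(tooLong));      // true: "123-456-7890" matches inside the text
console.log(anchored.test("123-456-7890")); // true
```

Without the anchors, a validation routine would accept any input that merely contains something shaped like a phone number.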

As we’ve repeatedly seen, parentheses are special characters in regular expressions, but in this case we want to allow a user to enter parentheses and have our regex recognize them. This is a textbook example of where we need a backslash to escape a special character so the regular expression treats it as literal input. Thus, the \( and \) sequences that enclose the first group of digits match literal parenthesis characters. Both are followed by a question mark, which makes them optional. We’ll explain more about the question mark after discussing the other types of tokens in this regular expression.

The parentheses that appear without backslashes are capturing groups and are used to remember the values matched within them so that the matched text can be recalled later. In this case, backreferences to the captured values are used in the replacement text so we can easily reformat the phone number as needed.

Two other types of tokens used in this regular expression are character classes and quantifiers. Character classes allow you to match any one out of a set of characters. [0-9] is a character class that matches any digit. The regular expression flavors covered by this book all include the shorthand character class \d that also matches a digit, but in some flavors \d matches a digit from any language’s character set or script, which is not what we want here. See Recipe 2.3 for more information about \d.

[-. ] is another character class, one that allows any one of three separators. It’s important that the hyphen appears first in this character class, because if it appeared between other characters, it would create a range, as with [0-9]. Another way to ensure that a hyphen inside a character class matches a literal version of itself is to escape it with a backslash. [.\- ] is therefore equivalent.
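
The following sketch illustrates the point about hyphen placement. The third pattern is a deliberately broken variant (an assumption made for this example) in which the hyphen sits between a space and a dot, and so forms a range covering every character from U+0020 to U+002E.

```javascript
const hyphenFirst = /^[-. ]$/;     // hyphen listed first: literal
const hyphenEscaped = /^[.\- ]$/;  // hyphen escaped: literal
const accidentalRange = /^[ -.]$/; // hyphen between space and dot: a range

for (const ch of ["-", ".", " ", "+", "#"]) {
    console.log(
        JSON.stringify(ch),
        hyphenFirst.test(ch),
        hyphenEscaped.test(ch),
        accidentalRange.test(ch)
    );
}
```

The first two classes agree on every character, while the accidental range also accepts characters such as "+" and "#" that were never meant to be separators.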

Finally, quantifiers allow you to repeat a token or group. {3} is a quantifier that causes its preceding element to be repeated exactly three times. The regular expression [0-9]{3} is therefore equivalent to [0-9][0-9][0-9], but is shorter and hopefully easier to read. A question mark (mentioned earlier) is a special quantifier that causes its preceding element to repeat zero or one time. It could also be written as {0,1}. Any quantifier that allows something to be repeated zero times effectively makes that element optional. Since a question mark is used after each separator, the phone number digits are allowed to run together.
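
As a rough check of these equivalences, the sketch below writes the regex a second time with every {3} and {4} expanded and every ? replaced by {0,1}. The expanded pattern and the sample inputs are made up for this illustration.

```javascript
const compact = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;
// Same pattern with the quantifiers spelled out the long way.
const spelledOut =
    /^\({0,1}([0-9][0-9][0-9])\){0,1}[-. ]{0,1}([0-9][0-9][0-9])[-. ]{0,1}([0-9][0-9][0-9][0-9])$/;

const samples = [
    "1234567890",     // digits run together because every separator is optional
    "(123) 456-7890",
    "123.456.7890",
    "123--456-7890",  // two separators in a row: rejected
    "12-3456-7890",   // wrong grouping of digits: rejected
];

for (const s of samples) {
    console.log(s, compact.test(s), spelledOut.test(s));
}
```

Both patterns should give the same answer for every input; the compact form is simply easier to read.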

Note that although this recipe claims to handle North American phone numbers, it’s actually designed to work with North American Numbering Plan (NANP) numbers. The NANP is the telephone numbering plan for the countries that share the country code “1”. This includes the United States and its territories, Canada, Bermuda, and 16 Caribbean nations. It excludes Mexico and the Central American nations.

Variations

Eliminate invalid phone numbers

So far, the regular expression matches any 10-digit number. If you want to limit matches to valid phone numbers according to the North American Numbering Plan, here are the basic rules:

  • Area codes start with a number from 2–9, followed by 0–8, and then any third digit.
  • The second group of three digits, known as the central office or exchange code, starts with a number from 2–9, followed by any two digits.
  • The final four digits, known as the station code, have no restrictions.

These rules can easily be implemented with a few character classes:

^\(?([2-9][0-8][0-9])\)?[-. ]?([2-9][0-9]{2})[-. ]?([0-9]{4})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Beyond the basic rules just listed, there are a variety of reserved, unassigned, and restricted phone numbers. Unless you have very specific needs that require you to filter out as many phone numbers as possible, don’t go overboard trying to eliminate unused numbers. New area codes that fit the rules listed earlier are made available regularly, and even if a phone number is valid, that doesn’t necessarily mean it was issued or is in active use.
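
Here is a small sketch that runs the stricter regex against a few made-up numbers, one for each rule it enforces. The sample numbers and variable names are illustrative assumptions.

```javascript
const nanp = /^\(?([2-9][0-8][0-9])\)?[-. ]?([2-9][0-9]{2})[-. ]?([0-9]{4})$/;

const samples = [
    "212-555-0123", // fits all the rules
    "123-456-7890", // area code starts with 1
    "291-555-0123", // second digit of the area code is 9
    "212-155-0123", // exchange code starts with 1
];

const results = samples.map((s) => nanp.test(s));
samples.forEach((s, i) => console.log(s, results[i]));
```

Only the first sample passes. Note that 123-456-7890, used as the example format throughout this recipe, is rejected by the stricter rules.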

Find phone numbers in documents

Two simple changes allow the previous regular expression to match phone numbers within longer text:

\(?\b([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Here, the ^ and $ assertions that bound the regular expression to the beginning and end of the text have been removed. In their place, word boundary tokens (\b) have been added to ensure that the matched text stands on its own and is not part of a longer number or word.

Similar to ^ and $, \b is an assertion that matches a position rather than any actual text. Specifically, \b matches the position between a word character and either a nonword character or the beginning or end of the text. Letters, numbers, and underscore are all considered word characters (see Recipe 2.6).

Note that the first word boundary token appears after the optional, opening parenthesis. This is important because there is no word boundary to be matched between two nonword characters, such as the opening parenthesis and a preceding space character. The first word boundary is relevant only when matching a number without parentheses, since the word boundary always matches between the opening parenthesis and the first digit of a phone number.
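
In JavaScript, searching a longer text with this regex might look like the sketch below. The global (g) flag is added so that every match is found, and the sample text is invented for the example.

```javascript
// The "find in documents" regex, with the g flag to find all matches.
const phoneInText = /\(?\b([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})\b/g;

const text =
    "Call (123) 456-7890 or 987.654.3210 today; " +
    "order no. 12345678901 is not a phone number.";

// With the g flag, match() returns every full match (or null if none).
const found = text.match(phoneInText) || [];
console.log(found);

// The same regex can reformat each number where it stands.
const normalized = text.replace(phoneInText, "($1) $2-$3");
console.log(normalized);
```

The 11-digit order number is skipped because there is no word boundary inside a run of digits, which is the behavior the \b tokens were added to get.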

Allow a leading “1”

You can allow an optional, leading “1” for the country code (which covers the North American Numbering Plan region) via the addition shown in the following regex:

^(?:\+?1[-. ]?)?\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

In addition to the phone number formats shown previously, this regular expression will also match strings such as +1 (123) 456-7890 and 1-123-456-7890. It uses a noncapturing group, written as (?:⋯). When a question mark follows an unescaped left parenthesis like this, it’s not a quantifier, but instead helps to identify the type of grouping. Standard capturing groups require the regular expression engine to keep track of backreferences, so it’s more efficient to use noncapturing groups whenever the text matched by a group does not need to be referenced later. Another reason to use a noncapturing group here is to allow you to keep using the same replacement string as in the previous examples. If we added a capturing group, we’d have to change $1 to $2 (and so on) in the replacement text shown earlier in this recipe.

The full addition to this version of the regex is (?:\+?1[-. ]?)?. The “1” in this pattern is preceded by an optional plus sign, and optionally followed by one of three separators (hyphen, dot, or space). The entire, added noncapturing group is also optional, but since the “1” is required within the group, the preceding plus sign and separator are not allowed if there is no leading “1”.
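
The sketch below tries this version on a few inputs and reuses the unchanged replacement text. The helper function and the sample inputs are assumptions made for the example.

```javascript
const withCountryCode =
    /^(?:\+?1[-. ]?)?\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;

// Hypothetical helper: returns the formatted number, or null if invalid.
function normalize(input) {
    return withCountryCode.test(input)
        ? input.replace(withCountryCode, "($1) $2-$3")
        : null;
}

console.log(normalize("+1 (123) 456-7890")); // leading "+1 " is dropped
console.log(normalize("1-123-456-7890"));    // leading "1-" is dropped
console.log(normalize("123 456 7890"));      // no country code: still valid
console.log(normalize("+(123) 456-7890"));   // plus sign without the "1": invalid
```

Because the country code sits in a noncapturing group, $1 still refers to the area code, and the leading "1" disappears from the formatted result.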

Allow seven-digit phone numbers

To allow matching phone numbers that omit the local area code, enclose the first group of digits together with its surrounding parentheses and following separator in an optional, noncapturing group:

^(?:\(?([0-9]{3})\)?[-. ]?)?([0-9]{3})[-. ]?([0-9]{4})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Since the area code is no longer required as part of the match, simply replacing any match with ($1) $2-$3 might now result in something like () 123-4567, with an empty set of parentheses. To work around this, add code outside the regex that checks whether group 1 matched any text, and adjust the replacement text accordingly.
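
One way to write that check in JavaScript is sketched below. It relies on the fact that a capturing group that did not participate in the match is reported as undefined. The function name and the seven-digit output format (456-7890) are assumptions; use whatever format suits your records.

```javascript
const optionalAreaCode =
    /^(?:\(?([0-9]{3})\)?[-. ]?)?([0-9]{3})[-. ]?([0-9]{4})$/;

// Hypothetical helper: formats a 7- or 10-digit number, or returns null.
function formatPhoneNumber(input) {
    const match = optionalAreaCode.exec(input);
    if (match === null) {
        return null; // Invalid phone number
    }
    const [, areaCode, exchange, station] = match;
    // Group 1 is undefined when the area code was omitted.
    return areaCode === undefined
        ? `${exchange}-${station}`
        : `(${areaCode}) ${exchange}-${station}`;
}

console.log(formatPhoneNumber("(123) 456 7890")); // (123) 456-7890
console.log(formatPhoneNumber("456.7890"));       // 456-7890
console.log(formatPhoneNumber("12345678"));       // null
```

Building the result from the match array, rather than from a fixed replacement string, avoids the empty parentheses altogether.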

See Also

Recipe 4.3 shows how to validate international phone numbers.

The North American Numbering Plan (NANP) is the telephone numbering plan for the United States and its territories, Canada, Bermuda, and 16 Caribbean nations. More information is available at http://www.nanpa.com.


Validate International Phone Numbers

Problem

You want to validate international phone numbers. The numbers should start with a plus sign, followed by the country code and national number.

Solution

Regular expression

^\+(?:[0-9] ?){6,14}[0-9]$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

JavaScript
function validate (phone) {
    var regex = /^\+(?:[0-9] ?){6,14}[0-9]$/;

    if (regex.test(phone)) {
        // Valid international phone number
    } else {
        // Invalid international phone number
    }
}

Other programming languages

See Recipe 3.5 for help implementing this regular expression with other programming languages.

Discussion

The rules and conventions used to print international phone numbers vary significantly around the world, so it’s hard to provide meaningful validation for an international phone number unless you adopt a strict format. Fortunately, there is a simple, industry-standard notation specified by ITU-T E.123. This notation requires that international phone numbers include a leading plus sign (known as the international prefix symbol), and allows only spaces to separate groups of digits. Although the tilde character (~) can appear within a phone number to indicate the existence of an additional dial tone, it has been excluded from this regular expression since it is merely a procedural element (in other words, it is not actually dialed) and is infrequently used. Thanks to the international phone numbering plan (ITU-T E.164), phone numbers cannot contain more than 15 digits. The shortest international phone numbers in use contain seven digits.

With all of this in mind, let’s look at the regular expression again after breaking it into its pieces. Because this version is written using free-spacing style, the literal space character has been replaced with \x20:

^         # Assert position at the beginning of the string.
\+        # Match a literal "+" character.
(?:       # Group but don't capture...
  [0-9]   #   Match a digit.
  \x20    #   Match a space character...
    ?     #     Between zero and one time.
)         # End the noncapturing group.
  {6,14}  #   Repeat the preceding group between 6 and 14 times.
[0-9]     # Match a digit.
$         # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

The ^ and $ anchors at the edges of the regular expression ensure that it matches the whole subject text. The noncapturing group—enclosed with (?:⋯)—matches a single digit, followed by an optional space character. Repeating this grouping with the interval quantifier {6,14} enforces the rules for the minimum and maximum number of digits, while allowing space separators to appear anywhere within the number. The second instance of the character class [0-9] completes the rule for the number of digits (bumping it up from between 6 and 14 digits to between 7 and 15), and ensures that the phone number does not end with a space.
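
The digit limits can be checked with a short sketch like the following. The countDigits helper and the sample numbers are assumptions added for illustration.

```javascript
const international = /^\+(?:[0-9] ?){6,14}[0-9]$/;

// Hypothetical helper: counts the digits, ignoring "+" and spaces.
function countDigits(input) {
    return input.replace(/[^0-9]/g, "").length;
}

const samples = [
    "+1234567",          // 7 digits: the minimum
    "+123456",           // 6 digits: too short
    "+44 20 7946 0958",  // 12 digits with space separators
    "+123456789012345",  // 15 digits: the maximum
    "+1234567890123456", // 16 digits: too long
    "+44 20 7946 0958 ", // ends with a space
    "+44  20 7946 0958", // two spaces in a row
];

const results = samples.map((s) => international.test(s));
samples.forEach((s, i) => {
    console.log(JSON.stringify(s), countDigits(s), results[i]);
});
```

Only the first, third, and fourth samples are accepted. The last two fail on their spacing even though their digit counts are within range.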

Variations

Validate international phone numbers in EPP format

^\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

This regular expression follows the international phone number notation specified by the Extensible Provisioning Protocol (EPP). EPP is a relatively recent protocol (finalized in 2004), designed for communication between domain name registries and registrars. It is used by a growing number of domain name registries, including .com, .info, .net, .org, and .us. The significance of this is that EPP-style international phone numbers are increasingly used and recognized, and therefore provide a good alternative format for storing (and validating) international phone numbers.

EPP-style phone numbers use the format +CCC.NNNNNNNNNNxEEEE, where C is the 1–3 digit country code, N is up to 14 digits, and E is the (optional) extension. The leading plus sign and the dot following the country code are required. The literal “x” character is required only if an extension is provided.
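
A quick sketch of the EPP regex in use follows. The sample numbers are invented for the example.

```javascript
const epp = /^\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?$/;

const samples = [
    "+1.2125550123",        // country code, dot, number
    "+44.2079460958x1234",  // with an extension
    "+1 2125550123",        // space instead of the required dot
    "+1.2125550123x",       // "x" with no extension after it
    "+1234.5551234",        // country code longer than three digits
    "+1.123",               // number shorter than four digits
];

const results = samples.map((s) => epp.test(s));
samples.forEach((s, i) => console.log(s, results[i]));
```

Only the first two samples are accepted. Because the extension is matched with .+, anything at all may follow the "x", as long as there is at least one character.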

See Also

Recipe 4.2 provides more options for validating North American phone numbers.

ITU-T Recommendation E.123 (“Notation for national and international telephone numbers, e-mail addresses and Web addresses”) can be downloaded here: http://www.itu.int/rec/T-REC-E.123.

ITU-T Recommendation E.164 (“The international public telecommunication numbering plan”) can be downloaded at http://www.itu.int/rec/T-REC-E.164.

National numbering plans can be downloaded at http://www.itu.int/ITU-T/inr/nnp.

RFC 4933 defines the syntax and semantics of EPP contact identifiers, including international phone numbers. You can download RFC 4933 at http://tools.ietf.org/html/rfc4933.



51 thoughts on “Validate Phone Numbers: A Detailed Guide”

  1. It would be better served (IMHO) to use the static method in .Net such as:

    if (Regex.IsMatch( ) ) { }

    The static method differs from the instance method in that the pattern is placed in the regex cache, so it can be reused later without being compiled again. Compilation and reuse.

    HTH

  2. Here is a C# 4.0 extension method I’m currently using to replace phone numbers in some HTML text. This will remove any pattern that starts with a number and ends with a number and has 8 or more characters in between that can each be a number, a space, or an opening or closing parenthesis. It’s not perfect but it will catch most variations.

    public static string RemoveAllNumbers(this string input)
    {
    string pattern = @"[\d][0-9\s\(\)]{8,}[\d]";
    return Regex.Replace(input, pattern, " &lt;<i>phone number removed</i>&gt; ", RegexOptions.IgnorePatternWhitespace);
    }

  3. hey Bob Smith,
    your regx rocks.

    ( var PhoneEx = /^((\+\d{1,3}(-| )?\(?\d\)?(-| )?\d{1,5})|(\(?\d{2,6}\)?))(-| )?(\d{3,4})(-| )?(\d{4})(( x| ext)\d{1,5}){0,1}$/;)

    i was in search of this regx from long time

    thanks

  4. Hi guys,
    I am stuck while using regex to validate an international phone number,
    e.g. +91(011)-123456789

    Could you please help me out?

  5. Thanks for the regx.

    I have a small problem. Please address it.

    \(?\b([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})\b
    works well for a long string/text.

    But ^(?:\+?1[-. ]?)?\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$
    for leading optional 1 is not working for long string/text.

    How can it be modified to support long string/text.

  6. The UK uses a range of number lengths and formats, hence validation is quite complex.

    Here, the NSN is National Significant Number. This is all of the digits after the 0 trunk code or +44 country code.

    This list covers most of the options for the UK (listed in national format):

    7 digit NSNs

    0800 1111
    0845 46 47

    9 digit NSNs

    (016977) 2xxx
    (016977) 3xxx
    (01xxx) xxxxx
    0500 xxxxxx
    0800 xxxxxx

    10 digit NSNs

    (013873) xxxxx
    (015242) xxxxx
    (015394) xxxxx
    (015395) xxxxx
    (015396) xxxxx
    (016973) xxxxx
    (016974) xxxxx
    (016977) xxxxx
    (017683) xxxxx
    (017684) xxxxx
    (017687) xxxxx
    (019467) xxxxx
    (011x) xxx xxxx
    (01x1) xxx xxxx
    (01xxx) xxxxxx
    (02x) xxxx xxxx
    03xx xxx xxxx
    055 xxxx xxxx
    056 xxxx xxxx
    070 xxxx xxxx
    07624 xxxxxx
    076 xxxx xxxx
    07xxx xxxxxx
    08xx xxx xxxx
    09xx xxx xxxx

    Valid formats for geographic numbers include 2+8, 3+7, 4+6, 4+5, 5+5 and 5+4 (and 0+10 for NDO numbers).

    Non-geographic numbers mostly use 0+10 format, but some 0800 numbers and all 0500 numbers use 0+9 format.

    Most code found on the web caters for only a few of these, not the full set.

    The international format adds +44 and a space before the NSN digits.

    The national format adds the 0 trunk code before the NSN digits. For 01 and 02 numbers the area code should be in parentheses, except for NDO numbers (NDO numbers are those where the subscriber number begins 0 or 1).

    Hope that helps!

  7. Here are three working JavaScript regexes for UK phone numbers. But I cannot get ANY of them to work in PHP. Any gurus out there?
    telNoFormat1 = '/(^[0][1-9]([0-9]{1,4})\s[0-9\s]+$)' // works with 01234 5678900

    telNoFormat2 = '/(^[0][0]\s[1-9][0-9]{0,2}\s[0-9\s]+$)'; // this is the international format, works with 44 91 1234 567890

    telNoFormat3 = '/(^[0][1-9]([0-9]{2,4})\s[0-9]{4,6}$)'; // this is the mobile number format, works with 01234 123456

  8. You open with ='/(^ so you need to close with $)/'; too. There’s a slash missing near the end.

    You should also check the number length: without the leading +44 or 0 you should have exactly 9 or 10 digits (see the list in previous post).

    01234 5678900 – has too many digits to be a valid UK number.

    44 91 1234 567890 – with 91 in it, this is also not valid. Additionally, it should begin +44.

  9. Hi All

    I have a phone number validator already written.
    But when I try to enter 9990005555

    it is not saved, and it doesn’t even prompt that the number is invalid.
    Can you please suggest a fix?
    Thanks.

  10. this site is very helpful.

    i’m wondering, given a phone number like this: “+1234567890”; is there a mathematical way to determine the country and area codes? — without looking at the code tables.

    what if there exist country codes “1”, “12” and “123”? how do we figure this out?

    Thanks.

  11. This works awesome for 10 digit phone numbers:

    var regexObj = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;

    if (regexObj.test(subjectString)) {
    var formattedPhoneNumber =
    subjectString.replace(regexObj, "($1) $2-$3");
    }

    What needs to change to allow an extension?

    For example, I’d like it to format the phone number to:

    (123)456-7890 x123

    Thanks, in advance for this!

  12. var regexObj = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;

    if (regexObj.test(subjectString)) {
    var formattedPhoneNumber =
    subjectString.replace(regexObj, "($1) $2-$3");
    }

    not working i want vb.net fp spread

  13. Hi,
    In my webpage I have one TextBox control.
    In that textbox, when I enter any three digits, I have to automatically display a hyphen (“-”) after the three digits.
    E.g.: 333-333-333-333-333
    How do I write the code in JavaScript?
    Please help me.
    Thanks,
    karthik

  14. I am looking for regular expression for eight digit telephone number starting with 2 or 9, has no characters in between. Examples:

    99246817
    24765579

    Can I get some help? Thanks!

  15. This is way overkill man… just give the basic info needed to do it instead of writing a book. Although helpful to advanced users this is way overwhelming for me….

  16. I am looking for a regular expression that matches mobile or phone numbers in all of these formats.
    Ex: 9876543210, (987)-6543210, +91-9876543210, (+91)-(987)-6543210

    kindly help me

  17. I have recently updated a page listing the full RegEx for validating and formatting UK telephone numbers. It’s too long to reproduce here and it would be difficult to maintain multiple copies.

    The list can be found at:
    http://www.aa-asterisk.org.uk/index.php/Regular_Expressions_for_Validating_and_Formatting_UK_Telephone_Numbers

    I’m also the UK metadata editor for the Google libphonenumber project over at: http://code.google.com/p/libphonenumber/
    The xml metadata file there has number length, validation and formatting information for every country.

  18. Regex-syntax is such a pain in my head. At least with this blog I can see real text and character patterns. Although I have to read this write-up again and again, sure it does help for my validation problems especially with international phone numbers. This makes the use of regular expression easy.

  19. Thanks SO MUCH for this post. It helped me a great deal. In case anyone else needs to get this working on iOS, the working NSString for the North Am regex is
    NSString *NorthAmRegex = @"^(?:\\+?1[-. ]?)?\\(?([2-9][0-8][0-9])\\)?[-. ]?([2-9][0-9]{2})[-. ]?([0-9]{4})$";

    (You have to double escape the backslashes)

  20. Thanks for the reference. I like how everyone above expects you to write their regex for them… makes me wonder if they actually understood the article.

    We are only using a regex temporarily until we get some international support in place to narrow down locales.

  21. This quote from the original blog post made me curious:
    “The shortest international phone numbers in use contain seven digits”

    What country has the shortest international phone numbers?

  22. Hey..
    I have a problem with these expressions:
    051123456
    041-765-432
    (031) 246-357
    040/456-123
    064.111-222
    (051)121212
    +386 041 100-200
    00 386 (0)70 555 555
    (051) 951-159
    041234567
    040 555-999
    +386 (0)70 111 222
    040/555999
    031 98 76 54

    my solution is:
    ^(?:00\s+)?[+( )]?(?:386 |s*)\(?0\)?(\d{2})[)./-]?\s*(\d{3})[./-]?\s*(\d{3})$

    but it doesn’t work in this case: 031 98 76 54.
    Can anyone solve this and help me?
    Thanks to all :D

  23. Could you please advise how to change your regex

    ^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$

    to include phone numbers with extension, for example:
    416-564-9874,23
    416-564-9874,234
    416-564-9874,2345

    Thank you.

  24. On validating international numbers:

    Why ‘(?:[0-9] ?)’ and not just ‘[0-9 ]’?
