Special Characters In Passwords

In this day and age of technology, we all need access to our protected data. In order to do that, you need a good password. What if you're not allowed to create a good password?

Too often, I see web forms that try to be clever by requiring an uppercase letter, a lowercase letter and a number in order to make your password stronger and harder to guess. Then they turn around and deny you the use of special characters. That's like walking into a bar with a 3 drink minimum, thinking you're going to have a good time, until you find out they watered the beer down to <1.5% - Good job!

In other instances, I see web forms that totally DENY you the use of anything except alphanumeric characters. I've never seen a web form that does this for a legitimate reason. The only explanation is that the developer who created the form didn't properly escape the data before printing it to the next stream.

This is ridiculous! The best thing you can do for your users is to validate on input - escape on output!!!

Let's break this down for those who need more explaining:

Validate On Input

If you are expecting an email address, validate the string as an email address using regular expressions or some other type of string pattern matching. If you are expecting a phone number, then you can expect to receive only digits, dots and dashes, and maybe an X to note the extension. If you want to be uniform with your data, break the components into multiple inputs and validate each input. The same rules apply for IP addresses, MAC addresses, names, addresses, and any other type of data you may accept.
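To make that concrete, here is a minimal sketch in Python. The two patterns and the function names (is_valid_email, is_valid_phone) are my own assumptions for illustration, and they are deliberately loose: they check the general shape of the input, not whether the address or number actually exists.

```python
import re

# Assumed, deliberately loose shape check: something@something.tld
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Assumed phone shape: digits with optional separators, plus an
# optional extension marked by x or X
PHONE_RE = re.compile(r"\+?[0-9][0-9 .()\-]{5,18}[0-9](\s*[xX]\s*[0-9]{1,6})?")


def is_valid_email(value: str) -> bool:
    """Return True when the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(value) is not None


def is_valid_phone(value: str) -> bool:
    """Return True when the whole string looks like a phone number."""
    return PHONE_RE.fullmatch(value) is not None
```

Notice that neither function changes the input. Validation only says yes or no; it never strips or rewrites what the user typed.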

Escape On Output

Before printing any user input to any stream, you have to properly escape that data. The art of escaping comes with the driver you're using to print to the stream.


For example, when printing to a SQL query - you'll want to use your database driver's function to escape the data for the SQL stream. Alternatively, you can use prepared statements to create queries, and use placeholders for the data. The DB driver will bind the data to the placeholders (or substitute the appropriately escaped data) before executing the query, so the data never gets parsed as SQL.
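Here is a small sketch of the placeholder approach using Python's built-in sqlite3 module. The users table, the find_user_id helper and the sample value are hypothetical; the point is only that the nasty string goes in as data through the ? placeholder and comes back out untouched.

```python
import sqlite3


def find_user_id(conn: sqlite3.Connection, username: str):
    """Look up a user id; the ? placeholder keeps the data out of the SQL text."""
    row = conn.execute(
        "SELECT id FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical in-memory database, for demonstration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")

# Quotes, semicolons and comment markers are stored as plain data
conn.execute(
    "INSERT INTO users (username) VALUES (?)",
    ("o'brien; DROP TABLE users;--",),
)
```

Other drivers use different placeholder styles (for example %s or named parameters), so check the documentation for the one you are using.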


When printing to a file path, you want to ensure you're taking the basename() of the user input to prevent directory traversal. DO NOT simply remove dots and slashes as you could wind up mangling the resulting file name. Limit user input to a specific directory. If you must have organization and folders with the user data, then store the metadata in a data store of sorts. Do not allow users to create their own directory trees as it could allow for injection.
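A rough sketch of the basename() idea in Python follows. UPLOAD_DIR and safe_upload_path are made-up names, and the sketch assumes a POSIX-style system where "/" is the path separator.

```python
import os

# Hypothetical directory that all user files are confined to
UPLOAD_DIR = "/var/app/uploads"


def safe_upload_path(user_filename: str, base_dir: str = UPLOAD_DIR) -> str:
    """Keep only the final path component and place it under base_dir."""
    name = os.path.basename(user_filename)
    # basename() can return an empty string or a dot entry; refuse those
    if name in ("", ".", ".."):
        raise ValueError("not a usable file name")
    return os.path.join(base_dir, name)
```

The dots inside a name like report.v2.pdf survive, which is exactly why this beats stripping dots and slashes by hand.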


Let's assume you have some data that came from the end-user and you want to put it into a JavaScript function somewhere in the response. Don't just strip the quotes. Use JSON encoding to properly encode the data as a JSON value and return that to the client.
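As a sketch, this is roughly what that looks like with Python's json module. The js_string_literal name is my own, and the extra replacement of < > & is an assumption on my part for the case where the value lands inside an inline script block, where a stray closing script tag would otherwise end the block early.

```python
import json


def js_string_literal(value: str) -> str:
    """Encode user data as a JSON string that is safe to embed in JavaScript."""
    # json.dumps handles quotes, backslashes, newlines and non-ASCII characters
    encoded = json.dumps(value)
    # Assumed extra step for inline <script> blocks: hide markup characters
    # behind \u escapes, which JSON and JavaScript both decode back
    return (
        encoded.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )
```

The client gets back exactly the string the user typed once the literal is parsed, quotes and all.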


When printing to the browser, most times you can get away with escaping the 5 main characters that define XML ( < > & " ' ); however, all web-based languages should have an HTML escaping function, so use it. Additionally, you need to consider the character set the client is using. If you don't know of an HTML escaping function, most languages have access to libxml (or a similar library for dealing with XML), so use XML escaping instead, since those 5 characters are escaped the same way in HTML text and attribute values. Create a DOM document element, set the value of the element to your user data, then retrieve the escaped string from the element. I know it sounds like a PITA, but I guarantee you it will CYA!
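In Python, for instance, the built-in HTML escaping function is html.escape. The render_greeting helper below is a hypothetical example of using it at the moment of output.

```python
import html


def render_greeting(display_name: str) -> str:
    """Escape user data at output time, just before it enters the HTML stream."""
    # quote=True also escapes " and ' so the value is safe inside attributes
    return "<p>Hello, " + html.escape(display_name, quote=True) + "!</p>"
```

The stored value stays exactly as the user entered it; only the copy written to the HTML stream gets escaped.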

It doesn't matter which stream you're using: you must find the proper driver and function to escape the data for that stream. Most languages have these functions built right into the framework, so you don't have to cook up your own function escape(){}.

Do NOT put needless restrictions on what the user can use for their password. Do NOT cap the length, do NOT limit the characters (except for what is reasonable for your data store). Sure - there should be a minimum number of characters, and if you want to require that the user have at least 1 uppercase letter, 1 lowercase letter and 1 number in their password - fine. It pushes users toward more entropy and makes the password that much more complex and difficult to bruteforce.
That's the whole point of a password! It's a secret token that's supposed to be difficult to guess. When you shrink the range from the full 16-bit character space of 65,536 code points to just 62 alphanumeric characters (36 if you ignore case), you increase the chances of an easy bruteforce. That simply defeats the function of a password. Even then - we should be using "passphrases": a collection of words or a phrase that grants us access.
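To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every character is picked uniformly and independently from the allowed set, which real users don't do, so treat the result as an upper bound. The bits_of_entropy name is mine.

```python
import math


def bits_of_entropy(alphabet_size: int, length: int) -> float:
    """Upper bound on password entropy: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


# Compare an 8-character password drawn from different allowed sets:
# 36 = letters (case ignored) + digits, 62 = case-sensitive alphanumerics,
# 95 = printable ASCII including special characters
for size in (36, 62, 95):
    print(size, round(bits_of_entropy(size, 8), 1))
```

Every extra bit doubles the work for a bruteforce attack, and since length multiplies the whole thing, a long passphrase wins even with a modest character set.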

You people need to get out of the 1900s and the year 2000 and upgrade to 2013. We are the Gods of our virtual environment. To assume that it's impossible to accept a user's password as they submit it without opening a vulnerability is a testament to your incompetence at building secure web applications.