Whitelist vs. Blacklist input validation (Was Re: IBM Websphere Commerce Server 5.5 XSS detect mode)
From: The Crocodile <tcroc () pasture com>
Date: Wed, 12 May 2004 07:57:47 -0400
My intent was not to indicate that validation of input data is all that
needs to be done to guarantee the security of client-supplied data. My
intent was to demonstrate the inadequacies of "blacklist"-style
filtering. Again, it is a great first step and absolutely doesn't hurt to
do; however, it is not best practice when validating client-supplied
data.
The "whitelist" filter I outlined would be for a phone number in the
United States. If you knew that the particular input field would be
supporting numbers from other countries, then an appropriate "whitelist"
would be created (i.e. allow +()- ). The accepted list of characters is
completely dependent upon the particular expected values.
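As a rough illustration of how the accepted set depends on the expected
values, here is a minimal Python sketch. The names US_PHONE, INTL_PHONE
and is_allowed are made up for this example, and the two character sets
are assumptions taken from the discussion (digits and dashes for US
numbers, plus +() for international ones), not a complete phone number
grammar.

```python
import re

# Hypothetical whitelists: each pattern lists the ONLY characters accepted.
US_PHONE = re.compile(r"[0-9-]+")        # digits and dashes
INTL_PHONE = re.compile(r"[0-9+()-]+")   # additionally allows + ( )


def is_allowed(value: str, pattern: re.Pattern) -> bool:
    """Return True only if every character of value is in the whitelist.

    fullmatch() must consume the whole string, so a trailing newline or
    any other stray character causes a rejection.
    """
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    number = "+44-(0)161-237-1028"
    print(is_allowed(number, US_PHONE))    # rejected by the US-only set
    print(is_allowed(number, INTL_PHONE))  # accepted by the wider set
```

Note that this only checks which characters appear, not whether the
result is a well-formed or reachable phone number.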
I didn't intend for this to be the be-all and end-all of input security.
This by no means checks for authentication and authorization within the
parameter being passed, nor does it check for business logic errors.
That is left for other routines to check. This is simply a better way of
validating input for malicious characters.
As a follow-on point, I feel it is much harder to get the "blacklist"
right than it would be for the "whitelist". With a whitelist you have a
finite set of characters that you must check for and will accept on a
per-field basis (anything out of the norm gets bounced). With the
blacklist model you have an effectively unbounded set of characters that
you must check for, and that set cannot be clearly defined. It just
doesn't work as well.
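To make the difference concrete, here is a small Python sketch that
runs the same input through both styles of filter. The BLACKLIST
contents and the probe string are invented for illustration; a real
blacklist would be longer, but it would have the same weakness of only
catching what its author thought of.

```python
# Hypothetical "known bad" characters someone might think to block.
BLACKLIST = set("<>\"'")

# Whitelist for a US phone number field: digits and dashes only.
WHITELIST = set("0123456789-")


def passes_blacklist(value: str) -> bool:
    """Accept anything that contains none of the listed bad characters."""
    return not any(ch in BLACKLIST for ch in value)


def passes_whitelist(value: str) -> bool:
    """Accept only non-empty input made entirely of listed good characters."""
    return len(value) > 0 and all(ch in WHITELIST for ch in value)


if __name__ == "__main__":
    probe = "555-1234; DROP TABLE users"
    print(passes_blacklist(probe))  # True: ';' and spaces were never listed
    print(passes_whitelist(probe))  # False: ';', spaces and letters bounce
```

The blacklist lets the probe through because nobody listed the
semicolon; the whitelist rejects it without having to anticipate it.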
On Wed, 2004-05-12 at 07:19, Paul Johnston wrote:
While I agree with you in principle, I feel it's hard to get this
"whitelist" right. In the example you give, your validation routine
would reject +44-(0)161-237-1028 which is a widely accepted way of
displaying an international phone number.
Another issue with this approach is that it's easy to then think "ok
this has been sanitised, so now it's trusted" which is a fallacy. For
example, you take user_id from the client. It's all digits so it passes
sanitisation. However, if you now trust this variable then a user could
tamper with the parameters and impersonate another.
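A short Python sketch of that distinction follows. The function names
and the idea of a server-side session_user_id are assumptions made for
this example; the point is only that the format check and the
authorization check are separate steps, and passing the first says
nothing about the second.

```python
def is_digits(value: str) -> bool:
    """Format check only: non-empty and made of ASCII digits."""
    return value.isascii() and value.isdigit()


def can_access(session_user_id: str, requested_user_id: str) -> bool:
    """Hypothetical authorization check for a client-supplied user_id.

    session_user_id is assumed to come from server-side session state,
    never from the request itself.
    """
    # Step 1: sanitisation. This proves the value looks like a user_id.
    if not is_digits(requested_user_id):
        return False
    # Step 2: authorization. A well-formed id may still belong to someone else.
    return requested_user_id == session_user_id


if __name__ == "__main__":
    print(is_digits("1002"))           # well-formed, passes sanitisation
    print(can_access("1001", "1002"))  # still refused: not this user's id
```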
I'm not against sanitisation, but this idea of "only allow good chars
and all will be fine" is overly optimistic.
The Crocodile wrote:
While I'm sure this is a useful technique, and certainly a step in
the right direction for many applications, the better way to validate
data supplied by the client side is to compare the input against a
known set of GOOD characters and, if any character falls outside that
set, reject the request.
For example: An input field of a phone number should only accept numbers
and dashes, it should not accept any other characters and should reject
on any input that contains anything other than numbers or dashes. (or at
least give an error to the user). Validation routines ideally should be
done on a per field basis.
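One way to picture per-field validation is a table of rules keyed by
field name, sketched below in Python. The field names ("phone", "zip"),
their patterns, and the validate_request helper are all hypothetical;
the choice to reject fields that have no rule is an assumption in the
same default-deny spirit.

```python
import re

# Hypothetical per-field whitelists; each field gets its own rule.
FIELD_RULES = {
    "phone": re.compile(r"[0-9-]+"),   # numbers and dashes only
    "zip": re.compile(r"[0-9]{5}"),    # exactly five digits
}


def validate_request(params: dict) -> list:
    """Return the names of fields that fail validation.

    An empty list means the request may proceed; otherwise the caller
    should reject the request or show the user an error.
    """
    errors = []
    for name, value in params.items():
        rule = FIELD_RULES.get(name)
        # Fields with no rule are rejected rather than passed through.
        if rule is None or rule.fullmatch(value) is None:
            errors.append(name)
    return errors


if __name__ == "__main__":
    print(validate_request({"phone": "555-1234", "zip": "12345"}))
    print(validate_request({"phone": "555-1234<", "zip": "12345"}))
```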
Rejecting data because it contains certain known bad characters is
like a firewall rule set that lists everything to be dropped and
accepts everything else. It's not really best practice.
I hope this makes sense.
On Mon, 2004-05-10 at 22:37, Jim+Lisa Weiler wrote:
IBM Websphere Commerce Server 5.5 has a switch that causes the server to
examine all fields in POSTs and all variables in GETs, check the input
against a set of strings and characters that are not allowed, and return one
of three custom web pages if disallowed strings or characters are found. Does
anyone have experience with this feature in Websphere Commerce Server?