Sanitization
Sanitize with Whitelist
Any characters which are not part of an approved list can be removed, encoded or replaced.
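For instance, a whitelist sanitizer keeps only the characters on the approved list and strips everything else. The sketch below is illustrative only; the approved character set and the class and method names are assumptions and should be adjusted to the field being cleaned:
import java.util.regex.Pattern;

public class WhitelistSanitizer
{
    // Approved characters: letters, digits, space and hyphen; everything else is stripped
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9 -]");

    public static String sanitize(String input)
    {
        return (input == null) ? "" : DISALLOWED.matcher(input).replaceAll("");
    }
}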
Accept known good
This strategy is also known as "whitelist" or "positive" validation. The idea is that you should check that the data is one of a set of tightly constrained known good values. Any data that doesn't match should be rejected. Data should be:
- Strongly typed at all times
- Length checked, with field lengths minimized
- Range checked if numeric (see the sketch after this list)
- Unsigned unless required to be signed
- Checked for correct syntax or grammar prior to first use or inspection
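For example, a numeric quantity can be accepted only if it is short enough, parses as an integer and falls within the expected range. This is a minimal sketch; the field name, length limit and bounds are illustrative assumptions:
public int parseQuantity(String raw)
{
    // Length checked and field length minimized
    if (raw == null || raw.length() > 4)
        throw new IllegalArgumentException("Invalid quantity");
    // Strongly typed: reject anything that is not an integer
    int quantity;
    try {
        quantity = Integer.parseInt(raw);
    } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Invalid quantity", e);
    }
    // Unsigned and range checked
    if (quantity < 0 || quantity > 1000)
        throw new IllegalArgumentException("Quantity out of range");
    return quantity;
}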
If you expect a postcode, validate for a postcode (type, length and syntax):
import java.util.regex.Pattern;

// Accept only a postcode that matches the known good format
public boolean isPostcode(String postcode) {
    return postcode != null
            && Pattern.matches("^(((2|8|9)\\d{2})|((02|08|09)\\d{2})|([1-9]\\d{3}))$", postcode);
}
Coding guidelines should use some form of visible tainting on input from the client or other untrusted sources, such as third-party connectors, to make it obvious that the input is unsafe:
String taintPostcode = request.getParameter("postcode");
ValidationEngine validator = new ValidationEngine();
boolean isValidPostcode = validator.isPostcode(taintPostcode);
Reject known bad
This strategy, also known as "negative" or "blacklist" validation, is a weak alternative to positive validation. Essentially, if you don't expect to see characters such as %3f or JavaScript or similar, reject strings containing them. This is a dangerous strategy, because the set of possible bad data is potentially infinite. Adopting this strategy means that you will have to maintain the list of "known bad" characters and patterns forever, and you will by definition have incomplete protection.
import java.util.regex.Pattern;

public String removeJavascript(String input)
{
    // Reject the whole value if it contains the string "javascript" in any letter case
    Pattern p = Pattern.compile("javascript", Pattern.CASE_INSENSITIVE);
    return (!p.matcher(input).find()) ? input : "";
}
It can take upwards of 90 regular expressions (see the Cross-Site Scripting (XSS) Cheat Sheet in the Development Guide 2.0) to eliminate known malicious input, and each regex needs to be run over every field. Obviously, this is slow and not secure. Just rejecting the "currently known bad" (which at the time of writing is hundreds of strings and literally millions of combinations) is insufficient if the input is a free-form string. This strategy is directly akin to anti-virus pattern updates. Unless the business will allow "bad" regexes to be updated daily and fund someone to research new attacks regularly, this approach will soon be obsolete.
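To illustrate the maintenance burden, a blacklist checker ends up running an ever-growing list of patterns over every field. The sketch below uses placeholder patterns only; a real list would be far longer and would need constant updating:
import java.util.List;
import java.util.regex.Pattern;

public class BlacklistChecker
{
    // Every newly discovered attack technique means another entry here, forever
    private static final List<Pattern> KNOWN_BAD = List.of(
            Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
            Pattern.compile("<script", Pattern.CASE_INSENSITIVE),
            Pattern.compile("%3f", Pattern.CASE_INSENSITIVE)
            // ... hundreds more in practice
    );

    // Every pattern must be run over every field, on every request
    public static boolean containsKnownBad(String field)
    {
        for (Pattern p : KNOWN_BAD)
        {
            if (p.matcher(field).find())
            {
                return true;
            }
        }
        return false;
    }
}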
Sanitize
// Remove characters that are not on the approved list from each input value
public static String[] sanitizedData(String... input)
{
    String[] sanitizedData = new String[input.length];
    int index = 0;
    for (String i : input)
    {
        sanitizedData[index++] = i.replaceAll("['~!@#$%^&*()\";: <>?/,.]", "");
    }
    return sanitizedData;
}
// Encode characters that are significant to the output context instead of removing them
public static String[] sanitizedDatawithQuote(String... input)
{
    String[] sanitizedData = new String[input.length];
    int index = 0;
    for (String i : input)
    {
        i = i.replaceAll("<", "&lt;");
        i = i.replaceAll(">", "&gt;");
        // etc. for the other characters significant to the output context
        sanitizedData[index++] = i;
    }
    return sanitizedData;
}
//calling example
/*
String sanitizedData[] = SecureAlgorithm.sanitizedData ( "vijay\'~ !@#$%^&*()\";:<>?/,.", "jessy%" );
String sanitizedDatawithQuote[] = SecureAlgorithm.sanitizedDatawithQuote ( "vijay<>" );
System.out.println ( sanitizedData[0] + "\n" + sanitizedData[1] );
System.out.println ( sanitizedDatawithQuote[0] );
*/