Data Validation - OWASP
1 Objective
2 Platforms Affected
3 Relevant COBIT Topics
4 Description
5 Definitions
6 Where to include integrity checks
7 Where to include validation
8 Where to include business rule validation
8.1 Example - Scenario
8.2 Wrong way
8.3 Why this is bad
8.4 Acceptable Method
8.5 Best Method
8.6 Conclusion
9 Data Validation Strategies
9.1 Accept known good
9.2 Reject known bad
9.3 Sanitize
9.3.1 Sanitize with Whitelist
9.3.2 Sanitize with Blacklist
9.4 No validation
10 Prevent parameter tampering
11 Hidden fields
12 ASP.NET Viewstate
12.1 How to determine if you are vulnerable
12.2 How to protect yourself
12.3 Selects, radio buttons, and checkboxes
12.4 Per-User Data
13 URL encoding
14 HTML encoding
15 Encoded strings
16 Data Validation and Interpreter Injection
17 Delimiter and special characters
18 Further Reading
Objective
To ensure that the application is robust against all forms of input data, whether obtained from the user, infrastructure, external entities or database systems.
Platforms Affected
All.
Relevant COBIT Topics
DS11 – Manage Data. All sections should be reviewed.
Description
The most common web application security weakness is the failure to properly validate input from the client or environment. This weakness leads to almost all of the major vulnerabilities in applications, such as Interpreter Injection, locale/Unicode attacks, file system attacks and buffer overflows. Data from the client should never be trusted, because the client has every opportunity to tamper with it.
In many cases, encoding has the potential to defuse attacks that rely on lack of input validation. For example, if you use HTML entity encoding on user input before it is sent to a browser, it will prevent most XSS attacks. However, simply preventing attacks is not enough: you must perform intrusion detection in your applications. Otherwise, you are allowing attackers to repeatedly attack your application until they find a vulnerability that you haven't protected against. Detecting attempts to find these weaknesses is a critical protection mechanism.
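One simple starting point for intrusion detection is to count validation failures per user and react when someone keeps probing. The Java sketch below assumes a fixed threshold, an in-memory map keyed by a hypothetical user ID, and no time window; a production version would age out old counts and feed a real logging or alerting system.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: count rejected inputs per user and flag persistent probing.
// The threshold, the user-ID key and the lack of a time window are
// simplifying assumptions, not part of the guide.
final class ValidationFailureMonitor {
    private final Map<String, AtomicInteger> failures = new ConcurrentHashMap<>();
    private final int threshold;

    ValidationFailureMonitor(int threshold) {
        this.threshold = threshold;
    }

    /** Records one validation failure; returns true once the user reaches the threshold. */
    boolean recordFailure(String userId) {
        int count = failures
                .computeIfAbsent(userId, id -> new AtomicInteger())
                .incrementAndGet();
        return count >= threshold;
    }

    public static void main(String[] args) {
        ValidationFailureMonitor monitor = new ValidationFailureMonitor(3);
        for (int i = 1; i <= 3; i++) {
            // The caller decides what "flagged" means: log, alert or end the session.
            System.out.println("failure " + i + " flagged=" + monitor.recordFailure("alice"));
        }
    }
}
```

The validation code calls recordFailure whenever it rejects input, so detection sits next to the checks it observes.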
Definitions
These definitions are used within this document:
Integrity checks
Ensure that the data has not been tampered with and is the same as before.
Validation
Ensure that the data is strongly typed, has correct syntax, is within length boundaries and contains only permitted characters, and that numbers are correctly signed and within range boundaries.
Business rules
Ensure that data is not only validated, but also correct according to business rules. For example, interest rates fall within permitted boundaries.
Some documentation and references use these terms interchangeably, which confuses everyone concerned. That confusion leads directly to continuing financial loss for the organization.
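To make the distinction between validation and business rules concrete, here is a minimal Java sketch for a hypothetical interest-rate field. The accepted format (one or two integer digits, up to three decimal places) and the 0% to 25% limits are assumptions for illustration, not values taken from this guide.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

// Sketch: syntactic validation kept separate from the business rule.
// The field format and the 0-25% range are hypothetical.
final class InterestRateInput {
    // Validation: correct syntax, bounded length, digits and one dot only.
    private static final Pattern RATE_SYNTAX =
            Pattern.compile("\\d{1,2}(\\.\\d{1,3})?");

    // Business rule: permitted range, in percent (assumed limits).
    private static final BigDecimal MIN_RATE = BigDecimal.ZERO;
    private static final BigDecimal MAX_RATE = new BigDecimal("25");

    /** Validation: returns a strongly typed value, or throws if the syntax is wrong. */
    static BigDecimal validate(String raw) {
        if (raw == null || !RATE_SYNTAX.matcher(raw).matches()) {
            throw new IllegalArgumentException("malformed interest rate");
        }
        return new BigDecimal(raw);
    }

    /** Business rule: a validated rate must also fall within the permitted range. */
    static BigDecimal enforceRateLimits(BigDecimal rate) {
        if (rate.compareTo(MIN_RATE) < 0 || rate.compareTo(MAX_RATE) > 0) {
            throw new IllegalArgumentException("interest rate outside permitted range");
        }
        return rate;
    }

    public static void main(String[] args) {
        // "4.25" passes both checks.
        System.out.println(enforceRateLimits(validate("4.25")));
        // "99" is syntactically valid but breaks the business rule.
        try {
            enforceRateLimits(validate("99"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the two steps apart lets the syntax check live in shared input-handling code, while the range check stays with the business logic that owns the rule.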
Where to include integrity checks
Integrity checks must be included wherever data passes from a trusted to a less trusted boundary, such as from the application to the user's browser in a hidden field, or to a third-party payment gateway, for example a transaction ID that the application uses internally when the request returns.
The type of integrity control (checksum, HMAC, encryption, digital signature) should be directly related to the risk of the data transiting the trust boundary.
Where to include validation
Validation must be performed on every tier. However, validation should be performed as per the function of the server executing the code. For example, the web / presentation tier should validate for web related issues, persistence layers should validate for persistence issues such as SQL / HQL injection, directory lookups should check for LDAP injection, and so on.
Where to include business rule validation
Business rules are known during design, and they influence implementation. However, there are bad, good and "best" approaches. Often the best approach is the simplest in terms of code.
Example - Scenario
You are to populate a list with accounts provided by the back-end system.
The user will choose an account, choose a biller, and press Next.
Wrong way
The account select option is read directly and provided in a message back to the backend system without validating that the account number is one of the accounts provided by the backend system.
Why this is bad
An attacker can change the HTML in any way they choose:
The lack of validation requires a round trip to the back end to produce an error message, a round trip the front-end code could easily have avoided
The back end may not be able to cope with a malicious payload (for example, buffer overflows, XML injection, or similar) that the front-end code could easily have eliminated
Acceptable Method
The account select option parameter ("payee_id") is read by the code, and compared to an already-known list.
// Read the submitted payee once and act on it only if it is in the known list.
String payeeId = request.getParameter("payee_id");
if (account.hasPayee(payeeId)) {
    backend.performTransfer(payeeId);
}
This prevents parameter tampering, but requires the list of possible payee_id values to be calculated beforehand.
Best Method
The original code emitted indexes

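import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reject known bad: blank out any input that contains "javascript", in any case.
String rejectKnownBad(String input) {
    Pattern p = Pattern.compile("javascript", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    return m.find() ? "" : input;
}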

It can take upwards of 90 regular expressions (see the cross-site scripting (CSS) Cheat Sheet in the Development Guide 2.0) to eliminate known malicious input, and each regex needs to be run over every field. Obviously, this is slow and not secure. Just rejecting "current known bad" (which at the time of writing amounts to hundreds of strings and literally millions of combinations) is insufficient if the input is a string. This strategy is directly akin to anti-virus pattern updates. Unless the business will allow updating "bad" regexes on a daily basis and supports someone to research new attacks regularly, this approach will soon be out of date.
Sanitize
Rather than accepting or rejecting input, another option is to change the user input into an acceptable format.
Sanitize with Whitelist
Any characters which are not part of an approved list can be removed, encoded or replaced.
Here are some examples:
If you expect a phone number, you can strip out all non-digit characters. Thus, "(555)123-1234", "555.123.1234", and "555\";DROP TABLE USER;--123.1234" all convert to 5551231234. Note that you should proceed to validate the resulting numbers as well. As you see, this is not only beneficial for security, but it also allows you to accept and use a wider range of valid user input.
If you want text from a user comment form, it is difficult to decide on a legitimate set of characters because nearly every character has a legitimate use. One solution is to replace all non-alphanumeric characters with an encoded version, so "I like your web page!" might emerge from your sanitization routines as "I+like+your+web+page%21". (This example uses URL encoding.)
You can also go one step further. Say you want to set up a site where users can upload arbitrary files so they can share them or download them again from another location. In this case validation is impossible because there is no valid or invalid content. Because your only concern is protecting your app from malicious input, and you don't need to do anything except accept, store and transmit the file, you can encode the entire file in, say, base64.
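The three whitelist examples above can be sketched in Java as follows. The class and method names are hypothetical and the 10-digit phone format is an assumption; note also that URLEncoder leaves a few punctuation characters (".", "-", "*" and "_") unencoded in addition to letters and digits.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of whitelist sanitization for the three examples in the text.
final class WhitelistSanitizer {

    /** Phone numbers: strip every non-digit, then still validate what is left. */
    static String sanitizePhone(String raw) {
        String digits = raw.replaceAll("[^0-9]", "");
        // Assumed format: exactly 10 digits after stripping.
        if (!digits.matches("\\d{10}")) {
            throw new IllegalArgumentException("not a 10-digit phone number");
        }
        return digits;
    }

    /** Free-text comments: URL-encode everything outside the safe set. */
    static String encodeComment(String comment) {
        return URLEncoder.encode(comment, StandardCharsets.UTF_8);
    }

    /** Arbitrary uploads: store and transmit an opaque base64 form. */
    static String encodeFile(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        System.out.println(sanitizePhone("(555)123-1234"));                    // 5551231234
        System.out.println(sanitizePhone("555\";DROP TABLE USER;--123.1234")); // 5551231234
        System.out.println(encodeComment("I like your web page!"));          // I+like+your+web+page%21
        System.out.println(encodeFile(new byte[] {0, 1, 2}));                // AAEC
    }
}
```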
Sanitize with Blacklist
Eliminate or translate characters (such as to HTML entities or to remove quotes) in an effort to make the input "safe".
Like blacklists, this approach requires maintenance and is usually incomplete. As most fields have a particular grammar, it is simpler, faster, and more secure to simply validate a single correct positive test than to try to include complex and slow sanitization routines for all current and future attacks.
// Sanitize with blacklist: swap straight apostrophes for a typographic one.
public String quoteApostrophe(String input) {
    if (input != null)
        return input.replaceAll("[\']", "’");
    else
        return null;
}
No validation
This is inherently unsafe and strongly discouraged. The business must sign off on each and every instance of no validation, as the lack of validation usually leads to the direct circumvention of application, host and network security controls.
// No validation: whatever the client sent is stored as-is.
account.setAcctId(request.getParameter("formAcctNo"));
...

public void setAcctId(String acctId) {
    cAcctId = acctId;
}
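For contrast, here is a sketch of the same setter performing a single positive test before storing the value. The eight-digit account number format and this Account class are assumptions for illustration.

```java
import java.util.regex.Pattern;

// Sketch: the setter accepts only input matching a known-good pattern.
// The 8-digit account number format is a hypothetical example.
final class Account {
    private static final Pattern ACCT_ID = Pattern.compile("[0-9]{8}");
    private String cAcctId;

    void setAcctId(String acctId) {
        if (acctId == null || !ACCT_ID.matcher(acctId).matches()) {
            throw new IllegalArgumentException("invalid account number");
        }
        cAcctId = acctId;
    }

    String getAcctId() {
        return cAcctId;
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.setAcctId("12345678");
        System.out.println(account.getAcctId());
        try {
            account.setAcctId("1234' OR '1'='1");
        } catch (IllegalArgumentException e) {
            // The previous, valid value is left untouched.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```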
Prevent parameter tampering
There are many input sources:
HTTP headers, such as REMOTE_ADDR, PROXY_VIA or similar
Environment variables, such as getenv() or via server properties
All GET, POST and Cookie data
This includes supposedly tamper-resistant fields such as radio buttons and drop-downs: any client-side HTML can be rewritten to suit the attacker
Configuration data (mistakes happen :))
External systems (via any form of input mechanism, such as XML input, RMI, web services, etc)
All of these data sources supply untrusted input. Data received from untrusted data sources must be properly checked before first use.
Hidden fields
Hidden fields are a simple way to avoid storing state on the server. Their use is particularly prevalent in "wizard-style" multi-page forms. However, their use exposes the inner workings of your application, and exposes data to trivial tampering, replay, and validation attacks. In general, only use hidden fields for page sequence.
If you have to use hidden fields, there are some rules:
Secrets, such as passwords, should never be sent in the clear
Hidden fields need to have integrity checks and should preferably be encrypted using non-constant initialization vectors (i.e. different users at different times have different yet cryptographically strong random IVs)
Encrypted hidden fields must be robust against replay attacks, which means some form of temporal keying
Data sent to the user must be validated on the server once the last page has been received, even if it has been previously validated on the server - this helps reduce the risk from replay attacks.
The preferred integrity control should be at least an HMAC using SHA-256, or preferably the data should be digitally signed or encrypted using PGP. IBMJCE supports SHA-256, but PGP JCE support requires the inclusion of the Legion of the Bouncy Castle JCE classes.
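A minimal Java sketch of an HMAC-SHA256 integrity check for a hidden field, using only the standard javax.crypto API. It assumes a hypothetical token layout of value|issuedAt|mac and puts an issue timestamp inside the signed data, so stale tokens are rejected as a simple form of temporal keying. It provides integrity only: it does not encrypt the value, and it does not replace re-validating the data once the last page arrives.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: sign a hidden-field value with HMAC-SHA256 plus an issue time.
// The "value|issuedAt|mac" layout is an assumption for illustration.
final class HiddenFieldSigner {
    private static final String ALG = "HmacSHA256";
    private final SecretKeySpec key;
    private final long maxAgeMillis;

    HiddenFieldSigner(byte[] secret, long maxAgeMillis) {
        this.key = new SecretKeySpec(secret, ALG);
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Produces "value|issuedAt|mac" to place in the hidden field. */
    String sign(String value, long nowMillis) throws GeneralSecurityException {
        String payload = value + "|" + nowMillis;
        return payload + "|" + mac(payload);
    }

    /** Returns the original value, or throws if the token was altered or has expired. */
    String verify(String token, long nowMillis) throws GeneralSecurityException {
        int last = token.lastIndexOf('|');
        int prev = last > 0 ? token.lastIndexOf('|', last - 1) : -1;
        if (prev < 0) {
            throw new GeneralSecurityException("malformed token");
        }
        String payload = token.substring(0, last);
        byte[] expected = mac(payload).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = token.substring(last + 1).getBytes(StandardCharsets.US_ASCII);
        // Constant-time comparison avoids leaking how many bytes matched.
        if (!MessageDigest.isEqual(expected, actual)) {
            throw new GeneralSecurityException("integrity check failed");
        }
        long issuedAt = Long.parseLong(token.substring(prev + 1, last));
        if (issuedAt > nowMillis || nowMillis - issuedAt > maxAgeMillis) {
            throw new GeneralSecurityException("token expired");
        }
        return token.substring(0, prev);
    }

    private String mac(String payload) throws GeneralSecurityException {
        Mac hmac = Mac.getInstance(ALG);
        hmac.init(key);
        byte[] tag = hmac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(tag);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] secret = new byte[32];
        new SecureRandom().nextBytes(secret); // server-side secret, never sent to the client
        HiddenFieldSigner signer = new HiddenFieldSigner(secret, 15 * 60 * 1000L);
        long now = System.currentTimeMillis();
        String token = signer.sign("step=3", now);
        System.out.println(signer.verify(token, now + 1000)); // step=3
    }
}
```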
It is simpler to store this data temporarily in the session object. Using the session object is the safest option as data is never visible to the user, requires (far) less code, nearly no CPU, disk or I/O utilization, less memory (particularly on large multi-page forms), and less network consumption.
In the case of the session object being backed by a database, large session objects may become too large for the inbuilt handler. In this case, the recommended strategy is to store the validated data in the database, but mark the transaction as "incomplete." Each page will update the incomplete transaction until it is ready for submission. This minimizes the database load, session size, and activity between the users whilst remaining tamperproof.
Code containing hidden fields should be rejected during code reviews.
ASP.NET Viewstate
ASP.NET sends form data back to the client in a hidden “Viewstate” field. Despite looking forbidding, this is not encryption: it is base64 encoding, which is equivalent to plain text, and in ASP.NET 1.0 it has no data integrity protection unless you take further action. In ASP.NET 1.1 and 2.0, tamper proofing, called "enableViewStateMAC", is on by default using a SHA-1 hash.
Any application framework with a similar mechanism might be at fault – you should investigate your application framework’s support for sending data back to the user. Preferably it should not round trip.
How to determine if you are vulnerable
These configurations are set hierarchically in the .NET framework. The machine.config file contains the global configuration; each web directory may contain a web.config file further specifying or overriding configuration; each page may contain @page directives specifying the same configuration or overrides. You must check all three locations:
If the enableViewStateMac is not set to “true”, you are at risk if your viewstate contains authorization state
If the viewStateEncryptionMode is not set to “always”, you are at risk if your viewstate contains secrets such as credentials
If you share a host with many other customers, you all share the same machine key by default in ASP.NET 1.1. In ASP.NET 2.0, it is possible to configure unique viewstate keys per application
How to protect yourself
If your application relies on data returning from the viewstate without being tampered with, you should turn on viewstate integrity checks at the least, and strongly consider:
Encrypt viewstate if any of the data is application sensitive
Upgrade to the latest version of ASP.NET as soon as practical
Move truly sensitive viewstate data to the session variable instead
Selects, radio buttons, and checkboxes
It is a commonly held belief that the value settings for these items cannot be easily tampered with. This is wrong. In the following example, actual account numbers are used, which can lead to compromise:


This produces (for example):
Gold Card

Platinum Card
If the value is retrieved and then used directly in a SQL query, an interesting form of SQL injection may occur: authorization tampering leading to information disclosure. As the connection pool connects to the database using a single user, it may be possible to see other users' accounts if the SQL looks something like this:
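// Parameterized, but it still trusts the client-supplied account number,
// so any valid account number can be queried.
String acctNo = request.getParameter("acctNo");
String sql = "SELECT acctBal FROM accounts WHERE acctNo = ?";
PreparedStatement st = conn.prepareStatement(sql);
st.setString(1, acctNo);
ResultSet rs = st.executeQuery();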
This should be rewritten to retrieve the account number via index, and to include the client's unique ID to ensure that other valid account numbers are not exposed:
This approach requires rendering input values from 1 to ... x, and assuming accounts are stored in a Collection which can be iterated using logic:iterate:



The code will emit HTML with the values "1" .. "x" as per the collection's content.
Gold Credit Card

Platinum Credit Card
This approach should be used for any input type that allows a value to be set: radio buttons, checkboxes, and particularly select / option lists.
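The server-side half of the index approach can be sketched in plain Java. The AccountSelector class is hypothetical: it assumes the user's account numbers were fetched from the back end and kept server-side (for example in the session), so the browser only ever sees the values "1" to "x". The account numbers in the usage are dummy values.

```java
import java.util.List;
import java.util.Optional;

// Sketch: map a submitted option index back to this user's real account.
final class AccountSelector {
    // This user's accounts, in the same order used to render the options.
    private final List<String> accountsForUser;

    AccountSelector(List<String> accountsForUser) {
        this.accountsForUser = List.copyOf(accountsForUser);
    }

    /** Returns the account for a 1-based option value, or empty if the value is invalid. */
    Optional<String> resolve(String submittedValue) {
        if (submittedValue == null || !submittedValue.matches("\\d{1,4}")) {
            return Optional.empty(); // not a small non-negative integer
        }
        int index = Integer.parseInt(submittedValue);
        if (index < 1 || index > accountsForUser.size()) {
            return Optional.empty(); // out of range: tampered or stale form
        }
        return Optional.of(accountsForUser.get(index - 1));
    }

    public static void main(String[] args) {
        AccountSelector selector = new AccountSelector(List.of("4111-0001", "4111-0002"));
        System.out.println(selector.resolve("2")); // Optional[4111-0002]
        System.out.println(selector.resolve("3")); // Optional.empty
    }
}
```

Because the lookup is limited to the list built for this user, a tampered value can at worst select another of the user's own accounts.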
Per-User Data
In fully normalized databases, the aim is to minimize the amount of repeated data. However, some data is inferred. For example, users can see messages that are stored in a messages table. Some messages are private to the user. However, in a fully normalized database, the list of message IDs is kept within another table:
+------------------------+
| MESSAGES |
+------------------------+
| msgid | message |
+------------------------+
If a user marks a message for deletion, the usual way is to recover the message ID from the user, and delete that:
DELETE FROM message WHERE msgid='frmMsgId'
However, how do you know if the user is eligible to delete that message ID? Such tables need to be denormalized slightly to include a user ID or make it easy to perform a single query to delete the message safely. For example, by adding back an (optional) uid column, the delete is now made reasonably safe:
DELETE FROM message WHERE uid='session.myUserID' and msgid='frmMsgId';
Where the data is potentially both a private resource and a public resource (for example, in the secure message service, broadcast messages are just a special type of private message), additional precautions need to be taken to prevent users from deleting public resources without authorization. This can be done using role based checks, as well as using SQL statements to discriminate by message type:
DELETE FROM message
WHERE
uid='session.myUserID' AND
msgid='frmMsgId' AND
broadcastFlag = false;
URL encoding
Data sent via the URL, which is strongly discouraged, should be URL encoded and decoded. This reduces the likelihood that cross-site scripting attacks will work.
In general, do not send data via GET request unless for navigational purposes.
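A minimal sketch of URL-encoding a value before placing it in a link, using the JDK's URLEncoder and URLDecoder. The /search?q= path is a hypothetical example. In a servlet container, request parameters are normally already decoded once by the container, so explicit decoding is only needed for values you extract yourself.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: encode on the way out, decode exactly once on the way in.
final class UrlParam {
    static String buildLink(String query) {
        return "/search?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    static String decodeParam(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Markup characters are percent-encoded rather than emitted raw.
        System.out.println(buildLink("<script>alert(1)</script>"));
        // /search?q=%3Cscript%3Ealert%281%29%3C%2Fscript%3E
    }
}
```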
HTML encoding
Data sent to the user needs to be safe for the user to view. This can be done using and friends. Do not use <%=var%> unless it is used to supply an argument for or similar.
HTML encoding translates a range of characters into their HTML entities. For example, > becomes &gt;. This will still display as > in the user's browser, but it is a safe alternative.
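A minimal hand-written HTML entity encoder in Java, shown only to illustrate the translation described above. It covers the five characters most commonly used to break out of HTML text and quoted attribute values; a maintained library such as the OWASP Java Encoder is the safer choice in real code.

```java
// Sketch: replace HTML-significant characters with entities.
final class HtmlEncoder {
    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c); // everything else passes through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("<b>Tom & \"Jerry\"</b>"));
        // &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
    }
}
```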
Encoded strings
Some strings may be received in encoded form. It is essential to send the correct locale to the user so that the web server and application server can provide a single level of canonicalization prior to the first use.
Do not use getReader() or getInputStream(), as these input methods do not decode encoded strings. If you need to use these constructs, you must canonicalize the data by hand.
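A minimal Java sketch of canonicalizing a URL-encoded value to a single decoded form before validating it. The policy is an assumption: if a second decode still changes the value, the input was encoded more than once (a common way to smuggle sequences such as ../ past filters) and is rejected.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Sketch: decode once, then reject anything still carrying a second layer of encoding.
// Note this assumed policy also rejects legitimate data that contains '%' or '+'
// after one decode; URLDecoder itself throws IllegalArgumentException on bad escapes.
final class Canonicalizer {
    static String canonicalize(String encoded) {
        String once = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        String twice = URLDecoder.decode(once, StandardCharsets.UTF_8);
        if (!once.equals(twice)) {
            throw new IllegalArgumentException("multiple levels of encoding detected");
        }
        return once; // validate this canonical form, never the raw input
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("report%20final.pdf")); // report final.pdf
        try {
            canonicalize("..%252f..%252fetc%252fpasswd");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```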
Data Validation and Interpreter Injection
This section focuses on preventing injection in ColdFusion. Interpreter Injection involves manipulating application parameters to execute malicious code on the system. The most prevalent form is SQL injection, but it also includes other injection techniques, including LDAP, ORM, User Agent, XML, etc.; see Interpreter Injection for more details. As a developer you should assume that all input is malicious. Before processing any input coming from a user, data source, component, or data service, validate it for type, length, and/or range. ColdFusion includes support for Regular Expressions and CFML tags that can be used to validate input.
SQL Injection
SQL Injection involves sending extraneous SQL queries as variables. ColdFusion provides the and tags for validating database parameters. These tags nest inside and , respectively. For dynamic SQL submitted in , use the CFSQLTYPE attribute of the to validate variables against the expected database datatype. Similarly, use the CFSQLTYPE attribute of to validate the datatypes of stored procedure parameters passed through .
You can also strengthen your systems against SQL Injection by disabling the Allowed SQL operations for individual data sources. See the Configuration section below for more information.
LDAP Injection
LDAP injection is an attack used to exploit web-based applications that construct LDAP statements based on user input. ColdFusion uses the tag to communicate with LDAP servers. This tag has an ACTION attribute which dictates the query performed against the LDAP server. The valid values for this attribute are: add, delete, query (default), modify, and modifyDN. calls are turned into JNDI (Java Naming and Directory Interface) lookups. However, because wraps the calls, it will throw syntax errors if native JNDI code is passed to its attributes, making LDAP injection more difficult.
XML Injection
There are two models for parsing XML data: SAX and DOM. ColdFusion uses DOM, which reads the entire XML document into the server’s memory. This requires the administrator to restrict the size of the JVM containing ColdFusion. Because ColdFusion is built on Java, entity references are expanded during parsing by default. To prevent unbounded entity expansion, filter out DOCTYPE declarations before a string is converted to an XML DOM.
After the DOM has been read, to reduce the risk of XML Injection use the ColdFusion XML decision functions: isXML(), isXmlAttribute(), isXmlElement(), isXmlNode(), and isXmlRoot(). The isXML() function determines if a string is well-formed XML. The other functions determine whether or not the passed parameter is a valid part of an XML document. Use the xmlValidate() function to validate external XML documents against a Document Type Definition (DTD) or XML Schema.
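ColdFusion runs on Java, so the DOCTYPE advice above can be illustrated with the JDK's own JAXP API. Rather than stripping the DOCTYPE with string manipulation, this sketch asks the parser to reject any document that contains one. It assumes the JDK's bundled Xerces-based parser, which recognizes the disallow-doctype-decl feature; with a JAXP implementation that does not support it, setFeature throws ParserConfigurationException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Sketch: build a DOM only from documents that carry no DOCTYPE at all.
final class SafeXml {
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Any DOCTYPE makes parsing fail, so no entities can be declared or expanded.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<order><qty>2</qty></order>").getDocumentElement().getTagName());
        // order
    }
}
```

A document that declares entities, such as the classic "billion laughs" payload, is rejected with a SAXException before any expansion happens.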
Event Gateway, IM, and SMS Injection
ColdFusion MX 7 enables Event Gateways, instant messaging (IM), and SMS (short message service) for interacting with external systems. Event Gateways are ColdFusion components that respond asynchronously to non-HTTP requests – e.g. instant messages, SMS text from wireless devices, etc. ColdFusion provides Lotus Sametime and XMPP (Extensible Messaging and Presence Protocol) gateways for instant messaging. It also provides an event gateway for interacting with SMS text messages.
Injection along these gateways can happen when end users (and/or systems) send malicious code to execute on the server. These gateways all utilize ColdFusion Components (CFCs) for processing. Use standard ColdFusion functions, tags, and validation techniques to protect against malicious code injection. Sanitize all input strings and do not allow un-validated code to access backend systems.
Best Practices
Use the XML functions to validate XML input.
Before performing XPath searches and transformations in ColdFusion, validate the source before executing.
Use ColdFusion validation techniques to sanitize strings passed to xmlSearch for performing XPath queries.
When performing XML transformations only use a trusted source for the XSL stylesheet.
Ensure that the memory size of the Java Sandbox containing ColdFusion can handle large XML documents without adversely affecting server resources.
Set the memory value to less than the amount of RAM on the server (-Xmx).
Remove DOCTYPE elements from the XML string before converting it to an XML object.
scriptProtect can be used to thwart most cross-site scripting attempts. Set scriptProtect to All in the Application.cfc.
Use or to instantiate variables in ColdFusion. Use this tag with the name and type attributes. If the value is not of the specified type, ColdFusion returns an error.
To handle untyped variables, use IsValid() to validate their values against any legal object type that ColdFusion supports.
Use and to validate dynamic SQL variables against database datatypes.
Use CFLDAP for accessing LDAP servers. Avoid allowing native JNDI calls to connect to LDAP.
Best Practice in Action
The sample code below shows a database authentication function using some of the input validation techniques discussed in this section.

SELECT hashed_password, salt
FROM UserTable
WHERE UserName =
Delimiter and special characters
There are many characters that mean something special to various programs. If you followed the advice to accept only characters that are considered good, it is very likely that only a few delimiters will catch you out.
Here are the usual suspects:
NULL (zero) %00
LF - ANSI chr(10) "\n"
CR - ANSI chr(13) "\r"
CRLF - "\r\n"
CR - EBCDIC 0x0f
Quotes " '
Commas, slashes, spaces, tabs and other white space - used in CSV, tab-delimited output, and other specialist formats
<> - XML and HTML tag markers, redirection characters
; & - Unix and NT command chaining
@ - used for e-mail addresses
0xff
... more
Whenever you code to a particular technology, you should determine which characters are "special" and either prevent them from appearing in input or properly escape them.
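Two small Java sketches of the advice above, one for each option. The first rejects NUL, CR and LF in a value destined for an HTTP header or log line (an assumed target); the second escapes a CSV field in the style of RFC 4180 by quoting it and doubling embedded quotes. The class and method names are hypothetical, and each technology needs its own character set.

```java
// Sketch: prevent delimiters for one target format, escape them for another.
final class DelimiterCheck {
    // NUL, CR and LF: enough to split an HTTP header or forge a log line.
    private static final String HEADER_DELIMITERS = "\0\r\n";

    /** Option 1: detect characters that must never appear in the value. */
    static boolean containsHeaderDelimiter(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (HEADER_DELIMITERS.indexOf(value.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Option 2: escape a CSV field by quoting it and doubling embedded quotes. */
    static String escapeCsvField(String field) {
        boolean needsQuoting = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        if (!needsQuoting) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(containsHeaderDelimiter("value\r\nSet-Cookie: x=1")); // true
        System.out.println(escapeCsvField("Smith, \"Jr\""));                     // "Smith, ""Jr"""
    }
}
```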
Further Reading
ASP.NET 2.0 Viewstate
Development Guide Table of Contents
This page was last modified on 2 December 2013, at 04:13.
Content is available under Creative Commons Attribution-ShareAlike unless otherwise noted.