I found myself using tools a lot more in my day-to-day job this year than I am used to. It is amazing how far some of these software-breaking tools have come in the last few years. That being said, I am going to focus on some of the tools that I have found myself using regularly.
Wireshark
Tried and true, Wireshark has been there for me, through thick and thin, since I got into this game long ago. I have used Wireshark for everything from dissecting RIP packets to troubleshooting TLS/SSL connection issues with SOAP Web Services. This tool is an absolute MUST HAVE for anyone in the security field. Be they hacker, hobbyist, or analyst - Wireshark will be one of your best friends.
Cain and Abel
It's interesting - I actually have a Windows XP VM that exists for no other purpose than to run Cain & Abel. This tool is invaluable when you need to capture network packets and have them analyzed and organized in real time. It is great for sniffing out cleartext credentials (Telnet, FTP, SMTP), and one of the absolute best features is the ability to capture and essentially *bug* VoIP calls.
Metasploit
I remember when I first heard about MSF. I thought to myself: great, a new tool for skiddies to annoy me with. At first glance it is nothing more than a glorified, over-powered version of BO95. However, when you really dig deep under the covers, MSF is more like an IDE (that's Integrated Development Environment for you non-developer types) built specifically for hackers! It is a self-contained runtime where you can discover vulnerabilities and test exploits and shellcode all in a single environment. Really, this is an ingenious tool whose power far exceeds what most people are using it for.
JBroFuzz
This has become one of the most used tools in my toolkit. I use JBroFuzz against any new development in any of the applications that I maintain to test regex patterns, request parameters, cookie values, and, more than anything else, RESTful web services. It is amazing how many ways there are to break web services, and with some custom payloads and JBroFuzz you can create some very powerful scans to run against your app that dig out even the most obscure bugs - ones most ordinary testers would never think of.
JMeter
The quickest way to DoS your own application is by firing up a JMeter test. This is an extremely powerful tool from the Apache team, written in Java. Load testing, automated crawl tests, whitebox testing, and much, much more are possible with this easy-to-use and powerful tool.
BurpSuite
The great thing about Burp is that it is an entire suite of tools that work together, allowing you to run a quick (and surprisingly accurate) audit of an entire application. The interface is simple to use and, most importantly, it is extensible. It is Java based, so it is platform agnostic and works with any browser. One of the cooler features is the ability to suspend and save state, then restore and continue later - a great stealth-scanning feature.
There are other tools that I find myself using, but I use these almost every day, and absolutely use each extensively when doing an audit. Take this list for what it's worth, but if you haven't tried a couple of these, do yourself a favor and give them a shot.
12.28.2009
12.20.2009
Almost There - Sealed Objects in JavaScript
The blog entry I wrote last week was a huge call-out to anyone who could help me find a solution to the problem of writing secure objects in JavaScript. Unfortunately, the call went unanswered, so I set out over the weekend to see if I could find the answer myself. Well, I haven't completely found the solution yet - but I am close.
Let me jump right into it:
Step 1 - We need to implement a "watch" function for browsers that don't support it
Object.prototype.watch = function ( prop, handler ) {
    var oldval = this[prop],
        newval = oldval,
        getter = function () {
            return newval;
        },
        setter = function ( val ) {
            oldval = newval;
            return newval = handler.call( this, prop, oldval, val );
        };

    if ( delete this[prop] ) { // can't watch constants
        if ( Object.defineProperty ) { // ECMAScript 5
            Object.defineProperty( this, prop, { get: getter, set: setter } );
        } else if ( Object.prototype.__defineGetter__ && Object.prototype.__defineSetter__ ) { // legacy
            Object.prototype.__defineGetter__.call( this, prop, getter );
            Object.prototype.__defineSetter__.call( this, prop, setter );
        }
    }
};
Step 2 - Now that we have that in place, we can create our seal method
Object.prototype.seal = function() {
    for ( var prop in this ) {
        if ( !this.hasOwnProperty( prop ) ) continue; // skip inherited members like watch/seal themselves
        // Note: watch invokes the handler as handler(prop, oldval, newval)
        this.watch( prop, function( name, oldVal, newVal ) {
            var caller = arguments.callee.caller;
            throw 'Attempt was made to alter a sealed object [' + this + '] from [' +
                oldVal + '] to [' + newVal + '] from [' + caller + ']';
        });
    }
};

VOILA!
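For the curious, here is a quick usage sketch of my own (not taken from the demo page) showing what happens when code tries to modify a sealed object:

var config = { apiKey: "abc123" };
config.seal();

try {
    config.apiKey = "changed"; // the watch handler fires and throws
} catch (e) {
    alert( e ); // Attempt was made to alter a sealed object ...
}

alert( config.apiKey ); // still alerts 'abc123'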
That is it. This solution currently works in FF3.5 and Chrome for Linux; I haven't gotten it to work at all in IE7 or 8 yet (once again, thank you Microsoft), but I suspect that once I throw this over the fence to some of the hardcore IE/compatibility gurus at the office tomorrow, I will have the answer to that problem.
If you would like to see a working demo of this in action, you can check out the demo page I put together for it at http://software.digital-ritual.net/js-secure-demo/.
I absolutely welcome any and all comments, and will test this out on more browsers as time permits tomorrow. Hopefully this solves the problem - at least provided nothing stops the JavaScript RTE from running the Object.seal method on all the objects that need to be sealed; but that is a question for load testing. For now, this seems to be the best approach I can come up with.
Labels:
application security,
ESAPI,
javascript
12.15.2009
JavaScript? Hello? Is anyone there?
So I am still in the architectural design phase of ESAPI4JS, and have come across an interesting problem. Well, to be honest, there are lots of problems with JavaScript and trying to make it secure - but there is one that I have yet to find a way to overcome. This, in my opinion, is one of the biggest shortcomings of JavaScript that I have encountered to date.
On with the problem...
Say that you have a class that you are creating with JavaScript. You want to ensure that the "private" methods and properties of that class cannot be altered by code running in the same window.
This is a relatively trivial thing to do in JavaScript using closures:
var MyUnchangeableClass = function() {
    var MyUnchangeableClassImpl = function() {
        var _myPrivate = "Private";
        return {
            getPrivate: function() {
                return _myPrivate;
            }
        };
    };

    var _singletonInstance = new MyUnchangeableClassImpl(); // not MyUnchangeableClass - that would recurse

    return {
        getPrivate: function() {
            return _singletonInstance.getPrivate();
        }
    };
};

var unchangeableClass = new MyUnchangeableClass();
alert( unchangeableClass.getPrivate() ); // alerts 'Private'
Using a closure like this creates an interesting scoping situation: the property _myPrivate is locally scoped to the constructor and to the 'Object' being returned by the function, privatizing it both from the global scope and from the scope of the closure itself. This scoping situation is called Closure Scope.
This is all fine and great; however, the top-level, globally scoped reference is always vulnerable to being altered. For instance, we know now that calling getPrivate() on the instantiated closure object will return the innermost value, which happens to be "Private" - but what happens if I alter the outermost, globally accessible reference like this:
unchangeableClass.getPrivate = function() {
    return "Overwritten";
};

alert( unchangeableClass.getPrivate() ); // alerts 'Overwritten'
We have effectively subverted the entire point of the closure by substituting what the outermost function actually does. (Incidentally, the references to anything Closure Scoped will disappear in this case as well and will ultimately be GC'd, effectively removing the data from the runtime entirely.)
The one thing that is truly holding JavaScript back from being a language that has any hope in a secure world is the fact that there is NO RELIABLE way to SEAL an object. In Java, you can make a class final to ensure that it cannot be extended to alter its behavior, and you can make properties and methods private to ensure that their behavior and values cannot be altered. There are similar strategies in just about every language that provides OOP or OOP-like functionality (even PHP!).
I have come up with some somewhat hacky ways to accomplish this, but they can all be subverted by other code running in the same runtime. I have even thought about creating a trusted JS plugin for browsers that keeps the JavaScript inside a sealed/signed jar after it has been downloaded (so processes outside the browser cannot modify the cached JavaScript), has the initial JavaScript source tell the plugin which objects should be sealed, and then uses the underlying C API to ensure that they cannot be changed. This is a huge undertaking and frankly is open to a ton of potential attacks.
The idea behind the ESAPI4JS library is good - basically providing end users with protection from server-side code that has been compromised, as a second level of defense. This is something that I think will become increasingly important as time goes on, but without the ability to create secure JS code that cannot be altered, it simply won't live up to its full potential.
It will provide a second level of defense that can, in most cases, still protect the end user. After all, subverting the server-side code will be the main focus of the nefarious individual, and they may not even realize that there is another layer of protection - one aimed at potential front-end victims - that also needs to be subverted.
I still think it is a great idea, and am excited to see where it goes, but I throw this out to the JavaScript community as both a challenge and a plea. This is something that JavaScript sorely needs (I am sure most JS library writers will agree) and the only way that we will ever see it is if the community demands it.
So Testify! To Securify!
Onward!
Labels:
application security,
ESAPI,
javascript
12.14.2009
GET vs POST in Java Servlets...
This is an issue that has come up many times before, and something that really grates on my nerves as a developer and makes the appsec part of me angry. If you have developed Servlets in Java, you may or may not be aware of a design issue in the way HTTP requests are processed. Here is the issue, straight from the Servlet specification:
Data from the query string and the post body are aggregated into the request parameter set. Query string data is presented before post body data. For example, if a request is made with a query string of a=hello and a post body of a=goodbye&a=world, the resulting parameter set would be ordered a=(hello, goodbye, world).
So what is the issue here? It is quite simple really: if you POST to a Servlet, it should ONLY return the value(s) for parameters that were part of the POST request, ignoring the GET values - and vice versa.
Even PHP has gotten this right by separating parameters out into the $_POST and $_GET globals (the use of globals here is a whole separate issue).
So why is this a big issue? For one, it makes it much easier for would-be hackers to do mean things to your application. There are lots of reasons this is a bad idea, but the main one is that when you are posting parameters to a servlet, a great deal of the time you are posting operational information - which can potentially be changed just by adding a GET parameter to the URL. And that's the kicker: without additional work, you really have no idea whether the parameter(s) you are looking at were passed in on the URL or were part of the POST.
I suppose there are ways around this that could be implemented in a wrapped request (a sketch of one follows below), but the fact of the matter is that this is something that absolutely should be part of the spec. It is no secret that a lot of people want this to be added, and frankly it really irritates me that the community has not listened to the user base in this respect.
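For what it's worth, here is a rough sketch of the wrapped-request idea. The class and method names (QueryAwareRequest, getQueryParameter) are my own inventions, not anything from the Servlet spec; it simply re-parses the query string so you can ask specifically for a GET parameter:

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

public class QueryAwareRequest extends HttpServletRequestWrapper {

    // Parameters that arrived on the query string only
    private final Map<String, List<String>> queryParams =
            new HashMap<String, List<String>>();

    public QueryAwareRequest(HttpServletRequest request) {
        super(request);
        String qs = request.getQueryString();
        if (qs == null || qs.length() == 0) {
            return;
        }
        for (String pair : qs.split("&")) {
            String[] kv = pair.split("=", 2);
            String name = decode(kv[0]);
            String value = kv.length > 1 ? decode(kv[1]) : "";
            List<String> values = queryParams.get(name);
            if (values == null) {
                values = new ArrayList<String>();
                queryParams.put(name, values);
            }
            values.add(value);
        }
    }

    /** Returns the value only if it arrived on the query string, else null. */
    public String getQueryParameter(String name) {
        List<String> values = queryParams.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}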
Labels:
application security,
J2EE,
java,
wtf
12.10.2009
Invocation is E.V.I.L., Mmkay?
This post is especially for anyone using Apache Tomcat for their application server. Today we will be discussing the EVIL InvokerServlet. What is so evil about the InvokerServlet? I mean, it ships with Tomcat and has for a long time, right?
Let's analyze that question for a second. If you're like me, you have seen a lot of sci-fi and fantasy movies. I can think of very few examples where something being invoked was a good thing. As a matter of fact it is almost always the bad guys doing the invoking and they are almost always trying to do so to undermine and take over the world, or some other nefarious purpose in line with that.
So what does that have to do with Tomcat?
- invoke (synonym: conjure)
- to call forth by incantation
Let's apply this to Java:
- invoke (synonym: conjure)
- to call forth by FQN (Fully Qualified Name)
This is what the InvokerServlet allows you to do. By itself, in a controlled environment, it is not such a bad thing. But this is a servlet, which means that, when enabled, it is accessible to the entire internet, good guys and bad guys alike. All they need to do is discover the incantation - or in this case the FQN - that needs to be uttered or passed in to make things happen.
Say for example you have a page on your site that contains a form that submits to a servlet by way of the invoker servlet. This code looks like this:
<html>
    <head>
        <title>This is the secret incantation page</title>
    </head>
    <body>
        <form action="/servlet/com.yoursite.servlets.DoSomethingServlet" method="POST">
            <!-- Form Elements go here -->
        </form>
    </body>
</html>
Any bad guy who has discovered the Right Click -> View Source option in every browser's context menu will be able to discover the secret incantation to invoke something that may be innocent enough. However, what this tells the attacker more than anything is that in your universe (website), he can learn all the secrets of your universe by trying different incantations.
Now, let's take this one step further and let's invoke the power of Google (God of the land of Search) to see what universes will easily allow us to start trying out some incantations.
Google Dork: inurl:servlet/com
Of course you can try this with net.*, org.*, or any other package descriptors you can think of. The sheer number of results - of people doing this and allowing it to happen - is crazy.
So we still haven't really seen anything "dangerous" yet, you might be saying to yourself at this point. Well, actually you have. I have already demonstrated that you are exposing the internal structure of your source code by simply being forced to reference a class by its FQN in a manner that is publicly accessible.
Things get even more interesting when you start tuning your query a little bit to get you more interesting results.
Google Dork: inurl:servlet/com +admin
Wow, that definitely gives you some interesting stuff, including the direct paths and information required to get admin access on a couple of sites, database usernames and passwords, and the aforementioned structure of your source code.
So let's keep going with this. Maybe using the invoker servlet, I can try to load a class that isn't a servlet at all, rather is just a POJO that lives in the classpath somewhere. To test this theory, let's try to load something that we know exists.
Url: http://hax0r-universe.org/servlet/java.lang.Object
Well, it tried to load and run the class as a servlet... Isn't that interesting? I wonder what happens if I try to access something that I know doesn't exist.
Url: http://hax0r-universe.org/servlet/java.lang.SecretIncantation
Hey, look at that, we got a 404 back from the server instead of the 500 we got when we tried to load a valid object that wasn't a Servlet.
Things are getting interesting now. I wonder if we could fingerprint and detect what libraries are running in the app server. Let's pick something that almost every application uses. Let's experiment with a Log4J class.
Url: http://hax0r-universe.org/servlet/org.apache.log4j.Logger
Hey! It tried to load it! Awesome!
I wonder if evilsite uses the version of <Insert Library Name Here> that had that really cool buffer overflow vulnerability that lets me execute arbitrary code and get a shell on the server!
You see where this is going.
Using the invoker servlet opens your entire JVM to the world for discovery and probing. A dictionary-powered crawl could provide a map of your entire source code tree. A vulnerable library could be exposed, providing a mechanism for the bad guys to find that security hole in your otherwise impenetrable fortress of an application. The possibilities here are endless.
So, why, after all of that, does the Apache crew refuse to do away with the invoker servlet? I can't pretend to know the answer to that, but I can tell you that by default there is no *explicit* mapping to the InvokerServlet (it is commented out in $TOMCAT_HOME/conf/web.xml), yet a significant number of people are uncommenting it to use it in their apps.
This is bad, mmkay. So let's stop the madness. If you are using the InvokerServlet, it is time to get over the laziness of not adding explicit mappings to your servlets (an example follows below) and get rid of it!
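For reference, here is roughly what an explicit mapping looks like in your application's web.xml, using the servlet from the earlier example (the /doSomething URL pattern is just an illustrative choice):

<!-- Instead of relying on /servlet/* and the InvokerServlet,
     map each servlet explicitly in web.xml -->
<servlet>
    <servlet-name>DoSomethingServlet</servlet-name>
    <servlet-class>com.yoursite.servlets.DoSomethingServlet</servlet-class>
</servlet>

<servlet-mapping>
    <servlet-name>DoSomethingServlet</servlet-name>
    <url-pattern>/doSomething</url-pattern>
</servlet-mapping>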
On a side note, I am investigating the possibility of a SecureInvokerServlet (if such a thing could even be written) for inclusion in the ESAPI - something that provides the same kind of functionality but adds in the required access control, path whitelisting, and basic security controls that the standard InvokerServlet shipping with Tomcat lacks.
Update 1 - A good followup read is the Securing Tomcat page at OWASP
Labels:
application security,
java,
tomcat
11.05.2009
Is Role Based Access Control dead?
This question has been coming up a lot in different circles lately and it seems like there isn't a great deal of online buzz or conversation about it, so I would like to bring it to the online collective for discussion and debate.
I have contested lately that RBAC (Role Based Access Control) just flat out doesn't work in today's world. Once upon a time, when applications were much simpler, this concept fit very well in the application security world; but we no longer live in that world.
The problem with RBAC is that it is a big hammer. It is what I like to refer to as an all-or-nothing solution to a problem that deserves a much finer-grained answer. To elaborate, let me first explain the concept of RBAC.
1. Applications have users.
2. Users need to be able to perform actions.
3. Actions are associated with Roles that are allowed to perform said actions.
4. Users belong to one or more Roles.
This is a very *simple* solution to the problem of access control in an application, and in simple applications, it works quite well. RBAC is widely supported by vendors and comes built in to most web application containers. It also happens to be the access control mechanism used by most OS vendors (Replace Role with Group and Voila!). There are tons of implementations for J2EE applications such as Spring ACEGI, JAAS, etc.
When you want to check if a user can perform some action you simply use a quick check
if ( user.isInRole( Roles.ADMINISTRATOR ) ) {
    doAdminStuff();
}
So what is the problem with this simple, widely supported, very popular means of access control? It's simple really: RBAC has no awareness of the context of a request for access.
To illustrate this point, consider the following situation.
You have just completed coding a wonderful project management application for Big Humungous Inc. that filled all the requirements, is unhackable, and even has a list of features far exceeding the client's requests. One day you receive a call: your client is creating a special group of representatives that need to have administrative access to customer accounts belonging to Group A, between 2pm and 3pm every Monday, Wednesday, and Friday.
Now you have a problem. A role is an all-or-nothing access mechanism, so you can't just add the users to the Administrator role. Your only option is to go back into the code and add a new check that says: am I in this role, and do I meet all these other requisites to perform the requested action? Then you need to implement that check in every place where you would normally just ask: is the user an administrator?
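Here is a hypothetical sketch of where that road leads - the SPECIAL_REP role, the account.getGroup() accessor, and the two helper methods are my inventions for illustration:

// The kind of check that ends up copy-pasted all over the codebase
// once plain RBAC meets a context-sensitive requirement.
if ( user.isInRole( Roles.ADMINISTRATOR )
        || ( user.isInRole( Roles.SPECIAL_REP )
             && "A".equals( account.getGroup() )
             && isBetween2pmAnd3pm()
             && isMondayWednesdayOrFriday() ) ) {
    doAdminStuff();
}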
Now let me introduce you to a different approach to access control. It goes by many names; some call it Activity Based Access Control, I call it Context Based Access Control. The concept is simple.
Make your Access Control mechanism aware of context!
There are any number of ways to do this, but I will address that in a subsequent article, as this is more about the interface it provides. Adding context allows you to specify any number of dynamic situations that are taken into account when determining whether a user has access to perform an action. Consider the following:
public interface AccessContext {
    boolean isAllowed( User u );
}

public interface AccessRole {
    // Some methods
}

public interface User {
    // ... Other user'ish stuff
}

public interface AccessController {
    boolean canUserPerformAction( User u, AccessContext c, Action a );
}

public interface Action {
    void performAction();
}
This is a simple outline of how a Context Based Access Control API might look. Now in your code you might have something that looks like this:
User user = RequestHelper.getUser();
AccessContext requestContext = RequestHelper.getContext();

// DeleteUserAction being some implementation of the Action interface
if ( accessController.canUserPerformAction( user, requestContext, new DeleteUserAction() ) ) {
    // go ahead and perform the action
}
Of course this is a very simple and open ended example, but it gets the point across rather well and illustrates the ability to solve problems in a manner that allows the AC layer to be as fine-grained or big hammer as it needs to be in any given situation.
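To make that a little less abstract, here is a hypothetical sketch of a context implementation for the "Group A reps, 2pm to 3pm on Monday/Wednesday/Friday" requirement from earlier - the TimeWindowContext name and its logic are mine, not part of any existing API:

import java.util.Calendar;

// Grants access only during the 2pm-3pm window on Mon/Wed/Fri;
// the Group A membership check would layer on top of this.
public class TimeWindowContext implements AccessContext {

    public boolean isAllowed( User u ) {
        Calendar now = Calendar.getInstance();
        int day = now.get( Calendar.DAY_OF_WEEK );

        boolean rightDay = day == Calendar.MONDAY
                        || day == Calendar.WEDNESDAY
                        || day == Calendar.FRIDAY;
        boolean rightHour = now.get( Calendar.HOUR_OF_DAY ) == 14; // 2pm-3pm

        return rightDay && rightHour;
    }
}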
To summarize:
1. Role Based Security doesn't address the problems of today's applications.
2. Context or Activity Based Security has the power to address those problems, but there is no force driving it forward.
I have proposed to the ESAPI team that we are in the perfect position to address this problem at the API level. Designing a clean interface that addresses the simple as well as complex control situations is really the difficult part in this concept.
An interesting read if you have a second - it looks like Microsoft had the same idea 2 years ago, too bad they patented it and did absolutely nothing with it.
http://www.freepatentsonline.com/y2007/0143823.html
I would love to hear what the 'verse has to say about this so let's get some conversation going around this topic and see what we can come up with!
Labels:
application security,
ESAPI
8.06.2009
Twitter DDoS'd - Not related to recent activities?
Let me start by saying, "Yeah Right, Twitter!"
Here's the problem; nobody, least of all Twitter, really knows the extent of the information that was acquired by Hacker Croll a few weeks ago. There is only speculation as to how deep into Twitter's infrastructure he got, and only he knows.
Now, just a couple of weeks after the Hacker Croll incident, Twitter suffers from a massive DDoS attack. There are 2 types of DDoS attacks: those that are meant to bring a network down completely, and those that are meant to divert the corporate I.T. guys' attention for a period of time while the real work is done on a target service that isn't being attacked.
If I were Twitter, that's where I would be focusing my attention at this very minute: which services didn't suffer from the DDoS, and who accessed those services while the DDoS was happening. Any attacker with any experience in the field at all will have erased their tracks long before anyone thought to focus on the stuff that didn't go down, so it is likely that, whatever the real purpose of the DDoS was, Twitter will have to sit on their hands until it is revealed or the person behind it slips up.
So, you might find yourself asking, "Well what should you do in a DDoS situation?"
Others will have different opinions, I am sure, but my answer is simple: focus 50% of your resources on the services that are down and the rest on the services that aren't seemingly affected.
It is always possible this was just some group of $kiddies with a network of zombies just pulling a prank, but given the amount of news around Twitter lately, and the high-profile hacks that have infected their media coverage - I find that highly unlikely.
We will see if I am right soon enough I suppose, but at bare minimum, if I were at Twitter, I would be focusing a lot of attention around performing a full site audit right now and taking inventory of every machine that has access to the internal network, as well as auditing every employee in the organization who was involved directly or indirectly with the fiasco a couple of weeks ago.
What are your thoughts?
Labels:
ddos,
hack,
internet security,
twitter
8.05.2009
The State of Internet Security - Revisited
In June I wrote a blog post on the state of security on the net, and I keep hearing the experts saying the same things I have said. In an interview with Dan Kaminsky about a recent SSL and DNS vulnerability, Kaminsky put it out there the same way I did.
"This is our best technology for doing authentication and it failed," he said. "We'll fix it, but it's another sign that we need to revisit how we do the basics; how we do authentication on the internet."
That's exactly it, we need to go back to the drawing board. Why don't we spend some time and money and put all of these experts, and I mean the real experts, the ones who are breaking protocols and smashing the stack every day because they enjoy it, get them all in one place. Why don't we give them a digital whiteboard, all the food they can handle, and let them design a system that works!
Granted, there is no such thing as a completely secure system, but I'll bet that armed with the knowledge that we have today, the tools and a budget, we could come up with something that is a lot closer than a system that was designed before XSS and SQL Injection on the internet were even a twinkle in some $kiddie's parent's eye.
I feel a little bit better now after that rant. What really irks me is that everyone has thought it, most of us have even said it aloud! The system doesn't work. We keep trying to hack fixes into decades old code to account for these new bugs, but it's like putting a brand new Hemi into a 1982 Toyota Corolla - it just doesn't work.
"This is our best technology for doing authentication and it failed," he said. "We'll fix it, but it's another sign that we need to revisit how we do the basics; how we do authentication on the internet."
That's exactly it: we need to go back to the drawing board. Why don't we spend some time and money and put all of these experts - and I mean the real experts, the ones who are breaking protocols and smashing the stack every day because they enjoy it - in one place? Give them a digital whiteboard and all the food they can handle, and let them design a system that works!
Granted, there is no such thing as a completely secure system, but I'll bet that armed with the knowledge we have today, the tools, and a budget, we could come up with something a lot closer than a system that was designed before XSS and SQL Injection on the internet were even a twinkle in some $kiddie's parent's eye.
I feel a little bit better now after that rant. What really irks me is that everyone has thought it, most of us have even said it aloud! The system doesn't work. We keep trying to hack fixes into decades old code to account for these new bugs, but it's like putting a brand new Hemi into a 1982 Toyota Corolla - it just doesn't work.
Labels:
dns,
internet security,
Kaminsky,
ssl
8.04.2009
Synchronizing the HttpSession
This is something I have heard a great deal of debate about over the last 2 years. The servlet spec was somewhat recently amended to clarify that there is no guarantee that multiple calls to HttpServletRequest.getSession() or HttpServletRequest.getSession(boolean) will return the same object. This holds especially true in containers that return a facade object wrapping the actual HttpSession you are working with, like Tomcat.
Why would you want to synchronize a session anyway?
The answer is pretty simple actually. Consider the following theoretical block of code:
public class WithdrawFundsServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        User u = ESAPI.authenticator().getCurrentUser();
        String withdrawAmt = request.getParameter("withdrawAmt");
        float amt;

        HttpSession session = request.getSession();
        Account acct = (Account) session.getAttribute("acct_" + u.getAccount());

        try
        {
            amt = Float.parseFloat(withdrawAmt);
        }
        catch ( Throwable t )
        {
            ESAPI.log().info( Logger.SECURITY_FAILURE, "Non-Numeric value passed as Withdraw Amount");
            try {
                ESAPI.httpUtilities().sendForward(request, response, "/error" );
            } catch (AccessControlException ignored) { }
            return; // bail out - don't fall through to the withdrawal
        }

        // Calling Withdraw will queue a check to be printed and mailed to the customer.
        AccountFacade.withdraw( acct, amt );

        try
        {
            ESAPI.httpUtilities().sendForward(request, response, "/success" );
        }
        catch (AccessControlException ignored) { }
    }
}
Now there are a couple things that I will point out that I am sure you will notice if you are paying attention. The first is that yes, this example is using the ESAPI. Call it a shameless plug :). The second is that I am ignoring AccessControlExceptions. This is purely to keep this example scenario short and to the point, and in any production code, you would never want to do this. There would also be some validation code in there as well.
Aside from those things, it looks innocent enough right? Well let's consider this for a second with a scenario.
Joe needs to have a check cut to him from his account at SomeBodiesBank. So he gets online and hits the form for the above servlet. Joe is not that savvy of a computer user, and like most novice internet users will do, he has the tendency to double-click on everything. He fills out the form to withdraw $500 from his account and double-clicks the submit button. So somewhere on the backend, we'll say in the AccountFacade.withdraw method, the software validates that Joe has enough money to cover the check, it discovers he has $750 in his Checking account so everything looks good. But wait a minute, Joe double-clicked remember?
Do you know what happens when you double-click the submit button on a form? Well, 2 requests get submitted, one right after the other. Hmmmmmm.. So now I have 2 requests entering this method at the exact same time; both requests check Joe's balance and discover that he has $750 in his account, so they both queue up a request to print a check for the requested amount. There's only one problem: these are cashier's checks. The bank has withdrawn $1000 (or, in some circumstances, maybe only the original $500) from his account, but Joe ended up with $1000 in cashier's checks!
The checks show up in the mail, and Joe, being the responsible individual he is, reports this to the bank. The bank will likely write this off as an anomaly, and the bug will remain until one day when Joe is down on his luck and remembers it. He finds a program called JMeter and submits 1000 requests to the servlet as fast as he can for $1000 withdrawals. When his $1,000,000 in cashier's checks arrives, he promptly leaves the country and disappears into the backwoods of New Zealand, never to be heard from again.
So the moral of the story is that this problem could have been easily avoided simply by adding thread-safety measures to the code. Granted, the example cited is extreme and its consequence even more so, but I can promise you that something similar has already happened, and I can all but guarantee that something similar will happen again.
So, with this knowledge, what is the correct way to add thread safety around manipulating the session? It's quite simple, even:
final Object lock = request.getSession().getId().intern();
synchronized(lock) {
AccountFacade.withdraw( acct, amt );
}
Would do the trick in this simple example.
It's important when using synchronization to always lock on immutable objects. It is also important to use the same lock when locking in multiple places where you are working with the same data. Thread safety is an entire subject on its own that is well beyond the scope of this blog post, so I will cut to the chase here.
This is incorrect, and not guaranteed:
synchronized (request.getSession()) {
// do stuff
}
While this method is proven and works:
synchronized (request.getSession().getId().intern()) {
// do stuff
}
Some interesting stats to close out with:
Google Code Search found approximately 4000 uses of different ways to write 'synchronized(session)'.
The scary part is this was only the first 5 ways I came up with to search for it.
Labels:
application security,
ESAPI,
J2EE,
java,
Servlets,
thread safety
8.03.2009
Eric Schmidt, Google, and Apple
It appears that my long-lost relative, Eric Schmidt, has left the Apple BoD. What do I think about that? Simply that I really need to get in touch with Eric and see if he wants to loan his long-lost relative, me, a couple million. Other than that, I think this will probably resolve the whole Apple/Google thing for the most part, but I really don't think that Apple wants to get themselves into an Apple vs. Google scenario. We will see what happens, but I imagine this all going away and the news shifting back to the Microsoft vs. the world scene shortly.
Labels:
Apple,
Eric Schmidt,
Google,
random thoughts
8.02.2009
What is ESAPI?
I have recently gotten involved in the OWASP ESAPI Project. I am on the team of developers working on v2.0 of the API which will include updating the API to take advantage of all the features that Java5 brought to the table, increasing performance of the reference implementation and improving thread-safety throughout the entire codebase. It has thus far been a great experience and there are some very smart people behind the entire project.
So what exactly is the OWASP ESAPI?
Well, let's start with, what exactly is OWASP?
OWASP is the Open Web Application Security Project. It is an NPO made up of people from all over the world with a single goal: to provide a central repository of information and tools for writing secure and reliable web applications.
The ESAPI is a small part of the overall goal of OWASP, but is a great example of what OWASP stands for and has set out to do.
ESAPI stands for Enterprise Security API - and it is just that, an API. There is a reference implementation included in the distribution that can be dropped into a new or existing application and configured for use, but the real power of the ESAPI is that it defines a standard interface for providing secure implementations of standard API methods that are not secure.
That is a pretty broad statement but it is probably the best way to explain it. See, the ESAPI is not an application by itself, it is not even really a framework - it is a toolkit. It provides you with an API that is self documenting and provides a central set of methods for developers to access information, log data, authenticate users, and much more.
The ESAPI is distributed for Java, .Net, and there are more implementations in the works for PHP, Python, and others I am sure.
So let's have a quick overview of what the ESAPI provides to developers:
1. Authentication - Provides a good reference implementation and a well documented authentication mechanism that can be used on top of the standard J2EE Security Model (Standard User/Role mechanism)
2. Logging - Provides a central repository for logging in your application. The Java API uses either the standard Java Logging or Log4J by default, but you could implement your own logging by implementing the Logger interface.
3. Validation - Provides a powerful set of input validation classes that not only validate but also filter user input, taking the responsibility for input filtering out of the hands of your application developers (see the sketch after this list).
4. Encoding/Decoding - A full toolset of Encoders and Decoders including UTF-8, Base64, and much more.
5. Web Application Fireall - WAF's are easily one of the most argued about issues in the Realm of AppSec, but there are several of them out there and the ESAPI makes it easy to implement your own WAF where it makes the most sense to me, at the Application Layer. The WAF works off the same principles of most where a set of rules and reactions are defined but by keeping it in the Application Layer, this will allow your Enterprise Security Architects, or even your regular old Developers to create complex WAF rules based on logic that can be determined by the state of your application itself. This is a very powerful tool for large web applications.
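To make the toolkit idea concrete, here is a rough sketch of what calling a few of these controls looks like through the ESAPI facade, written against roughly the 2.0 Java API shape. The context strings and the SafeString validation rule are hypothetical examples of mine; the rules actually available depend on your ESAPI.properties configuration.
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.Logger;
import org.owasp.esapi.errors.ValidationException;

public class EsapiSketch {
    private static final Logger log = ESAPI.getLogger("EsapiSketch");

    public String cleanComment(String rawInput) {
        try {
            // Validate and canonicalize user input in one call. "SafeString" is a
            // hypothetical regex rule defined in ESAPI.properties; 200 is max length.
            String safe = ESAPI.validator().getValidInput(
                    "comment field", rawInput, "SafeString", 200, false);
            // Encode on the way out so the browser treats it as data, not markup.
            return ESAPI.encoder().encodeForHTML(safe);
        } catch (ValidationException e) {
            log.warning(Logger.SECURITY_FAILURE, "Rejected comment input", e);
            return "";
        }
    }
}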
These are the 5 "main" parts of the ESAPI. Now let's get to the REAL power of the ESAPI.
In a normal web application, your security constraints and controls are defined across your entire codebase, wherever they are used. This creates a couple of problems. The larger your application becomes, the more difficult this is to maintain. Developers will start coding their own solutions to security concerns instead of using the one that is used everywhere else, simply because they may not know the problem they are trying to solve has already been solved. Now you have two different ways of solving the same problem. Sound like a maintenance nightmare waiting to happen?
The biggest feature of the ESAPI, in my mind, is that it allows your developers to focus on writing the code they are good at. Not everyone is a security expert, but they are probably really good at their job - that is why you hired them. Your security expert (whether it be the guy who used to hack websites for fun or a genuine Enterprise Security Architect) can define the rules and requirements of your application's security, implement them once, and your developers will know that to authenticate a user they just use:
ESAPI.authenticator().login(request, response);
Sounds pretty easy right? It is!
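As a rough illustration (my own sketch, not the project's canonical example), wiring that one-liner into a servlet filter might look like this; the exception and User types come from the org.owasp.esapi packages.
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.User;
import org.owasp.esapi.errors.AuthenticationException;

public class LoginFilter implements Filter {
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        try {
            // One call handles the credential checks and session management
            // according to the rules your security expert configured.
            User user = ESAPI.authenticator().login(
                    (HttpServletRequest) req, (HttpServletResponse) resp);
            chain.doFilter(req, resp);
        } catch (AuthenticationException e) {
            ((HttpServletResponse) resp).sendError(HttpServletResponse.SC_UNAUTHORIZED);
        }
    }
    public void init(FilterConfig config) { }
    public void destroy() { }
}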
I strongly recommend that anyone starting a new application look into the ESAPI. There is also a ton of general information on web application security on the OWASP site.
ESAPI Links:
ESAPI Homepage
ESAPI on Google Code
ESAPI .Net
8.01.2009
Lucene - Lessons Learned
Over the last 3 years I have kind of taken on the role of the Lucene expert at work. Any enhancements that require search components either come to me directly, or the developer on the project is told to, at a bare minimum, run their ideas by me or chat with me about the project. This has proven to be a very valuable role in my career, and it has given me the opportunity to lead my team on other experimental projects and concepts with things like JMX, JMS, Security, etc.
As great a product as Apache Lucene is, it simply amazes me how a product that has been around for so long, and that is used by so many people around the world, has so little documentation. Googling Lucene issues will often answer your questions, but rarely do I find the answer to a question on any of the sites directly associated with Lucene.
That being said, most of what I know about Lucene has been learned through trial and error and reading the source. Last week I was tasked with increasing the relevancy of our search results in some of the search components I had developed. I would be experimenting with boosting the score for matches in particular fields, and with tuning the fuzzy search to get results as accurate as possible without completely rewriting the data backing the search.
Enter the Lucene QueryParser - a mysterious and, from what I can gather, not very well understood but extremely powerful tool in the Lucene framework. The QueryParser takes a 'google-style' text query and turns it into a Query object that can be used to search your index. For example, the following string:
name:Chris and name:Schmidt
Would be turned into a BooleanQuery containing two term clauses. There are modifiers that can be added to queries to alter the way Lucene builds the query objects, and this adds a great amount of flexibility to simple searches.
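For reference, the round trip through the parser looks something like this against the Lucene 2.x API of the day (constructor signatures changed in later releases, so treat this as illustrative):
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class ParseDemo {
    public static void main(String[] args) throws ParseException {
        // "name" is the default field used when a term has no field prefix.
        QueryParser parser = new QueryParser("name", new StandardAnalyzer());
        Query query = parser.parse("name:Chris name:Schmidt");
        // Prints the parsed structure, e.g. "name:chris name:schmidt"
        System.out.println(query.toString());
    }
}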
The first one that I will be talking about is the Boost modifier (^). This allows you to specify that, for matches found in a particular field, the relevancy should be boosted by a factor of X (or some derivative thereof, since rarely have I seen the boost I specify applied verbatim to the score). To expand on the above example:
name:Chris^1.5 name:Schmidt
This is interpreted as the same query as above, except that if 10 Schmidts are found, Chris is the most relevant Schmidt I am searching for, so if there is a Chris Schmidt result, he should be moved up in relevancy by a factor of 1.5. This can be a pretty handy tool, but it is extremely easy to overdo the boosting, which can completely destroy the relevancy of your results. A good rule of thumb is to start by boosting the field that you think will be the most relevant for the context of the search being performed, and boost it in small increments only. A boost of 1.5 may not seem like much until you see how it actually affects your results.
Another good rule of thumb with boosts is to apply them to things that will be exact keyword matches; applying a boost to a fuzzy search will greatly reduce the relevancy of the results you are returning.
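If you build queries programmatically instead of going through the QueryParser, the same boost can be set directly on the query object; a minimal sketch, with hypothetical field values:
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class BoostDemo {
    public static BooleanQuery buildNameQuery() {
        TermQuery first = new TermQuery(new Term("name", "chris"));
        first.setBoost(1.5f); // equivalent to name:chris^1.5
        TermQuery last = new TermQuery(new Term("name", "schmidt"));

        BooleanQuery query = new BooleanQuery();
        query.add(first, BooleanClause.Occur.SHOULD);
        query.add(last, BooleanClause.Occur.SHOULD);
        return query;
    }
}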
Now let's move on to the next modifier, the fuzzy search (~) modifier. This is another one that, used incorrectly, can greatly reduce the relevancy of the results a query returns, and a side effect of fuzzy searches is that they return exponentially more results than a standard keyword search will. The fuzzy search uses the Levenshtein edit distance algorithm to calculate what a user actually meant to search for when they fat-finger or misspell a search term.
If you are unfamiliar with the Levenshtein edit distance concept, it is basically a mathematical formula for calculating the number of edits that would need to be applied to one word to transform it into another. This is a very popular algorithm used by spell checkers and similar applications. An example:
C H R I S
C O O L
The edit distance between the 2 words presented above is 4.
To transform Chris into Cool the following edits would have to be made:
1. Change H -> O
2. Change R -> O
3. Change I -> L
4. Drop the S
Lucene uses this algorithm to calculate word similarity. Although the implementation in Lucene is far more complex ( FuzzyTermEnum - Line 168 ), the basics are that Lucene calculates the edit distance between the two words and derives similarity from the ratio of that distance to the length of the shorter term (one minus the ratio, so identical words score 1.0).
By default, the fuzzy search uses a similarity threshold of 0.5, which has always seemed pretty aggressive for most fuzzy searches to me, as it essentially means half of the letters in the term can be different and it will still be considered a match.
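To make the math concrete, here is a small self-contained sketch of the distance computation and the similarity ratio described above - a simplified illustration, not Lucene's actual optimized implementation:
public class EditDistance {
    // Classic dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Similarity as described above: 1 - distance / shorterLength.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        return 1.0f - ((float) distance(a, b) / shorter);
    }

    public static void main(String[] args) {
        System.out.println(distance("CHRIS", "COOL"));   // 4
        System.out.println(similarity("CHRIS", "COOL")); // 0.0 - fails the 0.5 default
    }
}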
I have generally gone to starting with a baseline of 0.85 and incrementing or decrementing by 0.05 until I reach the sweet spot where I am catching common misspellings of the terms I am tuning for without overdoing it. A good example of where overdoing a fuzzy search can be detrimental: at ServiceMagic, where I work, we index the names of home improvement tasks. There are two examples off the top of my head that have bitten us with fuzzy searching.
SIDING ~ SHINGLE, SLIDE
PLUMBER ~ PLAYER (DVD)
As you can tell, the tasks that were matched by fuzzy searches have no contextual attachment to each other. Someone who is looking to get new siding on their house is probably not looking for someone to repair the roof or build a playground in the backyard. Along the same lines, someone who has a clogged drain is more than likely not looking for someone to help them install and configure their Blu-ray DVD player.
Both of these modifiers are extremely powerful ways to make your search results great, but both have drawbacks, and used incorrectly they can ruin your results. There is one more gotcha with fuzzy searches that I want to cover quickly. I will probably go into more depth on this subject in a subsequent post; however, this bit me hard last week, and I think it is worthwhile to share.
There are a great many other things that can be done during both indexing and searching to improve your results, and one of them is the PorterStemFilter, which takes words passed into it and transforms them into what it believes to be the 'root' of the word. For example, the terms Writing or Writes would both be reduced to the root Write after filtering. This happens in the Analyzer stage of both indexing and query parsing, so the following is important to remember when combining fuzzy searches and stemming. If you pass a query like writing~0.85 to the QueryParser, you would probably assume the parsed query would look like write~0.85; however, the PorterStemFilter does not stem words that carry the fuzzy search modifier. This matters when you stem during the indexing phase and run fuzzy searches in the searching phase: the keyword writing will not actually be indexed anywhere, and the by-product is that you may not match documents you would expect this query to match.
If you are using stemming and fuzzy queries together, the answer I have found is to generate queries that look like this:
keywords:writing~0.85 keywords:writing^1.10 (match terms in the keywords field with 0.85 similarity or better without stemming, and apply a 1.10 boost to matches of the stemmed keyword)
It may seem redundant, but when stemming is in play the final parsed query will actually be:
keywords:writing~0.85 keywords:write^1.10
This will drastically improve the quality of your results when using both of these techniques to tune your queries and indexes.
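For reference, a stemming analyzer along these lines is easy to assemble from Lucene's building blocks; a minimal sketch against the 2.x API, with the tokenizer choice being just one option:
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;

public class StemmingAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Lower-case the tokens, then reduce each one to its Porter root,
        // so "writing" and "writes" are both indexed as "write".
        return new PorterStemFilter(new LowerCaseTokenizer(reader));
    }
}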
I would be happy to answer any questions that anyone has about this so feel free to comment or ask away.
Labels:
java,
Lucene,
Tips and Tricks
6.26.2009
State of the InterWebz - Authentication and Usability
I am going to divert away from my current JMX saga for a brief moment to discuss an interesting post on the internet that was shared with me regarding usability versus security with masked passwords.
The post, from Jakob Nielsen, states that it is time to do away with password masking to improve usability (it is referenced at the bottom of this entry).
This got some interesting conversation going on the webappsec mailing list and got me to thinking about the problem. Jakob has a point in his original post, that as a result of password masking, the average user tends to choose simple passwords to remember (which are quick and easy dictionary cracks) especially if they plan on logging on to the website from their mobile device.
I am sure everyone is aware why passwords are masked in just about every password-style authentication medium across the globe (ie: website logins, OS logins, ATM PINs, alarm codes, etc.): the main goal is to prevent "shoulder surfers" from easily learning people's passwords. Arguably, this is a first line of defense only, and not even close to a solution, as someone who is good at their craft will be able to get your password just as easily by watching you type it on the keyboard. There are also several other ways a devious individual may gain access to your password - keyloggers, cameras, or good old fashioned social engineering - and simply masking your password bears no protection against these types of recon.
Now don't get me wrong, I am of the mind that having your password masked is a good thing, if we continue to rely on password type authentication. Any security concept should be multi-layered to address as many concerns as possible.
On the iPhone, when you type a password, it shows the current character in the form field, then masks it when you move on to the next character. This is an interesting concept, and one that has apparently been adopted on some sites across the internet using a JavaScript hack on a normal form input field.
I am definitely not of the opinion that this is a good idea by any means. Password masking is something that happens at the browser level. A developer simply specifies that a field on a form is of type password, and the browser implements a masking strategy over that field. JavaScript has no place in the security realm, regardless of how harmless it may seem, most of the time when you introduce javascript as a security tool, you open more holes than you close due to the nature of the language and how it interacts with browsers.
This brings me to the point of this post.
We live in a world where every day we grow more and more reliant on the internet to go about our daily lives. We bank online, we shop online, we interact and communicate online, there isn't much that we cannot and in most cases *don't* do online.
The world was a seemingly much more innocent place when the internet was born, and no one envisioned it becoming what it is today; as such, the security model the internet uses is far too lax to be reliable, no matter how many layers of hacks and patches we put in front of it. There will always be at least one significant security hole in every web application in the world, in every social network, and on every bank or shopping site.
Humans as a Vulnerability
No matter what we do, the human element will always be there and the average human being has no idea what dangers are lurking around every click he makes on the net. There are a ton of plugins and apps that help to mitigate the risks of using the internet like popup blockers, phishing filters, virus scanners, and so on. These are essential to have installed in any environment, but they do not solve the problem of a poor security model overall, they only patch little leaks in a sinking boat.
Browser manufacturers need to go back to the drawing board when it comes to security and especially when it comes to authentication.
I don't pretend to have all the answers - if I did, I would be a rich man. I do have thoughts and theories, though.
Certificates
Certificates eliminate the need for any human interaction in the authentication space. This, generally, is a good thing, provided the human involved can keep his certificates safe. The main problem with certificates is that support for them in the web space is shaky at best, and there is no good implementation model for using them to authenticate to web applications that I have seen. Still, I feel that certificates are probably the best option we have in today's world as a first line of defense when it comes to authenticating users and verifying their identity.
Certificates themselves can also be protected by a password, or given an empty password for passwordless use. So if you are using a certificate to authenticate and you have not set a password on the certificate itself, someone only needs to gain access to your certificate files to become you.
Password Vaults
Password vaults are a great idea for people who need to remember a lot of passwords. I personally have solved the complex-password typing issue on my BlackBerry by using its integrated password vault to manage all my different internet passwords across different sites. The idea behind a password vault is that you only need to remember one password: the password to your vault. The vault stores all of your individual passwords (encrypted, of course) and sends them to the application when it prompts you for your password. This is a pretty ingenious idea that has been around for quite some time; however, it too has its vulnerabilities.
Your passwords are stored in a file, usually in some proprietary encrypted format specific to the password vault application you are using, and they are, of course, protected by a password. If a nefarious individual determined which password vault you were using, and in turn acquired, through some questionable means, your data file and your password (see shoulder surfing, key logging, etc.), they could potentially unlock every password to every application you use and thus become you.
Fingerprinting
Fingerprinting is an interesting concept that covers a wide variety of authentication in the real world. From retinal scanners, to fingerprint scanners, to hardware addresses and device fingerprinting, this is an interesting yet still vulnerable means of authentication.
For example, a good friend of mine bought an HP laptop with a fingerprint scanner and a touch screen. Hmmm. Sounds too easy to me. I was quickly able to demonstrate how, in less than a minute, I could lift a fingerprint from his monitor and use it to gain access to his laptop.
We've all seen the movie scene where the guy wears contacts fashioned to look like someone else's eye and bypasses the retinal scanner, and while that may be science fiction in most regards, with today's technology it is very possible to pull off if you know the right people and they have the right equipment.
Hardware addresses and device fingerprinting are all too easy to bypass these days. Wireless access points have shown us how quick and easy it is to become another device altogether with some quick air sniffing.
So what's the answer?
My theory is that all of the above should be combined to create a multi-layered authentication engine capable of thwarting most would-be attackers, or at least making their job a good deal more difficult and time-consuming.
What if there were a browser that implemented a secure password vault? (I am not talking about the 'Remember This Password' popup that browsers currently have; I am talking about a true password vault with strong encryption.) Having a single password that unlocks every password would encourage people to put more thought into creating a complex password that is much more difficult to crack - when you only need to remember a single password, it is much easier to remember a complex one. Since all interaction with applications would be handled via the password vault, the actual application passwords could be extremely complex sequences generated randomly by the vault itself. This would require application developers to change their line of thinking too, and implement a new type of signup functionality and authentication strategy. Perhaps password creation and authentication would happen as a separate request over an HTTPS connection that would create a token on either side of the connection for challenge/response authentication per request. But I digress..
Now let's take it to the next step. This is where certificates come in. Instead of unlocking the password vault with a password, what if the vault were unlocked with a certificate? For the end user using a password-secured certificate, this is transparent - they still enter a single password to unlock every password - but it adds an additional layer of complexity to the nefarious hacker's evil plot to become you. Now, in addition to getting your password vault data files, they also have to acquire your private key to unlock the vault.
Already, we have not only improved the usability of authentication on the web, we have easily doubled or tripled the complexity a hacker must overcome to steal your identity, just by leveraging existing technologies in a way that makes sense for the security model. This is probably good enough, but I am going to take it one step further; given the nature of what people do on the internet every day, I would like to see something even more secure.
This is where device fingerprinting comes into play. A USB key could be the answer to the problem of human interaction. A lot of people will immediately argue that authentication on a website would now require a physical piece of hardware to be on your person at the moment you wish to use the service. This is where I say: yes, precisely.
Think about it this way: most of us live in a house or apartment, and we safeguard our belongings, information, and family by putting a door that locks and requires a key between ourselves and any would-be intruders. Granted, there are other ways into a house, just as there are other means of stealing an identity on the internet, but if a thief had the choice between walking through an unlocked door and breaking a window, which do you think he would choose?
My concept is that browser manufacturers include a means for an end user to generate a signed certificate - that is, a certificate signed by the browser manufacturer itself - whose single purpose is to live on removable storage (ie: a USB thumb drive) and unlock their password vault. The end user then slides the USB thumb drive onto their key ring, right between their car keys and the key to the deadbolt on their house, and voila! Unless you have a habit of leaving the house without your car keys, you will most likely have your authentication with you at all times.
Now, I will be the first to admit that this all seems like far too much for, say, signing on to Facebook. But think about signing on to your bank website, purchasing items online, trading stock, updating insurance information, or filing taxes. All of these are sensitive operations that I believe require a great deal more authentication and proof of identity than typing bob303 into a password box on a web form.
Security constraints should always reflect the sensitivity of the action you are trying to take, and the internet should be no different. The responsibility lies with the browser manufacturers, the server developers, and the application developers to recognize this and implement something new. I am sure there are holes in my concept, and there are definitely ways around it if an application has vulnerabilities outside of its authentication scheme (ie: session hijacking, cross-site request forgery, etc.), but your authentication is your first line of defense, and as such it should be treated with a great deal more respect than it currently is.
I could probably continue on about this for another 50 pages, and perhaps I will have to write a paper detailing my ideas and theories about the general state of security on the web and the mindset of the people that create functionality therein, but for now I think this brings up some interesting topics for debate and conversation and I am anxious to hear other peoples thoughts.
References:
Jakob Nielsen's post on Password Masking versus Usability
http://www.useit.com/alertbox/passwords.html
WebAppSec Consortium (Home of the Web Security mailing list)
http://www.webappsec.org
6.17.2009
JMX Management for fun and profit - Act 2
So now that we have caused an ample amount of trouble with our App Servers messing around with all those internal MBeans, stopping servers, etc.; I think it is time to move on to Act 2, in which our hero creates a custom MBean!
This is where things start to get interesting. Let's say, for argument's sake, that you have a cache that you would like to be able to invalidate from anywhere. This will be the premise for our first custom MBean.
Our first order of business is to create our "Cache" object that we want to manage. Normally this would exist already, however, for the sake of education, I will create one here first.
UserCache.java
package mbeans;

import java.util.HashMap;
import java.util.Map;

/**
 * Before you all go crazy about how this should be a singleton and is not threadsafe,
 * let me just point out that this is, for all intents and purposes, a "mock object"
 * designed to prove a point. That is all.
 */
public class UserCache {
    // For simplicity, this is package-private so the management wrapper
    // can access it easily.
    Map<String, User> userCache;

    public UserCache() {
        this.userCache = new HashMap<String, User>();
    }

    public User getUser(String username) {
        User u = userCache.get(username);
        if ( u == null ) {
            u = UserDaoFactory.getUserDao().loadUser(username);
            userCache.put(username, u);
        }
        return u;
    }
}
The next step is to create our MBean interface. The JMX spec says that an MBean interface and implementation should follow a strict naming convention, where the MBean interface is the managed object's name followed by MBean. So for our example, we are going to create a manager object called UserCacheManager; thus, our interface shall be named (in the same package) UserCacheManagerMBean.
Let's take a look at our MBean interface.
UserCacheManagerMBean.java
package mbeans;
public interface UserCacheManagerMBean {
void invalidateCache();
}
Pretty simple and straightforward, if I do say so myself. Next we will create our management object, the UserCacheManager.
package mbeans;

import java.util.HashMap;

public class UserCacheManager implements UserCacheManagerMBean {
    private UserCache managedCache;

    public UserCacheManager(UserCache managedCache) {
        this.managedCache = managedCache;
    }

    public void invalidateCache() {
        // Swapping in a fresh map effectively drops every cached User.
        managedCache.userCache = new HashMap<String, User>();
    }
}
So far, so good. The only thing we have left to do is register it with our MBeanServer. If you are running an application server, you can just about guarantee that an MBeanServer has already been created and has a ton of beans registered in it. Bearing that in mind, let's count on that for this example and register our MBean with the default server.
This next class will act as the Factory object for getting a UserCache, and will be the point of entry from our entire program, so we will let it handle the registering of the management object.
UserCacheFactory.java
package mbeans;

import java.util.List;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

public class UserCacheFactory {
    private static final UserCacheFactory instance = new UserCacheFactory();
    public static UserCacheFactory getInstance() { return instance; }

    private UserCache userCache;

    public synchronized UserCache getUserCache() {
        if ( userCache == null ) {
            userCache = new UserCache();
            registerManagedCache();
        }
        return userCache;
    }

    private void registerManagedCache() {
        MBeanServer server = null;
        List<MBeanServer> initialServers = MBeanServerFactory.findMBeanServer(null);
        if ( initialServers != null && initialServers.size() > 0 ) {
            server = initialServers.get(0);
        }
        if ( server == null ) {
            server = MBeanServerFactory.createMBeanServer();
        }
        try {
            ObjectName mbeanName = new ObjectName("Application:impl=UserCacheManager");
            // Register the management wrapper around the cache we just created.
            server.registerMBean( new UserCacheManager(userCache), mbeanName );
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
Let's look at what we are doing in the above code.
First off, we made it a little more threadsafe with a singleton factory that has a synchronized getUserCache() method.
Second, if we are constructing a new UserCache when we access it from the factory, we also create the manager bean for the class at the same time and register it with the MBeanServer.
Let's take a second to talk about this mysterious new ObjectName object that I am creating for this MBean.
ObjectName is an object that represents the namespace that a bean will live in once it is registered with the MBeanServer. This is how any clients or agents will know of this MBean. The namespacing is made up of two main parts, delimited by a colon. The first part is the Domain in which the MBean lives - This should be somewhat specific to your application. The second part of the ObjectName string is a set of key=val pairs delimited by commas. For example, key1=val1,key2=val2.
If you were to browse in jConsole for our new UserCacheManager, you would find it listed under:
+ Application
+ UserCacheManager
If you were to look up the MBean by its name, you would use the same string that we passed into the ObjectName constructor.
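For example, a JMX client could invalidate the cache programmatically along these lines; a hedged sketch that assumes you have already obtained an MBeanServerConnection (from a JMXConnector, or the platform MBeanServer in the same JVM):
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class CacheFlusher {
    // 'connection' would typically come from a JMXConnector attached to the
    // server's RMI port, or from ManagementFactory.getPlatformMBeanServer()
    // when running in the same JVM.
    public static void flush(MBeanServerConnection connection) throws Exception {
        ObjectName name = new ObjectName("Application:impl=UserCacheManager");
        // No arguments and no signature, since invalidateCache() takes nothing.
        connection.invoke(name, "invalidateCache", null, null);
    }
}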
Voila! We have created our first custom MBean and have registered it with our default MBeanServer instance. If you run this code in your App Server, you should be able to connect to the server's JMX port with jConsole and see the manager object appear in the MBeans tab shortly after you access the method UserCacheFactory.getInstance().getUserCache() for the first time in your application.
Stay tuned in Act 3 as we explore the different types of MBeans that you can create aside from the above demonstrated standard MBean. We will look at MXBeans, Open MBeans, Dynamic MBeans, and Model MBeans; see examples of each; and determine the positives and negatives of using each type.
6.14.2009
JMX Management for fun and profit!
I have recently been tasked with a project at work to start implementing JMX to manage our production app servers. Prior to this I had heard of JMX and knew a little about what it was and its overall purpose, but there was a great deal I didn't understand going into the project, and over the last few months I have learned a lot, not only about JMX but also about management architectures and systems in general.
We have what I would refer to as a fairly common production system architecture: multiple applications running over multiple clusters of Tomcat servers. Some of our production machines host several Tomcat instances themselves as well, which presents a whole new layer of problems.
Basically, step 1 was to implement JMX and custom MBeans into our application. I started with some low hanging fruit. Our local application cache has always been a problem child so it was a good place to 'test the waters' and see what JMX could do for us.
I should also note that this is still an ongoing project so I will also be periodically updating this blog with new things that I implement as the project matures and eventually becomes a reality.
The first step in this long, arduous journey was to get JMX up and running in our app servers so that we could connect through jConsole and get access to our MBeans as we created them.
A little googling showed that this is pretty simple to do with just some command line parameters passed into the JVM.
-Dcom.sun.management.jmxremote
Setting this system property registers the JVM Instrumentation MBeans and publishes an RMI connector to allow JMX Client applications to connect to it from the same physical machine.
Close, but not quite there - I want to be able to connect to the server from a different machine, so there are a few more things I need to set in the JVM to allow this.
-Dcom.sun.management.jmxremote.port=<port>
-Dcom.sun.management.jmxremote.authenticate=false
Nice! Setting these properties exposes the RMI service on the port you specified, so you can connect to it remotely.
Now I can connect remotely; however, anyone else with jConsole can also connect remotely and do lots of nastiness to the server. We don't want that!
-Dcom.sun.management.jmxremote.password.file=<path-to-password-file>
-Dcom.sun.management.jmxremote.access.file=<path-to-access-file>
As you can probably guess, these set the paths to a couple of files that the RMI service will use to authenticate connections (be sure to drop the authenticate=false flag from above - authentication is on by default). An example of each is provided with the JDK in $JRE_HOME/lib/management.
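To put it all together, a full launch looks roughly like this - the port, file paths, app jar, and credentials below are placeholders, and the two-column file formats are the ones the JDK's own template files use. Note that jmxremote.ssl defaults to true when a port is set, so you either configure certificates or explicitly turn it off:

java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9100 \
     -Dcom.sun.management.jmxremote.ssl=false \
     -Dcom.sun.management.jmxremote.password.file=/opt/conf/jmxremote.password \
     -Dcom.sun.management.jmxremote.access.file=/opt/conf/jmxremote.access \
     -jar myapp.jar

jmxremote.password (one "username password" pair per line; the file must be readable only by the user running the JVM or the JVM will refuse to start):
Manager s3cr3t

jmxremote.access (one "username readonly|readwrite" pair per line):
Manager readwrite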
NOTE: There is a security risk in using this form of authentication for JMX. The username and password are sent to the server in cleartext, so anyone sniffing packets could intercept the credentials and connect to your server. If your servers are behind a firewall and you only connect from inside your corporate network, this may not be an issue for you; the safest way to go, however, is to use SSL certificates to authenticate connections to your RMI service. I will cover using SSL certificates in a subsequent blog entry.
Awesome! I can now connect to my JMX Managed Tomcat Server from another machine and do all kinds of fun stuff to my JVM and Tomcat instance.
To connect, I simply open jConsole on a PC that has the JDK installed and open a connection to my remote server. In Java 6 (which has a far better version of jConsole), select Remote Process and enter the address and port of the server you want to connect to, along with the username and password specified in your password file.
For example:
Remote Process: 192.168.0.100:9100 Username: Manager Password: s3cr3t
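And if you would rather script against the server than click around in jConsole, the standard JMX remote API can make the same connection from code - a minimal sketch using the same address and credentials as the example above:

import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteJmxClient {
    public static void main(String[] args) throws Exception {
        // The out-of-the-box agent publishes its RMI connector under /jmxrmi
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://192.168.0.100:9100/jmxrmi");
        Map<String, Object> env = new HashMap<String, Object>();
        env.put(JMXConnector.CREDENTIALS, new String[] { "Manager", "s3cr3t" });
        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        MBeanServerConnection conn = connector.getMBeanServerConnection();
        System.out.println("MBeans registered: " + conn.getMBeanCount());
        connector.close();
    }
}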
Voila!
In my next post I will cover creating custom MBeans and different things you can do with your own MBeans to make managing your small to enterprise application easy and fun!
Labels: java, jconsole, jmx, management
5.30.2009
SQL Injection... Still? Really?
When it comes to development blunders, there is one thing that really just rubs me the wrong way, no matter what language I see it in: SQL injection. I can understand XSS bugs, DoS vulnerabilities, buffer and stack overflows, information disclosure bugs, and just about any other type of application vulnerability you can think of, but there is just no excuse for the most basic SQL injection bugs to exist anymore.
If you do a google search for SQL Injection, you will see 'Results 1 - 10 of about 2,990,000 for sql injection'
Yes, you did read that correctly... That really does say 2,990,000 results for SQL injection. Seeing that should raise a big red flag in your mind's eye, a big red flag that says, "Why, oh why is this still an issue?"
Code Injection as a generalized vulnerability is the act of embedding executable code into an application thereby forcing that application to perform an action specified by the user. This is a pretty broad description and really describes anything from a BOF (that's buffer overflow for you non-l33t types) to XSS (Cross Site Scripting) to my favorite SQL Injection.
Let me digress for a second.
The vast majority of websites on the internet are run by one guy in his spare time. Not every site has a Development Team full of Architects, Designers, Security Analysts, QA Analysts, and Database Administrators. That one guy acts as all of those people, and more often than not, doesn't look very good with any of those hats on. That one guy also relies on the 'frameworks' he downloads off of the interwebz to take care of all that tedious difficult stuff for him. Keep that in mind as we move forward along the tracks of my train of thought.
SQL Injection differs from most other types of code injection in that it can change the persistent state of an application or a site. Some other forms of code injection can do the same thing, either on their own or by reaching the application's persistent storage through the injected code, but every Skiddie in the world knows about ' or 1=1. Thus, if your app has this:
$badSQL = "Select * from users where username='{$_GET['username']}' and password='{$_GET['password']}'";
and someone submits Administrator as the username and ' or '1'='1 as the password, you end up with:
select * from users where username='Administrator' and password='' or '1'='1'
Go ahead now and type that into your MySQL client and see what it returns... Whoops. If you don't have a MySQL client, well, you'll just have to take my word for it that bad things follow.
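For the record, here is what the safe version of that same lookup looks like - a minimal sketch in Java with a JDBC PreparedStatement (assuming an open Connection named conn and the same users table), since I will be preaching parameterized queries again before this post is done:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SafeLogin {
    // Bound parameters reach the database as data, never as SQL,
    // so ' or '1'='1 is matched as a literal password string.
    public static boolean authenticate(Connection conn, String username, String password) throws Exception {
        PreparedStatement stmt = conn.prepareStatement(
                "select * from users where username = ? and password = ?");
        stmt.setString(1, username);
        stmt.setString(2, password);
        ResultSet rs = stmt.executeQuery();
        return rs.next(); // true only if a real row matched
    }
}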
I could spend all day (or night in this case) discussing all the different types of SQL Injection attacks, and all the proper ways to combat them; fortunately I don't have to because 2,990,000 pages have already gone down that path according to the mighty Google. That is beyond the scope of this article. What I want to talk about is why this is still happening. Why do we still see hundreds of new sites and apps each year falling victim to something that is so widely known, so easy to take care of in almost any language?
The answer is simple, and is the answer to most questions about things dealing with Application Security. The answer, my friends, is pure and simple HUMAN LAZINESS.
To put it in the simplest possible way, developers, and I use the term loosely, are either too lazy to care about fixing this simple issue, or too lazy to learn that it is an issue.
Option 1: Too lazy to care
This covers a pretty broad scope of excuses and general B.S. that these self proclaimed developers will use. Everything from "I didn't think that my app would get enough exposure to attract hackers." to "There's nothing worth stealing in my application anyways."
For the too lazy to care breed, let me just tell you this. Skiddies love your small website that you built for your bowling league photo album. You know why they love it? They may be skiddies, but even they know that they can completely take over your site and you will never be any the wiser because you are the guy that would never, never, ever expect to have your site pwnt.
Option 2: Too lazy to learn
I think this covers the vast majority of the people guilty of belonging to this caste of developer, specifically in the web space. This is the guy that got a PHP for Dummies book for Christmas from his wife's brother's dog's former owner, read the first two chapters, and then started up a project on SourceForge for the newest, latest, greatest web app in the world. You, sir, have committed the 8th Deadly Sin - "Thou shalt not distribute code that sucks"
So there it is. This is why SQL injection has been on the OWASP Top Ten List since its inception in 2004, and is still there today.
In conclusion, I challenge you, blog-o-sphere, developers, humans - to write your next batch of code with input filtering and proper validation, or using parameterized queries (which I believe are available in almost every major interpreted or server-side language there is by now).
I challenge you to get this off of the OWASP Top Ten List for 2010.
I challenge you to overcome the laziness and put time and effort into learning your craft, or walk away and leave it to the people that do. There are plenty of ways you can get your information onto the tubez without ever having to write a single line of code.
While you're at it, I challenge you to share this post with your friends who may or may not fall into one of these categories, your friends who will read this and nod their heads in agreement, and your friends who simply like to slashdot and facebook-share every single page they come across.
Until our next blog-o-spheric confrontation - may the schwartz be with you, farewell, and thanks for all the fish!
Labels: development, security, sql injection