2012-01-25

PCI Compliant Apache2 SSL Configuration

Recently, I needed to update Apache's configuration to meet PCI compliance requirements. More information on Apache's configuration with SSL can be found here. The following directives give Apache a PCI-compliant SSL configuration:

SSLProtocol -ALL +SSLv3 +TLSv1
SSLCipherSuite HIGH:MEDIUM:+TLSv1:!SSLv2:+SSLv3:!ADH

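Breakdown:
"-ALL +SSLv3 +TLSv1" (SSLProtocol): Disable every protocol, then enable only SSL version 3 and TLS version 1, which rules out the weaker SSL version 2.
"HIGH:MEDIUM": Select the cipher suites that OpenSSL classifies as high and medium strength.
"+TLSv1" and "+SSLv3": In a cipher string, a leading "+" does not enable anything. It moves the matching TLS version 1 and SSL version 3 cipher suites to the end of the preference list. The protocols themselves are enabled by SSLProtocol.
"!SSLv2": Permanently remove the weaker SSL version 2 cipher suites, so no later entry can add them back.
"!ADH": Remove the anonymous Diffie-Hellman (ADH) cipher suites, which do not authenticate the server and are therefore open to man-in-the-middle attacks.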

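Because the same keywords (TLSv1, SSLv3) appear in both directives but mean different things, a short sketch of how an OpenSSL cipher string is read may help. The helper explain_cipher_string is hypothetical: it only splits a colon-separated string and reports what each prefix operator means according to OpenSSL's cipher-list documentation. It does not call OpenSSL, so it cannot show which suites a keyword expands to (running `openssl ciphers -v` with the string does that).

```python
# Hypothetical helper for illustration: describes the prefix operators of an
# OpenSSL-style cipher string. Assumes colon separators; it does not consult
# OpenSSL and does not handle combined keywords such as "RSA+AES".

OPERATORS = {
    "!": "permanently remove matching suites (they can never be re-added)",
    "-": "remove matching suites (later entries may add them back)",
    "+": "move matching suites to the end of the list (adds nothing new)",
}


def explain_cipher_string(cipher_string: str) -> list[tuple[str, str, str]]:
    """Return (operator, keyword, meaning) for each ':'-separated element."""
    steps = []
    for element in cipher_string.split(":"):
        element = element.strip()
        if not element:
            continue  # tolerate stray or trailing separators
        prefix = element[0]
        if prefix in OPERATORS:
            steps.append((prefix, element[1:], OPERATORS[prefix]))
        else:
            # No prefix: matching suites are appended to the preference list
            steps.append(("", element, "append matching suites to the list"))
    return steps


if __name__ == "__main__":
    for op, keyword, meaning in explain_cipher_string(
        "HIGH:MEDIUM:+TLSv1:!SSLv2:+SSLv3:!ADH"
    ):
        print(f"{op or '(none)':>6} {keyword:<8} {meaning}")
```

Reading the string left to right this way makes it clear that only "HIGH" and "MEDIUM" add suites; the remaining elements reorder or remove them.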
2012-01-21

Malware Signature Generation - Mid Trial

In recent research, I've learned a few things about malware signature generation (MSG) and the model that surrounds it. Much of what follows is speculation, which explains the lack of citations, but I would like to build on what we have and create a smarter product.

As I understand it, MSG relies on studying exploits that have already been created and then blacklisting or whitelisting code that has already been written. The problem is that there are effectively infinite ways to accomplish a task, so a complete list (or even one that keeps up with the most recent attacks) is nearly impossible to maintain. Given the many technologies involved in a single web page request, the chances of maintaining a fully inclusive list of exploits are even slimmer.

Last week, I theorized that if source code were compiled to bytecode or binary, you could inspect the result to determine whether similar plaintext code produces the same compiled output. I tested the idea by using Rhino to compile two JavaScript files to Java .class files. The files contained the same code, except that a few lines were rearranged: the function was the same, but some actions took place in a different order. The compiled results were different. Next, I compiled two files with exactly the same function, except that one variable had a different name. Just as before, the .class files were different. So compiling JavaScript with Rhino isn't proving to be a consistent way of generating bytecode signatures.
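A rough way to see why a rename alone changes compiled output, and what a name-insensitive comparison might look like, is sketched below using CPython's own bytecode compiler as a stand-in for Rhino (an assumption: Rhino's .class output is structured differently). The serialized code object embeds identifier names, so comparing it whole behaves like comparing .class files; keeping only the sequence of operation names discards them. The helpers compile_function, raw_signature, and opcode_signature are hypothetical.

```python
import dis
import marshal


def compile_function(source: str, name: str = "f"):
    """Compile source that defines a function and return its code object."""
    namespace: dict = {}
    exec(compile(source, "<sample>", "exec"), namespace)
    return namespace[name].__code__


def raw_signature(code) -> bytes:
    # Full serialized code object: includes variable names, constants and
    # position info, so any rename changes it.
    return marshal.dumps(code)


def opcode_signature(code) -> tuple[str, ...]:
    # Only the sequence of operation names; operands (which variable,
    # which constant) are dropped entirely.
    return tuple(instr.opname for instr in dis.get_instructions(code))


original = "def f(x):\n    total = x + 1\n    return total\n"
renamed = "def f(x):\n    result = x + 1\n    return result\n"

a, b = compile_function(original), compile_function(renamed)
print(raw_signature(a) == raw_signature(b))        # False: names differ
print(opcode_signature(a) == opcode_signature(b))  # True: same operations
```

Dropping operands only absorbs renames. In Python versions where different operations compile to different opcodes, reordering statements still changes the sequence, so this would not by itself address the rearranged-lines experiment.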

I think a less kludgy method of identifying malware would be to parse the binary, bytecode, or source code of potentially malicious scripts and heuristically determine what the code is trying to do. If the code is obfuscated over several layers, requests unwarranted remote resources (directly or by reaching into other languages to fetch them), or attempts to download files to your computer without your knowledge or consent, it would be classified as at least suspicious and flagged for later review. Then, once we encounter that logic signature again, we can disinfect the file in some fashion or notify the end user.
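The heuristic idea can be sketched as a scoring function. Everything in it is a hypothetical assumption for illustration: the indicator names, the regular expressions (which only approximate "layered obfuscation", "remote resources", and "downloads without consent" for JavaScript-like text), the weights, and the review threshold. A real implementation would parse the code, as suggested in the paragraph above, rather than pattern-match it.

```python
import re
from dataclasses import dataclass

# Hypothetical indicators: name -> (regex, weight). Illustration only.
INDICATORS = {
    "layered_decoding": (r"\b(?:eval|unescape|atob|String\.fromCharCode)\s*\(", 2),
    "remote_script_or_frame": (r"<(?:script|iframe)[^>]+src\s*=\s*['\"]?https?://", 3),
    "dynamic_element": (r"document\.createElement\s*\(\s*['\"](?:script|iframe)['\"]", 2),
    "download_or_exec_object": (r"\b(?:ActiveXObject|WScript\.Shell|ADODB\.Stream)\b", 4),
}
SUSPICIOUS_THRESHOLD = 4  # hypothetical cut-off for "mark for later review"


@dataclass
class Verdict:
    score: int
    hits: dict[str, int]

    @property
    def suspicious(self) -> bool:
        return self.score >= SUSPICIOUS_THRESHOLD


def score_script(source: str) -> Verdict:
    """Sum weighted indicator matches found in the script text."""
    hits: dict[str, int] = {}
    score = 0
    for name, (pattern, weight) in INDICATORS.items():
        count = len(re.findall(pattern, source, flags=re.IGNORECASE))
        if count:
            hits[name] = count
            # Each occurrence counts, so nested decoding calls (a hint of
            # several obfuscation layers) raise the score further.
            score += weight * count
    return Verdict(score, hits)


if __name__ == "__main__":
    print(score_script("function add(a, b) { return a + b; }").suspicious)  # False
    print(score_script('eval(unescape(atob("ZXZpbA==")));').suspicious)     # True
```

A script at or above the threshold is only flagged for review, not blocked, matching the "at least suspicious" classification described above.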

I will have to learn more about lexical parsers (how they work and what data they expose) to find out whether this theory holds. If it pans out, I'm sure other large companies are already implementing something similar, which is why I'm not afraid to put this idea out into the wild.

2012-01-13

Malware Signature Generation

In recent work, I've been given a task in which we identify malware by its signature: a snippet of the code that performs the malicious activity. These are usually JavaScript- or PHP-based exploits that disable any local protections and transfer data or a payload to or from the client for further exploitation. The question is how to identify these snippets so we can take action to have them removed. Copying and pasting the code into a database and checking whether that snippet ever turns up again doesn't seem efficient, because changed variable names, shifted lines of code, or even a different character set in the file would invalidate a signature taken from an otherwise identical sample.

A higher-priority question for me is: will compiling PHP and JavaScript to bytecode and generating signatures from the compiled results be a more effective way to identify malware than matching copy/pasted snippets with no heuristic analysis? For example, if I compile JavaScript to Java bytecode using Rhino, as is reportedly possible, would it be more effective for my scripts to analyze the compiled results and build signatures from those instead?
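To make the copy/paste problem concrete, here is a minimal sketch of a more tolerant source-level signature, assuming the file's encoding is already known. It decodes the bytes, applies Unicode normalization, runs a small lexer, renames each identifier to a positional placeholder (v0, v1, ...) except for keywords and a few well-known globals, and hashes the token stream. The lexer, the PRESERVED set, and the function names are hypothetical and far from a complete JavaScript lexer. It absorbs renamed variables, reformatting, comments, and a different character set, but not reordered lines.

```python
import hashlib
import re
import unicodedata

# Hypothetical, deliberately small lexer for JavaScript-like code.
TOKEN_RE = re.compile(
    r"""(?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
      |(?P<number>\d+(?:\.\d+)?)
      |(?P<name>[\w$]+)
      |(?P<comment>//[^\n]*)
      |(?P<space>\s+)
      |(?P<punct>[^\s\w])""",
    re.VERBOSE,
)
# Keywords plus a few globals whose names matter for detection (hypothetical).
PRESERVED = {"var", "let", "const", "function", "return", "if", "else", "for",
             "while", "new", "this", "typeof", "eval", "unescape", "escape",
             "atob", "document", "window"}


def normalize(source: bytes, encoding: str = "utf-8") -> list[str]:
    """Decode, normalize Unicode, drop layout, and abstract identifiers."""
    text = unicodedata.normalize("NFKC", source.decode(encoding))
    aliases: dict[str, str] = {}
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind in ("space", "comment"):
            continue
        if kind == "name" and value not in PRESERVED:
            # Same original name -> same placeholder, in order of appearance
            value = aliases.setdefault(value, f"v{len(aliases)}")
        tokens.append(value)
    return tokens


def signature(source: bytes, encoding: str = "utf-8") -> str:
    """SHA-256 over the normalized token stream."""
    return hashlib.sha256(" ".join(normalize(source, encoding)).encode()).hexdigest()


if __name__ == "__main__":
    first = b"var total = unescape(payload); eval(total); // run it"
    second = "var t = unescape(p);\n  eval(t);".encode("utf-16")
    print(signature(first) == signature(second, "utf-16"))  # True
```

Hashing the raw bytes of those two samples gives different digests, which is exactly the failure mode of a copy/pasted signature.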

I'm not sure how a compiler works internally or how my algebraic statements end up as binary data, nor do I know what that binary data means in terms of how the instructions are interpreted by the OS, so I'd like to open a discussion with anyone who knows about these things.
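As a hedged illustration using CPython rather than Rhino or a PHP compiler (an assumption: those toolchains differ in detail), compiling one algebraic statement shows what "binary data" means for a bytecode compiler. Each operation becomes an opcode, and names and constants become indexes into tables stored alongside the code. This bytecode is executed by CPython's virtual machine, just as Rhino's .class output is executed by the Java Virtual Machine, rather than being interpreted directly by the OS.

```python
import dis

# Compile a single algebraic statement at module level.
code = compile("y = a * x + b", "<example>", "exec")

for instr in dis.get_instructions(code):
    # opname: the operation; argval: the resolved operand (a name or constant)
    print(f"{instr.offset:>3} {instr.opname:<14} {instr.argval!r}")

# The raw "binary data": a byte string of opcodes and operand indexes.
print(code.co_code.hex())
```

The exact opcodes vary between Python versions, but the shape is the same: load the operands, apply the arithmetic operations, and store the result under the name y.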