Malware Signature Generation

In recent work, I've encountered a task where we identify malware by a signature: a snippet of the code that performs the malicious activity. These are usually JavaScript- or PHP-based exploits that disable local protections and transfer data or payloads to or from the client for further exploitation. The question is how to identify these snippets reliably so we can take action to have them removed.

Copying and pasting the code into a database and checking whether that exact snippet ever turns up again would not be very effective. Renamed variables, reordered lines, or even a different character encoding in the files would invalidate the signature taken from the last sample found.

The question that matters most to me is this: would compiling PHP and JavaScript to bytecode and generating signatures from the compiled output identify malware more effectively than matching copy/pasted snippets with no heuristics? For example, if I compiled JavaScript to Java bytecode with Rhino, which is reportedly possible, would my scripts do better analyzing the compiled results and building signatures from those instead?
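One well-known alternative to exact-snippet matching (not something described above, just a common technique) is to normalize the code before fingerprinting it: replace identifiers with a placeholder, split the token stream into overlapping k-token "shingles", hash them, and compare two samples by the Jaccard similarity of their shingle sets. The sketch below is a minimal illustration under stated assumptions: the regex tokenizer is crude, and `KEYWORDS`, `normalize`, `fingerprint` and `similarity` are hypothetical names of my own, not a real library's API. A production scanner would use a proper JavaScript/PHP parser and decode files to text with the correct encoding before tokenizing.

```python
import hashlib
import re

# Hypothetical keyword list: a small, illustrative subset of JavaScript/PHP
# keywords and builtins that are kept verbatim instead of being normalized.
KEYWORDS = {
    "function", "var", "let", "const", "return", "if", "else", "for",
    "while", "new", "eval", "echo", "base64_decode",
}

# Crude tokenizer (an assumption, not a real JS/PHP lexer): identifiers with an
# optional PHP "$" sigil, integers, single- or double-quoted strings, and any
# other single non-space character.
TOKEN_RE = re.compile(r"\$?[A-Za-z_][A-Za-z0-9_]*|\d+|'[^']*'|\"[^\"]*\"|\S")


def normalize(source: str) -> list[str]:
    """Tokenize and replace every non-keyword identifier with "ID"."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        name = tok.lstrip("$")
        if name.isidentifier() and name not in KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(tok)
    return tokens


def fingerprint(source: str, k: int = 4) -> set[str]:
    """Hash every run of k consecutive normalized tokens (a shingle)."""
    toks = normalize(source)
    grams = [" ".join(toks[i:i + k]) for i in range(len(toks) - k + 1)]
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two shingle sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    known = 'var x = eval(atob("ZG9j")); x.run();'
    variant = 'var payload = eval(atob("ZG9j")); payload.run();'
    unrelated = "for (i = 0; i < 10; i++) { total += i; }"
    print(similarity(fingerprint(known), fingerprint(variant)))
    print(similarity(fingerprint(known), fingerprint(unrelated)))
```

Renaming a variable does not change any shingle, so the renamed variant matches the known sample completely. Reordering lines only breaks the shingles that span the moved boundaries, so the score degrades gradually instead of dropping to zero. The choice of `k` and of a "match" threshold is a judgment call that would need tuning against real samples.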

I'm not sure how a compiler works internally or how my source-level instructions end up as binary data, nor do I know what that binary data means in terms of how the instructions are interpreted by the OS, so I'd like to open a discussion with those who know about these things.
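Rhino's Java bytecode is not shown here; as an easily runnable stand-in, the sketch below uses CPython's own bytecode and the standard `dis` module to show what "compiled output" looks like. Bytecode of this kind is executed by a virtual machine (the CPython interpreter here, or the JVM for Rhino's output) rather than directly by the OS. Local variables are referenced by slot number, so renaming them leaves the raw bytecode (`co_code`) unchanged, while swapping two independent statements changes it. This is a CPython implementation detail that varies between versions, and the functions below are illustrative examples of my own, not malware samples.

```python
import dis


def original(a, b):
    x = a * 2
    y = b + 1
    return x, y


def renamed(p, q):
    # Same logic as original(), with every name changed.
    m = p * 2
    n = q + 1
    return m, n


def reordered(a, b):
    # Same result as original(), with the two independent statements swapped.
    y = b + 1
    x = a * 2
    return x, y


if __name__ == "__main__":
    # Human-readable listing of the compiled instructions (format varies by version).
    dis.dis(original)
    # Locals are referenced by slot number, so renaming leaves the raw bytes equal.
    print(original.__code__.co_code == renamed.__code__.co_code)
    # Reordering changes the instruction sequence, so the raw bytes differ.
    print(original.__code__.co_code == reordered.__code__.co_code)
```

In other words, a bytecode-level signature would likely absorb simple renaming for free but would still be sensitive to reordered statements, so it is not a complete answer on its own.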
