Compilation and decompilation of Java code

Programming language

Before introducing compilation and decompilation, let's briefly look at programming languages. Programming languages are divided into low-level languages and high-level languages.

Machine language and assembly language are low-level languages: programs are written directly with computer instructions.

C, C++, Java, Python, and so on are high-level languages: programs are written with statements, which are abstract representations of computer instructions.

For example, the same statement is expressed in C language, assembly language and machine language as follows:

Computers can only operate on numbers. Symbols, sounds, and images must all be represented by numbers inside the computer, and instructions are no exception. The machine language in the table above consists entirely of hexadecimal numbers. The earliest programmers wrote programs directly in machine language, but this was very troublesome: they had to consult a large number of tables to find out what each number meant, and the resulting programs were unintuitive and error-prone. This led to assembly language, in which each group of numbers in machine language is represented by a mnemonic. The programmer writes the program with these mnemonics, and an assembler then looks up the table and replaces the mnemonics with numbers, translating assembly language into machine language.

However, assembly language is still complicated to use, so high-level languages such as Java, C, and C++ were developed later.

What is compilation

Two kinds of language were mentioned above: low-level and high-level. Put simply, low-level languages are the languages the computer understands, and high-level languages are the languages the programmer understands.

So how do we get from a high-level language to a low-level language? That process is compilation.

The example above also shows that there is no simple one-to-one correspondence between C statements and low-level instructions: a single a=b+1 statement has to be translated into three assembly or machine instructions. This process is called compilation and is carried out by a compiler, whose job is obviously much more complicated than an assembler's. A program written in C must be compiled into machine instructions before the computer can execute it. Compilation takes some time, which is a disadvantage of programming in a high-level language, but the advantages outweigh it: programming in C is easier, and the code is more compact, more readable, and easier to correct.

Compilation is the process of translating a source program, written in a high-level language that is easy for people to write, read, and maintain, into a low-level program that a computer can interpret and run. The tool responsible for this process is called a compiler.

Now we know what compilation is and what a compiler is. Different languages have their own compilers. In the Java language, the compiler is the javac command.

javac is the Java language compiler included in the JDK. This tool can compile source files with the suffix .java into bytecodes with the suffix .class that can be run on the Java virtual machine.
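
The same compiler can also be invoked from Java code through the standard javax.tools API. The following is only a minimal sketch: the class name CompileDemo, the method name, and the temporary directory are illustrative assumptions, and it has to run on a JDK rather than a bare JRE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class CompileDemo {
    /** Writes HelloWorld.java into a temp directory, compiles it, and returns the .class path. */
    static Path compileHelloWorld() {
        String source =
            "public class HelloWorld {\n"
          + "    public static void main(String[] args) {\n"
          + "        System.out.println(\"Hello, world\");\n"
          + "    }\n"
          + "}\n";
        try {
            Path dir = Files.createTempDirectory("compile-demo");
            Path javaFile = dir.resolve("HelloWorld.java");
            Files.write(javaFile, source.getBytes(StandardCharsets.UTF_8));

            // getSystemJavaCompiler() returns null when no compiler is available (e.g. on a JRE).
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler found; run on a JDK");
            }
            // Roughly the same as running: javac -d <dir> <dir>/HelloWorld.java
            int exitCode = compiler.run(null, null, null,
                    "-d", dir.toString(), javaFile.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed, exit code " + exitCode);
            }
            return dir.resolve("HelloWorld.class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path classFile = compileHelloWorld();
        System.out.println(classFile + " exists: " + Files.exists(classFile));
    }
}
```

Passing null for the three stream arguments makes the compiler use the standard input, output, and error streams, just as the command-line tool does.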

Once we have written a HelloWorld.java file, we can run the command javac HelloWorld.java to generate a HelloWorld.class file. A class file is a file that the JVM can recognize, and we usually regard this step as the compilation of the Java language. Strictly speaking, though, a class file is still not something the machine itself can execute: the machine only understands machine language, so the JVM has to convert the bytecode in the class file into machine language that the machine can recognize.
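
One way to see that a class file is a JVM format rather than native machine code is to look at its first eight bytes: every class file starts with the magic number 0xCAFEBABE, followed by a minor and a major version number. The sketch below reads that header; the class name ClassFileHeader and its method are illustrative, and the example reads the String class that ships with the running JDK.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    /** Returns {magic, minorVersion, majorVersion} from the first 8 bytes of a class file. */
    static long[] readHeader(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            long magic = data.readInt() & 0xFFFFFFFFL; // 4 bytes, 0xCAFEBABE for a valid class file
            int minor = data.readUnsignedShort();      // 2 bytes
            int major = data.readUnsignedShort();      // 2 bytes, e.g. 52 for Java 8
            return new long[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Load the bytes of java.lang.String's own class file from the runtime.
        try (InputStream in = String.class.getResourceAsStream("String.class")) {
            if (in == null) {
                throw new IllegalStateException("String.class is not readable as a resource");
            }
            long[] header = readHeader(in);
            System.out.printf("magic=0x%X minor=%d major=%d%n",
                    header[0], header[1], header[2]);
        }
    }
}
```

The major version printed depends on the JDK that runs the example.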

What is decompilation

Decompilation is the opposite of compilation: it restores a compiled program to its uncompiled state, that is, it recovers the source code. In other words, it converts a language the machine can understand into a language the programmer can understand. In the Java world, decompilation generally means converting class files back into java files.

A decompilation tool lets us do many things. Above all, it lets us read and understand the bytecode generated by the Java compiler. If you ask what the use of reading bytecode is, I can tell you with confidence that the benefits are great. Several of the articles on my blog that explain how things work were written by analyzing decompiled code, for example "In-depth understanding of multithreading (1): how synchronized is implemented", "In-depth analysis of Java enum types: thread safety and serialization of enums", "Implementation details of Java switch for integer, character, and string types", and "Type erasure in Java". I also recently wrote an article on Java syntactic sugar on GitChat, most of which relies on decompilation tools to reveal the principles behind the sugar.

Java decompilation tools

This article mainly introduces three Java decompilation tools: javap, jad, and CFR.

javap

javap is a tool that ships with the JDK. It can disassemble class files and show the bytecode generated by the Java compiler. The biggest difference between javap and the other two tools is that its output is not a java file, so it is not as easy to read as the code the other two produce. Take a simple example. Suppose we want to analyze how Java 7 supports switch on a String. We start with the following source code, which compiles:

public class switchDemoString {
    public static void main(String[] args) {
        String str = "world";
        switch (str) {
            case "hello":
                System.out.println("hello");
                break;
            case "world":
                System.out.println("world");
                break;
            default:
                break;
        }
    }
} 

Execute the following two commands:

javac switchDemoString.java
javap -c switchDemoString.class 

The generated code is as follows:

public class com.hollis.suguar.switchDemoString {
  public com.hollis.suguar.switchDemoString();
    Code:
       0: aload_0
       1: invokespecial #1                 //Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                 //String world
       2: astore_1
       3: aload_1
       4: astore_2
       5: iconst_m1
       6: istore_3
       7: aload_2
       8: invokevirtual #3                 //Method java/lang/String.hashCode:()I
      11: lookupswitch  {//2
              99162322: 36
             113318802: 50
               default: 61
          }
      36: aload_2
      37: ldc           #4                 //String hello
      39: invokevirtual #5                 //Method java/lang/String.equals:(Ljava/lang/Object;)Z
      42: ifeq          61
      45: iconst_0
      46: istore_3
      47: goto          61
      50: aload_2
      51: ldc           #2                 //String world
      53: invokevirtual #5                 //Method java/lang/String.equals:(Ljava/lang/Object;)Z
      56: ifeq          61
      59: iconst_1
      60: istore_3
      61: iload_3
      62: lookupswitch  {//2
                     0: 88
                     1: 99
               default: 110
          }
      88: getstatic     #6                 //Field java/lang/System.out:Ljava/io/PrintStream;
      91: ldc           #4                 //String hello
      93: invokevirtual #7                 //Method java/io/PrintStream.println:(Ljava/lang/String;)V
      96: goto          110
      99: getstatic     #6                 //Field java/lang/System.out:Ljava/io/PrintStream;
     102: ldc           #2                 //String world
     104: invokevirtual #7                 //Method java/io/PrintStream.println:(Ljava/lang/String;)V
     107: goto          110
     110: return
} 

My personal understanding is that javap does not decompile the bytecode into a java file; it produces a form of the bytecode that we can read. What javap outputs is still bytecode, just presented so that programmers can follow it to some extent. If you have a reasonable grasp of bytecode, you can understand the listing above: the string is first converted to its hash code with hashCode(), and then compared with equals().
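
To make the two lookupswitch steps in the listing easier to follow, here is a hand-written Java equivalent placed next to an ordinary string switch. It is a sketch for illustration, not the compiler's literal output: the class and method names are made up, and the methods return a label instead of printing. The hash values are the ones shown in the javap output above.

```java
class StringSwitchSketch {
    /** What the programmer writes: a switch directly on a String. */
    static String viaSwitch(String s) {
        switch (s) {
            case "hello": return "matched hello";
            case "world": return "matched world";
            default:      return "no match";
        }
    }

    /** Hand-written equivalent of the bytecode: hashCode() first, then equals(). */
    static String viaHashAndEquals(String s) {
        int index = -1;                 // corresponds to iconst_m1 / istore_3
        switch (s.hashCode()) {         // first lookupswitch, keyed by hash code
            case 99162322:              // "hello".hashCode()
                if (s.equals("hello")) index = 0;
                break;
            case 113318802:             // "world".hashCode()
                if (s.equals("world")) index = 1;
                break;
            default:
                break;
        }
        switch (index) {                // second lookupswitch, keyed by the small index
            case 0:  return "matched hello";
            case 1:  return "matched world";
            default: return "no match";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello", "world", "other"}) {
            System.out.println(s + ": " + viaSwitch(s) + " / " + viaHashAndEquals(s));
        }
    }
}
```

The equals() call after the hash lookup is what keeps the result correct when two different strings happen to share a hash code.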

I personally think that under normal circumstances we do not use the javap command much, only when we really need to look at the bytecode. But the bytecode exposes the most complete picture, so sooner or later you will have a use for it. For example, I used javap when I analyzed how synchronized works. From the bytecode printed by javap, I found that synchronized is implemented with the ACC_SYNCHRONIZED flag and with the two instructions monitorenter and monitorexit.
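
A small class like the following can be used to reproduce that observation. It is a sketch with made-up names: the synchronized method carries the ACC_SYNCHRONIZED flag, which reflection reports through Modifier.isSynchronized, while the synchronized block is compiled into monitorenter and monitorexit instructions that only show up in the bytecode.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class SynchronizedDemo {
    private int count;

    // Method-level lock: the method itself is flagged ACC_SYNCHRONIZED.
    synchronized void incrementWithMethodLock() {
        count++;
    }

    // Block-level lock: the body is wrapped in monitorenter / monitorexit.
    void incrementWithBlockLock() {
        synchronized (this) {
            count++;
        }
    }

    int count() {
        return count;
    }

    /** Reports whether the named no-argument method has the synchronized flag set. */
    static boolean hasSynchronizedFlag(String methodName) {
        try {
            Method m = SynchronizedDemo.class.getDeclaredMethod(methodName);
            return Modifier.isSynchronized(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("method lock flagged: " + hasSynchronizedFlag("incrementWithMethodLock"));
        System.out.println("block lock flagged:  " + hasSynchronizedFlag("incrementWithBlockLock"));
    }
}
```

After compiling it, running javap -v SynchronizedDemo.class should show the flag on the first method and the two monitor instructions in the second.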

jad

jad is a fairly good decompilation tool: you only need to download a single executable to decompile class files. Taking the same source code as above, run the following command:

jad switchDemoString.class

The decompiled content is as follows:

public class switchDemoString
{
    public switchDemoString()
    {
    }
    public static void main(String args[])
    {
        String str = "world";
        String s;
        switch((s = str).hashCode())
        {
        default:
            break;
        case 99162322:
            if(s.equals("hello"))
                System.out.println("hello");
            break;
        case 113318802:
            if(s.equals("world"))
                System.out.println("world");
            break;
        }
    }
} 

You can certainly read this code, since it is just ordinary Java source. It shows very clearly that a string switch is implemented with the equals() and hashCode() methods.

However, jad has not been updated for a long time. When decompiling bytecode generated by Java 7, it occasionally runs into constructs it does not support, and on Java 8 lambda expressions it fails completely.

CFR

jad is easy to use, but because it has not been updated for a long time, it has to be replaced with a newer tool, and CFR is a good choice. Compared with jad, its command-line syntax may be slightly more complicated, but the good news is that it works.

For example, we use cfr to decompile the code just now. Execute the following command:

java -jar cfr_0_125.jar switchDemoString.class --decodestringswitch false 

Get the following code:

public class switchDemoString {
    public static void main(String[] arrstring) {
        String string;
        String string2 = string = "world";
        int n = -1;
        switch (string2.hashCode()) {
            case 99162322: {
                if (!string2.equals("hello")) break;
                n = 0;
                break;
            }
            case 113318802: {
                if (!string2.equals("world")) break;
                n = 1;
            }
        }
        switch (n) {
            case 0: {
                System.out.println("hello");
                break;
            }
            case 1: {
                System.out.println("world");
                break;
            }
        }
    }
} 

From this code we reach the same conclusion: a string switch is implemented with the equals() and hashCode() methods.

Compared with jad, CFR has many more parameters. For the same code, if we use the following command, the output is different:

java -jar cfr_0_125.jar switchDemoString.class

public class switchDemoString {
    public static void main(String[] arrstring) {
        String string;
        switch (string = "world") {
            case "hello": {
                System.out.println("hello");
                break;
            }
            case "world": {
                System.out.println("world");
                break;
            }
        }
    }
} 

So --decodestringswitch controls whether CFR decodes the details of how switch supports String back into a plain string switch. There are similar options such as --decodeenumswitch, --decodefinally, and --decodelambdas. In my article on syntactic sugar, I used --decodelambdas to decompile a lambda expression. Source code:

public static void main(String... args) {
    List<String> strList = ImmutableList.of("Hollis", "公众号：Hollis", "博客：www.hollischuang.com");

    strList.forEach( s -> { System.out.println(s); } );
} 

java -jar cfr_0_125.jar lambdaDemo.class --decodelambdas false

Code after decompilation:

public static/* varargs */void main(String ... args) {
    ImmutableList strList = ImmutableList.of((Object)"Hollis", (Object)"\u516c\u4f17\u53f7\uff1aHollis", (Object)"\u535a\u5ba2\uff1awww.hollischuang.com");
    strList.forEach((Consumer<String>)LambdaMetafactory.metafactory(null, null, null, (Ljava/lang/Object;)V, lambda$main$0(java.lang.String ), (Ljava/lang/String;)V)());
}

private static/* synthetic */void lambda$main$0(String s) {
    System.out.println(s);
} 

CFR has many other parameters for different scenarios; readers can run java -jar cfr_0_125.jar --help to learn about them. I will not introduce them one by one here.
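
The lambda$main$0 method in the decompiled output above can also be observed without a decompiler, because reflection lists compiler-generated (synthetic) methods. The sketch below uses made-up class and method names. The exact name of the generated method is a compiler implementation detail; with javac it is assumed to follow the lambda$<enclosing method>$<n> pattern seen above.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class LambdaSyntheticDemo {
    static void printAll(List<String> items) {
        // The lambda body is compiled into a separate synthetic method of this class.
        Consumer<String> printer = s -> { System.out.println(s); };
        items.forEach(printer);
    }

    /** Names of the compiler-generated (synthetic) methods declared by this class. */
    static List<String> syntheticMethodNames() {
        List<String> names = new ArrayList<>();
        for (Method m : LambdaSyntheticDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        printAll(Arrays.asList("Hollis", "www.hollischuang.com"));
        System.out.println("synthetic methods: " + syntheticMethodNames());
    }
}
```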

How to prevent decompilation

Since there are tools that can decompile class files, protecting Java programs has become an important challenge for developers. As the saying goes, for every measure there is a countermeasure, and there are indeed techniques for resisting decompilation. Still, I want to point out that, as with network security, no matter how much effort is invested, it only raises the attacker's cost. Decompilation cannot be completely prevented.

Typical coping strategies are as follows:

  • Isolate Java programs
    • Keep users from getting access to your class files
  • Encrypt the class files
    • Raise the difficulty of cracking
  • Code obfuscation
    • Convert the code into a functionally equivalent, but difficult to read and understand form
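
As a rough illustration of the last item, the sketch below places a readable method next to a hand-obfuscated version that behaves identically. Everything here is hypothetical: the names are made up, and the second method is written by hand to mimic identifier renaming, not produced by any particular obfuscator.

```java
class ObfuscationSketch {
    /** Readable original: the names explain the intent. */
    static int priceAfterDiscount(int priceInCents, int discountPercent) {
        int discount = priceInCents * discountPercent / 100;
        return priceInCents - discount;
    }

    /** Hand-obfuscated equivalent: meaningless names, same arithmetic. */
    static int a(int a, int b) {
        int c = a * b / 100;
        return a - c;
    }

    public static void main(String[] args) {
        System.out.println(priceAfterDiscount(2000, 15));
        System.out.println(a(2000, 15));
    }
}
```

A decompiler recovers the second method just as easily as the first, but a reader learns far less from it, which is the whole point of obfuscation: it raises the cost of understanding without changing what the program does.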
