Show HN: I wrote a Java decompiler in pure C language

2025-06-0312:1417296github.com

Java decompiler written in C.


Comments

  • By gibibit 2025-06-0315:276 reply

    I am always curious how different C programs decide how to manage memory.

    In this case there is a custom string library. Functions return owned heap-allocated strings.

    However, I think there's a problem where static strings are used interchangeably with heap-allocated strings, such as in the function `string class_simple_name(string full)` ( https://github.com/neocanable/garlic/blob/72357ddbcffdb75641... )

    Sometimes it returns a static string like `g_str_int` and sometimes a newly heap-allocated string, such as returned by `class_type_array_name(g_str_int, depth)`.

    Callers have no way to properly release the memory allocated by this function.
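
    A minimal sketch of the ownership problem described here, assuming the caller is expected to call `free()` on the result. The names `simple_name_mixed`, `simple_name_owned`, `dup_str` and `g_int_name` are hypothetical and are not garlic's actual code. It also shows one possible fix among several: always hand back heap memory, so the caller can release every result the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a shared static string such as g_str_int. */
static char g_int_name[] = "int";

/* Copy a string to the heap; returns NULL if allocation fails. */
static char *dup_str(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy)
        strcpy(copy, s);
    return copy;
}

/* Ambiguous ownership: returns static storage for "I", heap memory otherwise.
 * The caller cannot tell from the pointer whether free() is allowed. */
char *simple_name_mixed(const char *full)
{
    assert(full != NULL);
    if (strcmp(full, "I") == 0)
        return g_int_name;               /* static: must NOT be freed */
    const char *dot = strrchr(full, '.');
    return dup_str(dot ? dot + 1 : full); /* heap: must be freed */
}

/* One possible fix: every result is a heap copy that the caller owns. */
char *simple_name_owned(const char *full)
{
    assert(full != NULL);
    if (strcmp(full, "I") == 0)
        return dup_str(g_int_name);
    const char *dot = strrchr(full, '.');
    return dup_str(dot ? dot + 1 : full);
}
```

    With `simple_name_owned`, calling `free()` on the result is always valid. With `simple_name_mixed`, it is only valid for some inputs. If all allocations come from a pool that is released in one go, neither version needs per-string cleanup.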

    • By neocanable 2025-06-0315:34

      In multi-threaded mode, each thread creates a separate memory pool. In single-threaded mode, a global memory pool is used. You can refer to https://github.com/neocanable/garlic/blob/72357ddbcffdb75641.... The x_alloc and x_alloc_in there indicate where the memory is allocated. When each task ends, the memory allocated in its memory pool is released, and the cycle repeats.

    • By norir 2025-06-0320:211 reply

      Many command line tools do not need memory management at all, at least to a first approximation. Free nothing and let the OS clean up on process exit. Most libraries can either use an arena internally and copy any values returned to the user to the heap at the boundary, or require the user to create and destroy the arena externally. This can be made ergonomic with one macro that injects an arena argument into function definitions and another that replaces malloc by bumping the data pointer of the arena that the first macro injected.
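
      A rough sketch of the two-macro idea, under stated assumptions: `arena`, `ARENA_FN`, `ARENA_NEW`, `arena_bump` and `join_names` are invented names, the arena is a plain pointer pair over a caller-supplied buffer, and the buffer start is assumed to be suitably aligned. `ARENA_FN` as written needs at least one regular parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A bump arena: ptr advances toward end as memory is handed out. */
typedef struct {
    char *ptr;
    char *end;
} arena;

/* Injects a hidden first parameter named arena_ into a function definition. */
#define ARENA_FN(ret, name, ...) ret name(arena *arena_, __VA_ARGS__)

/* Stands in for malloc inside such functions: bumps the injected arena. */
#define ARENA_NEW(n) arena_bump(arena_, (n))

static void *arena_bump(arena *a, size_t n)
{
    assert(a != NULL);
    size_t align = _Alignof(max_align_t);
    size_t rounded = (n + align - 1) / align * align; /* keep ptr aligned */
    if (rounded < n || rounded > (size_t)(a->end - a->ptr))
        return NULL;                                  /* overflow or full */
    void *p = a->ptr;
    a->ptr += rounded;
    return p;
}

/* Expands to: char *join_names(arena *arena_, const char *pkg, const char *cls) */
ARENA_FN(char *, join_names, const char *pkg, const char *cls)
{
    assert(pkg != NULL && cls != NULL);
    size_t lp = strlen(pkg), lc = strlen(cls);
    char *out = ARENA_NEW(lp + lc + 2);   /* '.' plus terminating NUL */
    if (!out)
        return NULL;
    memcpy(out, pkg, lp);
    out[lp] = '.';
    memcpy(out + lp + 1, cls, lc + 1);    /* copies the NUL as well */
    return out;
}
```

      Nothing is freed individually. Resetting `ptr` to the start of the buffer releases everything at once.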

      • By 1718627440 2025-06-0320:422 reply

        That might be true, but leaking is neither the most critical nor the hardest-to-find memory management issue, and good luck trying to adapt or even run valgrind with a codebase that mindlessly allocates and leaks everywhere.

        • By kevin_thibedeau 2025-06-0323:40

          Shhh. We want the ML models trained on this sort of deeply flawed code.

        • By guerrilla 2025-06-0321:001 reply

          Pretty sure you can just disable leak checking.

          • By 1718627440 2025-06-0321:041 reply

            But, for example, verifying that memory is not touched after its intended lifetime is much harder when you can't rely on it being freed.

            Of course literally running valgrind is still possible, but it is difficult to get useful information.

            • By nick__m 2025-06-0321:401 reply

              You cannot have use-after-free if you never call free, so there are no points at which memory should not be touched.

              That's the beauty of the never free memory management strategy.

              • By dajtxx 2025-06-0322:401 reply

                It can still be a bug if you use something after you would have freed it because your code isn't meant to be using that object any more. It points to errors in the logic.

                • By guerrilla 2025-06-0417:12

                  Agreed. I think being methodical is better here for sure.

    • By IshKebab 2025-06-0316:445 reply

      Interesting. Someone should come up with a language that prevents these sorts of mistakes!

      • By cenamus 2025-06-0317:35

        Thank god Lisp is older than C, don't have to deal with such nonsense :-)

      • By brabel 2025-06-0316:501 reply

        That’s impossible. Just be more careful and everything should work, the author’s C was just a bit rusty!

        • By neocanable 2025-06-0316:561 reply

          This is my first project written in C. Before this, my C level was only at printf("hello world"). I am very happy because this project made me dare to use double pointers (pointers to pointers).

          • By sim7c00 2025-06-0318:51

            u did really well ppl like to pick on C. :) thanks for making it in C, fun to read ur code and see how others go about this language!

      • By kookamamie 2025-06-0316:51

        Yes, perhaps it could have a marketing slogan like "Write once, crash everywhere!"

      • By uecker 2025-06-0319:54

        I think he is using memory pools, so this is ok.

      • By pjmlp 2025-06-0318:52

        If only there were a couple of OSes implemented during the 1960s with such programming languages....

    • By kazinator 2025-06-042:38

      In the same file:

        static bool is_java_identifier_start(char c)
        {
          return (isalpha(c) || c == '_' || c == '$');
        }
      
      Undefined behavior in isalpha if c happens to be negative (and not equal to EOF), like some UTF-8 byte.

      I think some <ctype.h> implementations are hardened against this issue, but not all.
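
      The usual fix is to convert the argument to `unsigned char` before calling any `<ctype.h>` classification function. A small sketch follows. The `_safe` name and the `is_ascii_java_identifier` helper are made up for illustration, and the check stays byte-oriented like the original, so it does not try to handle Java's full Unicode identifier rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Same test as the original, but the value passed to isalpha is always
 * in the range of unsigned char, so the call is well defined. */
bool is_java_identifier_start_safe(char c)
{
    unsigned char uc = (unsigned char)c;
    return isalpha(uc) || c == '_' || c == '$';
}

/* Hypothetical helper: byte-by-byte check of a whole NUL-terminated name. */
bool is_ascii_java_identifier(const char *s)
{
    assert(s != NULL);
    if (!is_java_identifier_start_safe(*s))
        return false;                     /* also rejects the empty string */
    for (s++; *s != '\0'; s++) {
        unsigned char uc = (unsigned char)*s;
        if (!(isalnum(uc) || *s == '_' || *s == '$'))
            return false;
    }
    return true;
}
```

      In the default "C" locale, a UTF-8 lead byte such as 0xC3 is simply rejected instead of triggering undefined behavior.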

    • By masfoobar 2025-06-047:31

      > I am always curious how different C programs decide how to manage memory.

      At a basic level, you can create memory on the stack or on the heap. Obviously I will focus on the heap as that is dynamically allocating memory of a certain size.

      The C programming language does not dictate how you handle memory. You are pretty much on your own. Some C programmers (likely the more inexperienced ones) will malloc individual variables as if they were creating a 'new' instance in a typical OOP language like Java. This can be a telltale sign of a programmer who comes to C from an OOP background. As they learn and improve their C skills, they realise they should allocate a chunk of memory of a certain type, but they could still be malloc(ing) and free(ing) all over the code, making it difficult to understand what is being used and where -- especially if you are looking at code you did not write.

      You can also have programs that do not bother free(ing) memory. For example, a simple shell program that just does simple input->process->output and terminates. For these types of programs, just let the OS deal with freeing the memory.

      Good C code (in my opinion) uses malloc and free in only a handful of functions. There are higher level functions for proper Allocators. One example is an Arena Allocator. Then if you want a function which may require dynamic memory, you can tell it which allocator to use. It gives you control, generally speaking. You can create a simple string library or builder with an allocator.

      Of course an Allocator does not have to use memory on the heap. It can use memory on the stack as well.

      There are various other patterns to use in the world of memory, especially in C.
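
      To make the "tell the function which allocator to use" idea concrete, here is one possible shape, as a sketch only. The `allocator` interface, `buffer_arena` and `str_concat` are invented names. The arena hands out memory from a caller-supplied buffer, which can just as well be an array on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal allocator interface: one function pointer; state lives in the
 * struct that embeds this one. */
typedef struct allocator {
    void *(*alloc)(struct allocator *self, size_t n);
} allocator;

/* Arena over a caller-supplied buffer (heap, static, or stack). */
typedef struct {
    allocator iface;       /* must stay first so the cast below is valid */
    unsigned char *buf;
    size_t cap, used;
} buffer_arena;

static void *buffer_arena_alloc(allocator *self, size_t n)
{
    buffer_arena *a = (buffer_arena *)self;
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->cap || n > a->cap - start)
        return NULL;       /* out of space */
    a->used = start + n;
    return a->buf + start;
}

void buffer_arena_init(buffer_arena *a, void *buf, size_t cap)
{
    assert(a != NULL && buf != NULL);
    a->iface.alloc = buffer_arena_alloc;
    a->buf = buf;
    a->cap = cap;
    a->used = 0;
}

/* A string helper that is told which allocator to use. */
char *str_concat(allocator *al, const char *x, const char *y)
{
    assert(al != NULL && x != NULL && y != NULL);
    size_t lx = strlen(x), ly = strlen(y);
    char *out = al->alloc(al, lx + ly + 1);
    if (!out)
        return NULL;
    memcpy(out, x, lx);
    memcpy(out + lx, y, ly + 1);   /* copies the NUL as well */
    return out;
}
```

      Here `str_concat` never calls malloc or free itself. Setting `used` back to 0 releases everything the arena handed out.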

    • By SunlitCat 2025-06-0319:03

      Strings! The bane of C programming, and a big reason I prefer C++. :D

  • By jbellis 2025-06-0322:231 reply

    IntelliJ's FernFlower decompiler is the gold standard. I don't think it's available in a standalone repo, but it IS available as a standalone library: https://github.com/JetBrains/intellij-community/blob/master/... https://www.jetbrains.com/intellij-repository/releases

    I guess there's some history there that I'm not familiar with because JBoss also has a FernFlower decompiler library https://mvnrepository.com/artifact/org.jboss.windup.decompil...

  • By appendixv3 2025-06-0312:446 reply

    Very cool project! Love the idea of a Java decompiler written in C — the speed must be great.

    Any plan to support `.dex` in the future? Also curious how you handle inner classes inside JARs.

    • By mdaniel 2025-06-0313:502 reply

      The "jikes" compiler from IBM <https://github.com/daveshields/jikespg> was written in C++ and was for the longest time screaming fast. It also had its own parser generator lpg which was fun to play with, if you're into those things <https://github.com/daveshields/jikespg>

      It seems someone liked it and made a "v2" along with LSP support https://github.com/A-LPG/LPG2#lpg2

    • By neocanable 2025-06-0313:502 reply

      I am currently writing the dex and apk decompilation part. The current speed is about 10 times faster than that of Java, and it uses fewer resources than Java. The compiled binary is also smaller, only about 300k. Thank you for your attention.

      • By Koshkin 2025-06-0323:081 reply

        > 10 times faster than that of Java

        I was hoping that these days' Java would be "almost" as fast as C/C++. Oh well.

        • By neocanable 2025-06-042:36

          In the process of writing this, I learned a lot about the JVM. The JVM has done well enough, even surpassing C/C++ in some cases.

      • By mdaniel 2025-06-0314:282 reply

        This has been my life experience with things written in C/C++, so speed doesn't matter. Or, I guess from an alternative perspective, it ran very fast, but exited very fast, too :-D

          $ ./objdir/garlic $the_jar_file -o out-dir -t $(nproc)
          Progress : 85 (1024)Segmentation fault: 11

        • By neocanable 2025-06-0314:49

          Sorry for giving you a bad experience. Please provide the jar file or class file. I hope I can fix it as soon as possible.

        • By uecker 2025-06-0319:591 reply

          Is it? This is my experience with Python. The C/C++ programs I use daily never seem to crash (Linux, bash, terminals, X, firefox, vim, etc.). It must be years ago one of those programs crashed while I used it.

          • By 1718627440 2025-06-0320:461 reply

            Also, a segfault IS the protection layer intervening; it is equivalent to an exception in other languages. The real problem is when there is no segfault.

            • By uecker 2025-06-0321:37

              This is absolutely true. But even this does not happen in the software I use every day. Software written in C is definitely the most stable I use - by far. That there are people running around claiming that it is impossible to write stable software in C and that it crashes all the time due to bugs is rather unfortunate, as it is far from the truth.

    • By tslater2006 2025-06-0312:56

      The readme shows support for dumping dex files. Edit: missed that it has a comment that says "unsupport for now", but at least it looks like something planned.

    • By neocanable 2025-06-0313:53

      It processes inner classes recursively. First it reads all entries from the jar and analyzes the relationships between classes. Then it does the decompilation.

    • By neocanable 2025-06-1315:09

      The project supports dex and apk now.
