LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani

Recommendation Score

breakthrough🔴 AdvancedNLP LLM ReasoningSystemUseful for both

Research context

Primary field

NLP

Language understanding, generation, extraction, and evaluation.

Topics

LLM Reasoning

Paper type

System

Best for

Useful for both

arXiv categories

cs.CRcs.AIcs.CR

Why It Matters

LLM4CodeRE adapts LLMs specifically for malware decompilation, significantly improving reverse engineering accuracy on obfuscated code—critical for automated threat analysis in cybersecurity operations.

Abstract

Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most existing approaches rely on generic code pretraining and lack adaptation to malicious software. We propose LLM4CodeRE, a domain-adaptive LLM framework for bidirectional code reverse engineering that supports both assembly-to-source decompilation and source-to-assembly translation within a unified model. To enable effective task adaptation, we introduce two complementary fine-tuning strategies: (i) a Multi-Adapter approach for task-specific syntactic and semantic alignment, and (ii) a Seq2Seq Unified approach using task-conditioned prefixes to enforce end-to-end generation constraints. Experimental results demonstrate that LLM4CodeRE outperforms existing decompilation tools and general-purpose code models, achieving robust bidirectional generalization.

More in NLP → More on LLM Reasoning →

View on arXiv → Download PDF →

Published April 7, 2026