前言

前面出了一篇教程,Ghidra Processor创建教程——从二进制到汇编代码,这篇教程是在此基础上面,增加语义解析的东西,从汇编代码翻译为伪代码

P-code

Ghidra P-Code是专为逆向工程设计的寄存器传输语言,能够对许多不同的处理器进行建模。

我们创建Ghidra Processor时,将二进制翻译为指令后,下一步想生成伪代码时,可以将指令定义为一系列的P-code指令,之后Ghidra会根据生成的P-code指令,生成伪C代码。

因此我们需要做的就是根据处理器的手册,将指令翻译为P-code。

这里给出P-code的表,里面给出具体的语法,以及对应生成的P-code

P-code Tables

定义mov指令P-code

首先来定义mov指令的P-code,mov指令有多种形式

:mov dl rn, rm is op=1; dl & inst_switch=0 ; rn ; rm {}
:mov dl rn, "bss"[imm64] is op=1; dl & inst_switch=1 ; rn;  imm64 {}
:mov dl "bss"[imm64], rn  is op=1; dl & inst_switch=2 ; imm64;  rn {}
:mov dl rn, "stack"[imm64] is op=1; dl & inst_switch=3 ; rn;  imm64 {}
:mov dl "stack"[imm64], rn  is op=1; dl & inst_switch=4 ; imm64;  rn {}
:mov dl rn, imm is op=1; dl & inst_switch=5 ; rn;  imm {}
:mov dl "bss"[rn], rm is op=1; dl & inst_switch=0xb ; rn; rm {}
:mov dl rn, "bss"[rm] is op=1; dl & inst_switch=0xc ; rn; rm {}
:mov dl "stack"[rn], rm is op=1; dl & inst_switch=0xd ; rn; rm {}
:mov dl rn, "stack"[rm] is op=1; dl & inst_switch=0xe ; rn; rm {}

首先定义第一个,这里为了便利,把所有复制以及其他操作都当做8byte的大小去运算,就不用重复定义很多情况

这里直接rn = rm即可,生成对应的P-code就是COPY

:mov dl rn, rm is op=1; dl & inst_switch=0 ; rn ; rm {rn = rm;}

Mov Bss/Stack指令

接下来定义第二个指令

:mov dl rn, "bss"[imm64] is op=1; dl & inst_switch=1 ; rn;  imm64 {}

这里涉及到bss段的数据读取,这里我们自定义一个P-code,用来表示根据偏移,获取一个bss段的指针

define pcodeop bss_;

然后来定义第二和第三个指令

:mov dl rn, "bss"[imm64] is op=1; dl & inst_switch=1 ; rn;  imm64 { local bss_addr = bss_(imm64); rn = *bss_addr; }
:mov dl "bss"[imm64], rn  is op=1; dl & inst_switch=2 ; imm64;  rn {local bss_addr = bss_(imm64); *bss_addr = rn;}

运行一下,发现报错了,报错主要的原因是下面这两个

ERROR qwbvm.sinc:59: qwbvm.sinc:59:    Main section: Could not resolve at least 1 variable size (SleighCompile)
ERROR qwbvm.sinc:60: qwbvm.sinc:60:    Main section: Could not resolve at least 1 variable size (SleighCompile)

在网上基本搜不到这个报错的解决办法,但是其实这个问题是因为ghidra不清楚imm64和bss_addr这两个变量的size,只要标记好对应的大小,ghidra就会编译通过

最终第二第三条指令定义如下

:mov dl rn, "bss"[imm64] is op=1; dl & inst_switch=1 ; rn;  imm64 { local bss_addr:8 = bss_(*[const]:8 imm64:8); rn = *bss_addr; }
:mov dl "bss"[imm64], rn  is op=1; dl & inst_switch=2 ; imm64;  rn {local bss_addr:8 = bss_(*[const]:8 imm64:8); *bss_addr = rn;}

我们可以运行一下,反汇编一下,右边的伪代码可以看到我们刚刚添加的指令P-code定义的成果,和bss相关的mov之类都已经被翻译成伪代码了

同理,我们可以把第四五条stack相关的指令的也定义出来

define pcodeop stack_;

:mov dl rn, "stack"[imm64] is op=1; dl & inst_switch=3 ; rn;  imm64 { local stack_addr:8 = stack_(*[const]:8 imm64:8); rn = *stack_addr;}
:mov dl "stack"[imm64], rn  is op=1; dl & inst_switch=4 ; imm64;  rn {local stack_addr:8 = stack_(*[const]:8 imm64:8); *stack_addr = rn;}

Mov imm指令

到了第六个指令,这个指令是将一个立即数赋值到寄存器,其中imm的长度是根据data_length的值而改变

:mov dl rn, imm is op=1; dl & inst_switch=5 ; rn;  imm {}

imm对应的定义是

imm: imm8 is addrmode=1 ; imm8 {}
imm: imm16 is addrmode=2 ; imm16 {}
imm: imm32 is addrmode=3 ; imm32 {}
imm: imm64 is addrmode=4 ; imm64 {}

对于这种形式的symbol,我们可以将对应的值export出去,例如imm对应的就是

imm: imm8 is addrmode=1 ; imm8 {export *[const]:8 imm8;}
imm: imm16 is addrmode=2 ; imm16 {export *[const]:8 imm16;}
imm: imm32 is addrmode=3 ; imm32 {export *[const]:8 imm32;}
imm: imm64 is addrmode=4 ; imm64 {export *[const]:8 imm64;}

而在对应的指令的语义定义部分,就可以直接使用imm

:mov dl rn, imm is op=1; dl & inst_switch=5 ; rn;  imm { rn = imm;}

如果此时运行一下,会发现和上面翻译出来的没什么区别,这个是因为syscall指令的语义还没定义,因此mov imm指令生成的pcode都被优化掉了

mov指令全部定义

剩下的mov指令和前面的大同小异,因此这里直接给出全部的mov指令定义

:mov dl "bss"[rn], rm is op=1; dl & inst_switch=0xb ; rn; rm {local bss_addr:8 = bss_(rn); *bss_addr = rm;}
:mov dl rn, "bss"[rm] is op=1; dl & inst_switch=0xc ; rn; rm {local bss_addr:8 = bss_(rm); rn = *bss_addr;}
:mov dl "stack"[rn], rm is op=1; dl & inst_switch=0xd ; rn; rm {local stack_addr:8 = stack_(rn); *stack_addr = rm;}
:mov dl rn, "stack"[rm] is op=1; dl & inst_switch=0xe ; rn; rm {local stack_addr:8 = stack_(rm); rn = *stack_addr;}

定义syscall指令P-code

syscall指令是根据r0,切换不同的功能,然后r1, r2, r3作为参数,r0为返回值,这里我们直接自定义一个P-code,然后调用这个P-code

define pcodeop syscall;

:syscall is op=0x20; inst_switch { r0 = syscall(r0, r1, r2, r3); }

这个时候运行一下,可以看到syscall和之前mov部分的指令语义都翻译出来了

定义算术指令P-code

这个是之前定义的算术指令

oprand: dl rn, rm  is dl & inst_switch=0; rn; rm {}
oprand: dl rn, imm is dl & inst_switch=5; rn; imm {}

:add oprand is op=2; oprand {}
:dec oprand is op=3; oprand {}
...
...

这样翻译为汇编指令还是可以的,但是要定义语义部分还是比较麻烦,因此我们先重构一下

重构

首先oprand前半部分都是dl rn,其实可以移回去,但是后半部分是rm还是imm是根据inst_switch,

如果oprand只是rm和imm部分,是无法获取当前的inst_switch,因此这里再添加一个context reg,switchmode

define context contextreg
    addrmode = (0,2)
    switchmode = (3,6)
;

然后在解析dl的部分添加上switchmode

dl: "" is data_length=0 {}
dl: "byte" is data_length=1 & inst_switch [addrmode =1; switchmode=inst_switch;]{}
dl: "word" is data_length=2 & inst_switch [addrmode =2; switchmode=inst_switch;]{}
dl: "dword" is data_length=3 & inst_switch [addrmode =3; switchmode=inst_switch;]{}
dl: "qword" is data_length>=4 & inst_switch [addrmode =4; switchmode=inst_switch;]{}

再回去定义oprand

oprand: rm is rm & switchmode = 0 { export  rm; }
oprand: imm is imm & switchmode = 5 { export imm; }

最后定义所有的算术指令

:add dl rn, oprand is op=2; dl & inst_switch; rn ; oprand { rn = rn + oprand;}
:dec dl rn, oprand is op=3; dl & inst_switch; rn ; oprand { rn = rn - oprand;}
:mul dl rn, oprand is op=4; dl & inst_switch; rn ; oprand { rn = rn * oprand;}
:div dl rn, oprand is op=5; dl & inst_switch; rn ; oprand { rn = rn / oprand;}
:mod dl rn, oprand is op=6; dl & inst_switch; rn ; oprand { rn = rn % oprand;}
:xor dl rn, oprand is op=7; dl & inst_switch; rn ; oprand { rn = rn ^ oprand;}
:or  dl rn, oprand is op=8; dl & inst_switch; rn ; oprand { rn = rn | oprand;}
:and dl rn, oprand is op=9; dl & inst_switch; rn ; oprand { rn = rn & oprand;}
:shl dl rn, oprand is op=0xa; dl & inst_switch; rn ; oprand { rn = rn << oprand;}
:shr dl rn, oprand is op=0xb; dl & inst_switch; rn ; oprand { rn = rn >> oprand;}

:not dl rn is op=0xc; dl & inst_switch=6; rn { rn = ~rn; }

这时候运行一下,可以看到又补充了部分的反编译结果,不过大部分因为Branch部分没有定义,所以被优化掉了

栈操作指令

直接操作栈的指令除了之前的mov以外,还有push和pop

因为和mov比较类似,只是多了对sp寄存器的操作,这里就直接给出定义

:pop dl rn is op=0xd; dl & inst_switch=6; rn { local stack_addr:8 = stack_(sp); rn = *stack_addr; sp = sp + 8;}
:push dl rn is op=0xe; dl & inst_switch=6; rn { sp = sp - 8; local stack_addr:8  = stack_(sp); *stack_addr = rn;}

定义Branch指令P-code

在程序中有call, ret, cmp, jmp, je, jne这几种Branch指令,这里我们为了简化,就不实现其他jxx的指令了

call/ret指令

call指令有两个形式

:call rn is op=0x10; inst_switch=6; rn {}
:call rel is op=0x10; inst_switch=7; rel {}

call rn比较容易实现,首先将下一个指令的地址放到栈上,然后call过去

:call rn is op=0x10; inst_switch=6; rn { sp = sp - 8; local stack_addr:8  = stack_(sp); *:8 stack_addr = inst_next; call [rn]; }

但是call rel指令就需要先定义rel这个symbol的语义,这里只需要将reloc export出去

rel: reloc is simm8 & addrmode=1 [reloc = inst_next + simm8;] {export *[ram]:8 reloc;}
rel: reloc is imm16 & addrmode=2 [reloc = inst_next + imm16;] {export *[ram]:8 reloc;}
rel: reloc is imm32 & addrmode=3 [reloc = inst_next + imm32;] {export *[ram]:8 reloc;}
rel: reloc is imm64 & addrmode=4 [reloc = inst_next + imm64;] {export *[ram]:8 reloc;}

然后其他东西就和call rn基本一样

:call rel is op=0x10; inst_switch=7; rel { sp = sp - 8; local stack_addr:8  = stack_(sp); *:8 stack_addr = inst_next; call [rel]; }

ret指令也很简单,从栈上获取pc地址,然后return回去

:ret is op=0x11; inst_switch {local stack_addr:8  = stack_(sp); pc = *stack_addr; sp = sp + 8 ;return [pc];}

cmp指令

本来cmp指令会改变很多flags,但是因为我们只用实现je和jne,因此我们这里偷懒,只实现ZF这个flag

首先在qwbvm.slaspec中添加一个新的寄存器ZF

define register offset=0x300 size=1 [ZF];

然后实现一下cmp指令

:cmp dl rn, rm is op=0x12; dl & inst_switch=0; rn; rm {ZF = rn == rm;}
:cmp dl rn, imm is op=0x12; dl & inst_switch=5; rn ; imm {ZF = rn == imm;}

jmp/je/jne指令

最后就到jmp, je和jne指令

:jmp addr is op=0x13; addr {}
:je  addr is op=0x14; addr {}
:jne addr is op=0x15; addr {}

这里addr有3种形式,分别export一下

addr: rn is inst_switch=6; rn {export rn;}
addr: rel is dl&inst_switch=7; rel {export rel;}
addr: "bss"[imm64] is inst_switch=8; imm64 { local bss_addr:8 = bss_(*[const]:8 imm64:8); export *bss_addr;}

最后实现一下jmp, je和jne

:jmp addr is op=0x13; addr {goto addr;}
:je  addr is op=0x14; addr {if(ZF==1) goto addr;}
:jne addr is op=0x15; addr {if(ZF==0) goto addr;}

现在运行一下,可以看到有循环结构,但是还是看不到if else的判断,因为halt指令还没实现,导致大部分P-code还是被优化掉了

定义halt指令P-code

这里我参考的是x86架构的实现,直接将halt指令定义为死循环

:halt is op=0; inst_switch & data_length {goto inst_start;}

现在运行一下,就是最后出来的效果,可以看到有明显的判断的操作

例如会判断前三个字符是不是QWQ,后面的字符是不是G00DR3VR等

当然,这个题目其实转化到汇编指令层面其实已经非常足够了,添加伪代码解析有种杀鸡用牛刀的感觉,但是从这个效果来说,非常令人意外

完整定义

define token opbyte(8)
    op  = (0, 5)
    rn = (0, 3)
    rm = (0, 3)
;
define token oplength(8)
    inst_switch = (0, 3)
    data_length = (4, 6)
;
define token data8(8)
    imm8 = (0, 7)
    simm8 = (0, 7) signed
;
define token data16(16)
    imm16 = (0, 15)
;
define token data32(32)
    imm32 = (0, 31)
;
define token data64(64)
    imm64_8 = (0, 7)
    imm64_16 = (0, 15)
    imm64_32 = (0, 31)
    imm64 = (0, 63)
;
define context contextreg
    addrmode = (0,2)
    switchmode = (3,6)
;

define pcodeop bss_;
define pcodeop stack_;
define pcodeop syscall;

attach variables [rn rm] [r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15];
dl: "" is data_length=0 {}
dl: "byte" is data_length=1 & inst_switch [addrmode =1; switchmode=inst_switch;]{}
dl: "word" is data_length=2 & inst_switch [addrmode =2; switchmode=inst_switch;]{}
dl: "dword" is data_length=3 & inst_switch [addrmode =3; switchmode=inst_switch;]{}
dl: "qword" is data_length>=4 & inst_switch [addrmode =4; switchmode=inst_switch;]{}

imm: imm8 is addrmode=1 ; imm8 {export *[const]:8 imm8;}
imm: imm16 is addrmode=2 ; imm16 {export *[const]:8 imm16;}
imm: imm32 is addrmode=3 ; imm32 {export *[const]:8 imm32;}
imm: imm64 is addrmode=4 ; imm64 {export *[const]:8 imm64;}

rel: reloc is simm8 & addrmode=1 [reloc = inst_next + simm8;] {export *[ram]:8 reloc;}
rel: reloc is imm16 & addrmode=2 [reloc = inst_next + imm16;] {export *[ram]:8 reloc;}
rel: reloc is imm32 & addrmode=3 [reloc = inst_next + imm32;] {export *[ram]:8 reloc;}
rel: reloc is imm64 & addrmode=4 [reloc = inst_next + imm64;] {export *[ram]:8 reloc;}

addr: rn is inst_switch=6; rn {export rn;}
addr: rel is dl&inst_switch=7; rel {export rel;}
addr: "bss"[imm64] is inst_switch=8; imm64 { local bss_addr:8 = bss_(*[const]:8 imm64:8); export *bss_addr;}

oprand: rm is rm & switchmode = 0 { export  rm; }
oprand: imm is imm & switchmode = 5 { export imm; }

:halt is op=0; inst_switch & data_length {goto inst_start;}

:mov dl rn, rm is op=1; dl & inst_switch=0 ; rn ; rm { rn = rm;}
:mov dl rn, "bss"[imm64] is op=1; dl & inst_switch=1 ; rn;  imm64 { local bss_addr:8 = bss_(*[const]:8 imm64:8); rn = *bss_addr; }
:mov dl "bss"[imm64], rn  is op=1; dl & inst_switch=2 ; imm64;  rn {local bss_addr:8 = bss_(*[const]:8 imm64:8); *bss_addr = rn;}
:mov dl rn, "stack"[imm64] is op=1; dl & inst_switch=3 ; rn;  imm64 { local stack_addr:8 = stack_(*[const]:8 imm64:8); rn = *stack_addr;}
:mov dl "stack"[imm64], rn  is op=1; dl & inst_switch=4 ; imm64;  rn {local stack_addr:8 = stack_(*[const]:8 imm64:8); *stack_addr = rn;}
:mov dl rn, imm is op=1; dl & inst_switch=5 ; rn;  imm { rn = imm;}
:mov dl "bss"[rn], rm is op=1; dl & inst_switch=0xb ; rn; rm {local bss_addr:8 = bss_(rn); *bss_addr = rm;}
:mov dl rn, "bss"[rm] is op=1; dl & inst_switch=0xc ; rn; rm {local bss_addr:8 = bss_(rm); rn = *bss_addr;}
:mov dl "stack"[rn], rm is op=1; dl & inst_switch=0xd ; rn; rm {local stack_addr:8 = stack_(rn); *stack_addr = rm;}
:mov dl rn, "stack"[rm] is op=1; dl & inst_switch=0xe ; rn; rm {local stack_addr:8 = stack_(rm); rn = *stack_addr;}

:add dl rn, oprand is op=2; dl & inst_switch; rn ; oprand { rn = rn + oprand;}
:dec dl rn, oprand is op=3; dl & inst_switch; rn ; oprand { rn = rn - oprand;}
:mul dl rn, oprand is op=4; dl & inst_switch; rn ; oprand { rn = rn * oprand;}
:div dl rn, oprand is op=5; dl & inst_switch; rn ; oprand { rn = rn / oprand;}
:mod dl rn, oprand is op=6; dl & inst_switch; rn ; oprand { rn = rn % oprand;}
:xor dl rn, oprand is op=7; dl & inst_switch; rn ; oprand { rn = rn ^ oprand;}
:or  dl rn, oprand is op=8; dl & inst_switch; rn ; oprand { rn = rn | oprand;}
:and dl rn, oprand is op=9; dl & inst_switch; rn ; oprand { rn = rn & oprand;}
:shl dl rn, oprand is op=0xa; dl & inst_switch; rn ; oprand { rn = rn << oprand;}
:shr dl rn, oprand is op=0xb; dl & inst_switch; rn ; oprand { rn = rn >> oprand;}

:not dl rn is op=0xc; dl & inst_switch=6; rn { rn = ~rn; }

:pop dl rn is op=0xd; dl & inst_switch=6; rn { local stack_addr:8 = stack_(sp); rn = *stack_addr; sp = sp + 8;}
:push dl rn is op=0xe; dl & inst_switch=6; rn { sp = sp - 8; local stack_addr:8  = stack_(sp); *stack_addr = rn;}

:call rn is op=0x10; inst_switch=6; rn { sp = sp - 8; local stack_addr:8  = stack_(sp); *:8 stack_addr = inst_next; call [rn]; }
:call rel is op=0x10; inst_switch=7; rel { sp = sp - 8; local stack_addr:8  = stack_(sp); *:8 stack_addr = inst_next; call [rel]; }

:ret is op=0x11; inst_switch {local stack_addr:8  = stack_(sp); pc = *stack_addr; sp = sp + 8 ;return [pc];}

:cmp dl rn, rm is op=0x12; dl & inst_switch=0; rn; rm {ZF = rn == rm;}
:cmp dl rn, imm is op=0x12; dl & inst_switch=5; rn ; imm {ZF = rn == imm;}

:jmp addr is op=0x13; addr {goto addr;}
:je  addr is op=0x14; addr {if(ZF==1) goto addr;}
:jne addr is op=0x15; addr {if(ZF==0) goto addr;}
:jle addr is op=0x16; addr {}
:jg  addr is op=0x17; addr {}
:jl  addr is op=0x18; addr {}
:jge addr is op=0x19; addr {}
:jbe addr is op=0x1a; addr {}
:ja  addr is op=0x1b; addr {}
:jnb addr is op=0x1c; addr {}
:jb  addr is op=0x1d; addr {}

:syscall is op=0x20; inst_switch { r0 = syscall(r0, r1, r2, r3); }
点击收藏 | 1 关注 | 1
  • 动动手指,沙发就是你的了!
登录 后跟帖