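Introduction
Ghidra lets you define your own Processor. Using QWBLogin from the Qiangwang Cup (QWB) 2020 qualifier as an example, this article shows how to create a Processor that translates binary code into assembly.
Preparation
Installing Eclipse
Creating a Processor requires Eclipse. Installing it is straightforward, so it is not covered here.
Installing the GhidraDev plugin
The plugin ships with Ghidra, at Extensions/Eclipse/GhidraDev/GhidraDev-2.1.1.zip under the installation directory.
Installation steps: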
- Click Help → Install New Software...
- Click Add...
- Click Archive...
- Select GhidraDev zip file from <GhidraInstallDir>/Extensions/Eclipse/GhidraDev/
- Click OK (name field can be blank)
- Check Ghidra category (or GhidraDev entry)
- Click Next
- Click Next
- Accept the terms of the license agreement
- Click Finish
- Click Install anyway
- Click Restart Now
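Creating the project
Choose File->New->Project, then select Ghidra Module Project.
Give the project a name.
On the next page, select only Processor.
Then point it at the Ghidra installation directory.
The data/languages directory now contains the sample Processor files.
It is a good idea to rename them, replacing skel with qwbvm (File->Rename).
Processor definition
This is a Ghidra tutorial rather than a writeup for QWBLogin, so the reverse-engineering step is skipped and the instruction definitions are given directly.
Instruction format
Every instruction has the layout below. x1 and x2 are optional, and their lengths vary.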
+--------+-------------+--------+----+----+
| opcode | inst_switch | length | x1 | x2 |
+--------+-------------+--------+----+----+
| 1 byte | 4 bit       | 4 bit  | ?  | ?  |
+--------+-------------+--------+----+----+
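To make the layout concrete, here is a minimal Python sketch that splits the first two bytes of an instruction into the three header fields. The bit positions (opcode in bits 0-5 of the first byte, inst_switch in bits 0-3 and length in bits 4-6 of the second byte) are taken from the token definitions later in this article. The names Header and decode_header are made up for illustration, and the sample bytes are hypothetical.

```python
from typing import NamedTuple


class Header(NamedTuple):
    opcode: int
    inst_switch: int
    length: int


def decode_header(code: bytes, offset: int = 0) -> Header:
    """Split the two header bytes at `offset` into their fields."""
    first = code[offset]
    second = code[offset + 1]
    return Header(
        opcode=first & 0x3F,          # bits 0-5 of the first byte
        inst_switch=second & 0x0F,    # bits 0-3 of the second byte
        length=(second >> 4) & 0x07,  # bits 4-6 of the second byte
    )


# Hypothetical bytes: opcode 1 (mov), inst_switch 0, length code 2
print(decode_header(bytes([0x01, 0x20])))
```

The operands x1 and x2, when present, follow these two bytes.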
Instruction table
Instruction | opcode | inst_switch | length | x1 | x2 |
---|---|---|---|---|---|
halt | 0 | 0 | | | |
mov x1, x2 | 1 | 0 | [1-4] | reg | reg |
mov x1, bss[x2] | 1 | 1 | [1-4] | reg | imm64 |
mov bss[x1], x2 | 1 | 2 | [1-4] | imm64 | reg |
mov x1, stack[x2] | 1 | 3 | [1-4] | reg | imm64 |
mov stack[x1], x2 | 1 | 4 | [1-4] | imm64 | reg |
mov x1, x2 | 1 | 5 | [1-4] | reg | imm |
mov bss[x1],x2 | 1 | 0xb | [1-4] | reg | reg |
mov x1, bss[x2] | 1 | 0xc | [1-4] | reg | reg |
mov stack[x1],x2 | 1 | 0xd | [1-4] | reg | reg |
mov x1, stack[x2] | 1 | 0xe | [1-4] | reg | reg |
add x1, x2 | 2 | 0 | [1-4] | reg | reg |
add x1, x2 | 2 | 5 | [1-4] | reg | imm |
dec x1, x2 | 3 | 0 | [1-4] | reg | reg |
dec x1, x2 | 3 | 5 | [1-4] | reg | imm |
mul x1, x2 | 4 | 0 | [1-4] | reg | reg |
mul x1, x2 | 4 | 5 | [1-4] | reg | imm |
div x1, x2 | 5 | 0 | [1-4] | reg | reg |
div x1, x2 | 5 | 5 | [1-4] | reg | imm |
mod x1, x2 | 6 | 0 | [1-4] | reg | reg |
mod x1, x2 | 6 | 5 | [1-4] | reg | imm |
xor x1, x2 | 7 | 0 | [1-4] | reg | reg |
xor x1, x2 | 7 | 5 | [1-4] | reg | imm |
or x1, x2 | 8 | 0 | [1-4] | reg | reg |
or x1, x2 | 8 | 5 | [1-4] | reg | imm |
and x1, x2 | 9 | 0 | [1-4] | reg | reg |
and x1, x2 | 9 | 5 | [1-4] | reg | imm |
shl x1, x2 | 10 | 0 | [1-4] | reg | reg |
shl x1, x2 | 10 | 5 | [1-4] | reg | imm |
shr x1, x2 | 11 | 0 | [1-4] | reg | reg |
shr x1, x2 | 11 | 5 | [1-4] | reg | imm |
not x1 | 12 | 6 | [1-4] | reg | |
pop x1 | 13 | 6 | [1-4] | reg | |
push x1 | 14 | 6 | [1-4] | reg | |
call x1 | 16 | 6 | | reg | |
call x1 | 16 | 7 | | reladdr | |
ret | 17 | | | | |
cmp x1, x2 | 18 | 0 | [1-4] | reg | reg |
cmp x1, x2 | 18 | 5 | [1-4] | reg | imm |
jmp x1 | 19 | 6 | | reg | |
jmp x1 | 19 | 7 | | reladdr | |
jmp bss[x1] | 19 | 8 | | imm64 | |
syscall | 32 | | | | |
The table omits the conditional jumps je/jne/jle/jg/jl/jge/jbe/ja/jnb/jb. They are encoded like jmp except for the opcode, which runs from 20 to 29.
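Registers
Register | Description |
---|---|
r0-r15 | General-purpose registers |
sp | Stack pointer |
pc | Program counter |
Creating the Processor
The directory contains 7 files, each with a different role. The ones that matter here are:
- qwbvm.cspec: compiler specification, e.g. the calling convention and which register is the stack pointer
- qwbvm.ldefs: language definition, e.g. endianness and word size
- qwbvm.opinion: which loaders can use the language, e.g. the ELF or PE loader
- qwbvm.pspec: processor specification, covering registers and other variables
- qwbvm.sinc, qwbvm.slaspec: registers, instructions and so on; most of the work goes into these two files
Let's go through the files one by one.
qwbvm.pspec
Start with the registers: this file declares pc and r0-r15.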
<?xml version="1.0" encoding="UTF-8"?>
<!-- See Relax specification: Ghidra/Framework/SoftwareModeling/data/languages/processor_spec.rxg -->
<processor_spec>
<programcounter register="pc"/>
<register_data>
<register name="r0" group="Alt"/>
<register name="r1" group="Alt"/>
<register name="r2" group="Alt"/>
<register name="r3" group="Alt"/>
<register name="r4" group="Alt"/>
<register name="r5" group="Alt"/>
<register name="r6" group="Alt"/>
<register name="r7" group="Alt"/>
<register name="r8" group="Alt"/>
<register name="r9" group="Alt"/>
<register name="r10" group="Alt"/>
<register name="r11" group="Alt"/>
<register name="r12" group="Alt"/>
<register name="r13" group="Alt"/>
<register name="r14" group="Alt"/>
<register name="r15" group="Alt"/>
</register_data>
</processor_spec>
qwbvm.cspec
This file defines the calling convention: the first 3 function arguments are passed in r0, r1 and r2, and the return value comes back in r0.
<?xml version="1.0" encoding="UTF-8"?>
<!-- See Relax specification: Ghidra/Framework/SoftwareModeling/data/languages/compiler_spec.rxg -->
<compiler_spec>
<data_organization>
<pointer_size value="8" />
</data_organization>
<global>
<range space="ram"/>
</global>
<stackpointer register="sp" space="ram"/>
<default_proto>
<prototype name="__asmA" extrapop="2" stackshift="2" strategy="register">
<input>
<pentry minsize="1" maxsize="8">
<register name="r0"/>
</pentry>
<pentry minsize="1" maxsize="8">
<register name="r1"/>
</pentry>
<pentry minsize="1" maxsize="8">
<register name="r2"/>
</pentry>
</input>
<output>
<pentry minsize="1" maxsize="1">
<register name="r0"/>
</pentry>
</output>
</prototype>
</default_proto>
</compiler_spec>
qwbvm.ldefs
Set the processor name, a word size of 64 bits, qwbvm.sla as the sla file, qwbvm.pspec as the processor spec and qwbvm.cspec as the compiler spec.
<?xml version="1.0" encoding="UTF-8"?>
<!-- See Relax specification: Ghidra/Framework/SoftwareModeling/data/languages/language_definitions.rxg -->
<language_definitions>
<!-- Uncomment the following to make the language available in Ghidra -->
<language processor="qwbvm"
endian="little"
size="64"
variant="default"
version="1.0"
slafile="qwbvm.sla"
processorspec="qwbvm.pspec"
id="qwbvm:LE:64:default">
<description>QWB VM Language Module</description>
<compiler name="default" spec="qwbvm.cspec" id="default"/>
</language>
</language_definitions>
qwbvm.opinion
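This file tells loaders when to pick the language. We are loading a raw binary, so it can stay unchanged.
qwbvm.slaspec
First define the address spaces ram, bss and register, each with size 8.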
define space ram type=ram_space size=8 default;
define space bss type=ram_space size=8;
define space register type=register_space size=8;
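Next define the general-purpose and special registers. contextreg is the context register; it helps decode instructions and is used later when the instructions are defined.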
define register offset=0x00 size=8 [r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15];
define register offset=0x100 size=8 [sp pc];
define register offset=0x200 size=8 contextreg;
Finally, include qwbvm.sinc.
@include "qwbvm.sinc"
The complete file:
define endian=little;
define alignment=1;
define space ram type=ram_space size=8 default;
define space bss type=ram_space size=8;
define space register type=register_space size=8;
define register offset=0x00 size=8 [r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15];
define register offset=0x100 size=8 [sp pc];
define register offset=0x200 size=8 contextreg;
# Include contents of qwbvm.sinc file
@include "qwbvm.sinc"
qwbvm.sinc
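With the registers in place, this file defines the instruction formats.
First, a word about tokens.
token
Tokens are the building blocks of an instruction: the raw bytes are first parsed into tokens, and the tokens are then combined into an instruction.
A token definition has the following form.
tokenname is the token's name, and the integer in parentheses is its size in bits, which must be a multiple of 8.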
define token tokenname ( integer )
fieldname=(integer,integer) attributelist
...
;
Here is an example: a token named opbyte, 8 bits wide. Bits 0-5 form op, and bits 0-3 can also be read as rn or rm.
define token opbyte(8)
op = (0, 5)
rn = (0, 3)
rm = (0, 3)
;
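A field written as (lo, hi) selects bits lo through hi of the token, counting from bit 0 as the least significant bit. The Python sketch below mirrors that rule, including the signed attribute used later for simm8. The helper name field is an illustration only and is not part of Ghidra.

```python
def field(token: int, lo: int, hi: int, signed: bool = False) -> int:
    """Return bits lo..hi (inclusive, bit 0 = least significant) of token."""
    width = hi - lo + 1
    value = (token >> lo) & ((1 << width) - 1)
    if signed and value & (1 << (width - 1)):
        value -= 1 << width  # sign-extend when the top bit of the field is set
    return value


opbyte = 0x13                # hypothetical token value
print(field(opbyte, 0, 5))   # the op view of the byte
print(field(opbyte, 0, 3))   # the rn / rm view of the same byte
```

Overlapping fields such as op and rn are simply different views of the same bits; which view is used depends on the constructor that matches.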
Here are all the token definitions:
define token opbyte(8)
op = (0, 5)
rn = (0, 3)
rm = (0, 3)
;
define token oplength(8)
inst_switch = (0, 3)
data_length = (4, 6)
;
define token data8(8)
imm8 = (0, 7)
simm8 = (0, 7) signed
;
define token data16(16)
imm16 = (0, 15)
;
define token data32(32)
imm32 = (0, 31)
;
define token data64(64)
imm64_8 = (0, 7)
imm64_16 = (0, 15)
imm64_32 = (0, 31)
imm64 = (0, 63)
;
rn and rm stand for registers, so attach them to r0 through r15.
attach variables [rn rm] [r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15];
Now for the first instruction, halt. It consists of two tokens, opbyte and oplength: op must equal 0, and it is followed by inst_switch & data_length.
:halt is op=0; inst_switch & data_length {}
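Tokens are concatenated with ";", while fields of the same token are combined with "&". The "&" acts as a logical AND; "|" can be used instead and acts as a logical OR.
The trailing {} holds the P-code. We only want to turn bytes into assembly, so it stays empty.
Next comes mov x1, x2, where x1 and x2 are both registers and the length takes one of four values, 1 to 4.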
:mov "byte" rn, rm is op=1; inst_switch = 0 & data_length = 1 ; rn ; rm {}
:mov "word" rn, rm is op=1; inst_switch = 0 & data_length = 2 ; rn ; rm {}
:mov "dword" rn, rm is op=1; inst_switch = 0 & data_length = 3 ; rn ; rm {}
:mov "qword" rn, rm is op=1; inst_switch = 0 & data_length = 4 ; rn ; rm {}
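Text in double quotes, such as "byte" and "word", is a literal string with no further meaning, and so is the mnemonic mov. Other names such as rn and rm must be declared as token fields.
Now click Run in the menu to test. Drag test.bin from the challenge attachment into Ghidra; a dialog asks for the processor, so search for qwbvm and confirm.
Open the test.bin you just imported. The code starting at 05 now disassembles as halt.
The bytes at 0x1a8 disassemble as mov word r8,r1.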
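As a cross-check outside Ghidra, the four constructors above can be mimicked by a few lines of Python. This is only a sketch of the same decoding rules: the function name disasm_mov_rr is hypothetical, and the sample bytes are made up rather than copied from test.bin.

```python
LENGTH_NAMES = {1: "byte", 2: "word", 3: "dword", 4: "qword"}


def disasm_mov_rr(code: bytes) -> str:
    """Decode a 4-byte `mov rn, rm` (opcode 1, inst_switch 0)."""
    op = code[0] & 0x3F                  # opbyte: op = (0, 5)
    inst_switch = code[1] & 0x0F         # oplength: inst_switch = (0, 3)
    data_length = (code[1] >> 4) & 0x07  # oplength: data_length = (4, 6)
    if op != 1 or inst_switch != 0 or data_length not in LENGTH_NAMES:
        raise ValueError("not a mov reg, reg instruction")
    rn = code[2] & 0x0F                  # third token, read as rn
    rm = code[3] & 0x0F                  # fourth token, read as rm
    return f"mov {LENGTH_NAMES[data_length]} r{rn},r{rm}"


# Made-up bytes, not taken from test.bin
print(disasm_mov_rr(bytes([0x01, 0x20, 0x08, 0x01])))
```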
Let's define a few more instructions.
:mov "byte" rn, "bss"[imm64] is op=1; inst_switch = 1 & data_length = 1 ; rn ; imm64 {}
:mov "word" rn, "bss"[imm64] is op=1; inst_switch = 1 & data_length = 2 ; rn ; imm64 {}
:mov "dword" rn, "bss"[imm64] is op=1; inst_switch = 1 & data_length = 3 ; rn ; imm64 {}
:mov "qword" rn, "bss"[imm64] is op=1; inst_switch = 1 & data_length = 4 ; rn ; imm64 {}
:mov "byte" "bss"[imm64], rn is op=1; inst_switch = 2 & data_length = 1 ; imm64; rn {}
:mov "word" "bss"[imm64], rn is op=1; inst_switch = 2 & data_length = 2 ; imm64 ; rn {}
:mov "dword" "bss"[imm64], rn is op=1; inst_switch = 2 & data_length = 3 ; imm64 ; rn {}
:mov "qword" "bss"[imm64], rn is op=1; inst_switch = 2 & data_length = 4 ; imm64 ; rn {}
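Every instruction now needs several near-identical definitions, one per data length, which is tedious. Here is how to simplify it.
Looking across the instructions, the common part is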
"byte" xxxxx data_length = 1
"word" xxxxx data_length = 2
"dword" xxxxx data_length = 3
"qword" xxxxx data_length = 4
So we can define a symbol, dl:
dl: "" is data_length = 0 {}
dl: "byte" is data_length = 1 {}
dl: "word" is data_length = 2 {}
dl: "dword" is data_length = 3 {}
dl: "qword" is data_length >= 4 {}
The original instructions then reduce to
:mov dl rn, rm is op=1; inst_switch = 0 & dl ; rn ; rm {}
:mov dl rn, "bss"[imm64] is op=1; inst_switch = 1 & dl ; rn ; imm64 {}
:mov dl "bss"[imm64], rn is op=1; inst_switch = 2 & dl ; imm64; rn {}
Run again, open test.bin and disassemble the code at 0x1f7; it disassembles successfully.
Let's keep going.
:mov dl rn, "stack"[imm64] is op=1; dl & inst_switch=3 ; rn; imm64 {}
:mov dl "stack"[imm64], rn is op=1; dl & inst_switch=4 ; imm64; rn {}
:mov rn, imm8 is op=1; data_length = 1 & inst_switch = 5; rn; imm8 {}
:mov rn, imm16 is op=1; data_length = 2 & inst_switch = 5; rn; imm16 {}
:mov rn, imm32 is op=1; data_length = 3 & inst_switch = 5; rn; imm32 {}
:mov rn, imm64 is op=1; data_length = 4 & inst_switch = 5; rn; imm64 {}
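The mov rn, imm instruction is still tedious to write, because the width of imm depends on data_length. Can it be simplified further?
This is where another feature comes in: context.
Below we define a context on contextreg, in which addrmode occupies 3 bits.
Context exists because a processor sometimes decodes the same bytes into different instructions depending on its state, for example Thumb mode on ARM.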
define context contextreg
addrmode = (0,2)
;
Once the context is defined, it has to be assigned at the right moment. Here we set it while the dl symbol is being parsed.
dl: "" is data_length = 0 {}
dl: "byte" is data_length = 1 [addrmode = 1;]{}
dl: "word" is data_length = 2 [addrmode = 2;]{}
dl: "dword" is data_length = 3 [addrmode = 3;]{}
dl: "qword" is data_length >= 4 [addrmode = 4;]{}
Then define the imm symbol:
imm: imm8 is addrmode = 1; imm8 {}
imm: imm16 is addrmode = 2; imm16 {}
imm: imm32 is addrmode = 3; imm32 {}
imm: imm64 is addrmode = 4; imm64 {}
The instruction above then reduces to
:mov dl rn, imm is op=1; dl & inst_switch = 5; rn; imm {}
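The effect of addrmode on imm can be pictured as a lookup from length code to operand width. The Python sketch below assumes the mapping implied by the imm symbol (1, 2, 3, 4 select imm8, imm16, imm32, imm64) and the little-endian byte order declared in qwbvm.ldefs. IMM_WIDTH and read_imm are illustrative names.

```python
from typing import Tuple

# Length code (addrmode) -> immediate width in bytes
IMM_WIDTH = {1: 1, 2: 2, 3: 4, 4: 8}


def read_imm(code: bytes, offset: int, addrmode: int) -> Tuple[int, int]:
    """Return (value, next_offset) for an unsigned little-endian immediate."""
    width = IMM_WIDTH[addrmode]
    chunk = code[offset:offset + width]
    if len(chunk) != width:
        raise ValueError("truncated immediate")
    return int.from_bytes(chunk, "little"), offset + width


# Hypothetical operand bytes for a word-sized immediate
value, next_offset = read_imm(bytes([0x34, 0x12, 0xFF]), 0, 2)
print(hex(value), next_offset)
```

The decoder has to see the length code before it reaches the immediate, which is why the context is set while dl is parsed.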
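One more instruction deserves a closer look: call xxx. call takes an address relative to the current instruction, so we need the current instruction's address.
We define a symbol rel whose reloc value is computed from inst_next and the immediate. inst_next is a symbol built into Ghidra that holds the address of the next instruction.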
rel: reloc is simm8 & addrmode=1 [reloc = inst_next + simm8;] {}
rel: reloc is imm16 & addrmode=2 [reloc = inst_next + imm16;] {}
rel: reloc is imm32 & addrmode=3 [reloc = inst_next + imm32;] {}
rel: reloc is imm64 & addrmode=4 [reloc = inst_next + imm64;] {}
:call rel is op=0x10; dl & inst_switch=7; rel {}
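The arithmetic behind reloc can be checked by hand with a small Python sketch. It assumes the instruction is two header bytes followed by the immediate, that only the 8-bit form is signed (as in the rel definition, which uses simm8), and that addresses wrap at 64 bits. The wrap-around and the name call_target are assumptions made for this example.

```python
IMM_WIDTH = {1: 1, 2: 2, 3: 4, 4: 8}  # length code -> immediate width in bytes
HEADER_SIZE = 2                       # opbyte + oplength
MASK64 = 0xFFFFFFFFFFFFFFFF


def call_target(addr: int, addrmode: int, operand: bytes) -> int:
    """Compute inst_next + displacement for a relative call at `addr`."""
    width = IMM_WIDTH[addrmode]
    if len(operand) != width:
        raise ValueError("operand width does not match the length code")
    # Only the 8-bit displacement is sign-extended, mirroring simm8
    disp = int.from_bytes(operand, "little", signed=(addrmode == 1))
    inst_next = addr + HEADER_SIZE + width
    return (inst_next + disp) & MASK64  # assumed 64-bit wrap-around


# Hypothetical call at 0x100 with an 8-bit displacement of -2
print(hex(call_target(0x100, 1, b"\xfe")))
```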
The rest is much the same. The complete processor definition follows.
define token opbyte(8)
op = (0, 5)
rn = (0, 3)
rm = (0, 3)
;
define token oplength(8)
inst_switch = (0, 3)
data_length = (4, 6)
;
define token data8(8)
imm8 = (0, 7)
simm8 = (0, 7) signed
;
define token data16(16)
imm16 = (0, 15)
;
define token data32(32)
imm32 = (0, 31)
;
define token data64(64)
imm64_8 = (0, 7)
imm64_16 = (0, 15)
imm64_32 = (0, 31)
imm64 = (0, 63)
;
define context contextreg
addrmode = (0,2)
;
attach variables [rn rm] [r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15];
dl: "" is data_length=0 {}
dl: "byte" is data_length=1 [addrmode =1;]{}
dl: "word" is data_length=2 [addrmode =2;]{}
dl: "dword" is data_length=3 [addrmode =3;]{}
dl: "qword" is data_length>=4 [addrmode =4;]{}
imm: imm8 is addrmode=1 ; imm8 {}
imm: imm16 is addrmode=2 ; imm16 {}
imm: imm32 is addrmode=3 ; imm32 {}
imm: imm64 is addrmode=4 ; imm64 {}
rel: reloc is simm8 & addrmode=1 [reloc = inst_next + simm8;] {}
rel: reloc is imm16 & addrmode=2 [reloc = inst_next + imm16;] {}
rel: reloc is imm32 & addrmode=3 [reloc = inst_next + imm32;] {}
rel: reloc is imm64 & addrmode=4 [reloc = inst_next + imm64;] {}
addr: rn is inst_switch=6; rn {}
addr: rel is dl&inst_switch=7; rel {}
addr: "bss"[imm64] is inst_switch=8; imm64 {}
oprand: dl rn, rm is dl & inst_switch=0; rn; rm {}
oprand: dl rn, imm is dl & inst_switch=5; rn; imm {}
:halt is op=0; inst_switch & data_length {}
:mov dl rn, rm is op=1; dl & inst_switch=0 ; rn ; rm {}
:mov dl rn, "bss"[imm64] is op=1; dl & inst_switch=1 ; rn; imm64 {}
:mov dl "bss"[imm64], rn is op=1; dl & inst_switch=2 ; imm64; rn {}
:mov dl rn, "stack"[imm64] is op=1; dl & inst_switch=3 ; rn; imm64 {}
:mov dl "stack"[imm64], rn is op=1; dl & inst_switch=4 ; imm64; rn {}
:mov dl rn, imm is op=1; dl & inst_switch=5 ; rn; imm {}
:mov dl "bss"[rn], rm is op=1; dl & inst_switch=0xb ; rn; rm {}
:mov dl rn, "bss"[rm] is op=1; dl & inst_switch=0xc ; rn; rm {}
:mov dl "stack"[rn], rm is op=1; dl & inst_switch=0xd ; rn; rm {}
:mov dl rn, "stack"[rm] is op=1; dl & inst_switch=0xe ; rn; rm {}
:add oprand is op=2; oprand {}
:dec oprand is op=3; oprand {}
:mul oprand is op=4; oprand {}
:div oprand is op=5; oprand {}
:mod oprand is op=6; oprand {}
:xor oprand is op=7; oprand {}
:or oprand is op=8; oprand {}
:and oprand is op=9; oprand {}
:shl oprand is op=0xa; oprand {}
:shr oprand is op=0xb; oprand {}
:not dl rn is op=0xc; dl & inst_switch=6; rn {}
:pop dl rn is op=0xd; dl & inst_switch=6; rn {}
:push dl rn is op=0xe; dl & inst_switch=6; rn {}
:call rn is op=0x10; inst_switch=6; rn {}
:call rel is op=0x10; dl & inst_switch=7; rel {}
:ret is op=0x11; inst_switch {}
:cmp dl rn, rm is op=0x12; dl & inst_switch=0; rn; rm {}
:cmp dl rn, imm is op=0x12; dl & inst_switch=5; rn ; imm {}
:jmp addr is op=0x13; addr {}
:je addr is op=0x14; addr {}
:jne addr is op=0x15; addr {}
:jle addr is op=0x16; addr {}
:jg addr is op=0x17; addr {}
:jl addr is op=0x18; addr {}
:jge addr is op=0x19; addr {}
:jbe addr is op=0x1a; addr {}
:ja addr is op=0x1b; addr {}
:jnb addr is op=0x1c; addr {}
:jb addr is op=0x1d; addr {}
:syscall is op=0x20; inst_switch {}
Finally, the complete instruction stream can be disassembled starting at 0x100.