前几天看了浏览器解码看XSS,没有看得很明白,又找了这篇深入理解浏览器解析机制和XSS向量编码,翻译的文章,有些地方翻译的怪怪的,需要看下原文,啃了2天终于搞明白了
原文里给出了几个XSS Payload,也给出了答案和演示地址,有答案但没解析,下面一个个分析
有点像上学的时候,看书看不懂,做题不会做,看答案解析做题就懂了
Basics
1
<a href="%6a%61%76%61%73%63%72%69%70%74:%61%6c%65%72%74%28%31%29"></a>
URL encoded "javascript:alert(1)"
Answer: The javascript will NOT execute.
里面没有HTML编码内容,不考虑,其中href内部是URL,于是直接丢给URL模块处理,但是协议无法识别(即被编码的javascript:),解码失败,不会被执行
URL规定协议,用户名,密码都必须是ASCII,编码当然就无效了
A URL’s scheme is an ASCII string that identifies the type of URL and can be used to dispatch a URL for further processing after parsing. It is initially the empty string.
A URL’s username is an ASCII string identifying a username. It is initially the empty string.
A URL’s password is an ASCII string identifying a password. It is initially the empty string.
2
<a href="javascript:%61%6c%65%72%74%28%32%29">
Character entity encoded "javascript" and URL encoded "alert(2)"
Answer: The javascript will execute.
先HTML解码,得到
<a href="javascript:%61%6c%65%72%74%28%32%29">
href中为URL,URL模块可识别为javascript协议,进行URL解码,得到
<a href="javascript:alert(2)">
由于是javascript协议,解码完给JS模块处理,于是被执行
3
<a href="javascript%3aalert(3)"></a>
URL encoded ":"
Answer: The javascript will NOT execute.
同1,不解释
4
<div><img src=x onerror=alert(4)></div>
Character entity encoded < and >
Answer: The javascript will NOT execute.
这里包含了HTML编码内容,反过来以开发者的角度思考,HTML编码就是为了显示这些特殊字符,而不干扰正常的DOM解析,所以这里面的内容不会变成一个img元素,也不会被执行
从HTML解析机制看,在读取<div>之后进入数据状态,<会被HTML解码,但不会进入标签开始状态,当然也就不会创建img元素,也就不会执行
5
<textarea><script>alert(5)</script></textarea>
Character entity encoded < and >
Answer: The javascript will NOT execute AND the character entities will NOT
be decoded either
<textarea>是RCDATA元素(RCDATA elements),可以容纳文本和字符引用,注意不能容纳其他元素,HTML解码得到
<textarea><script>alert(5)</script></textarea>
于是直接显示
RCDATA元素(RCDATA elements)包括textarea和title
6
<textarea><script>alert(6)</script></textarea>
Answer: The javascript will NOT execute.
同5,不解释
Advanced
7
<button onclick="confirm('7');">Button</button>
Character entity encoded '
Answer: The javascript will execute.
这里onclick中为标签的属性值(类比2中的href),会被HTML解码,得到
<button onclick="confirm('7');">Button</button>
然后被执行
8
<button onclick="confirm('8\u0027);">Button</button>
Unicode escape sequence encoded '
Answer: The javascript will NOT execute.
onclick中的值会交给JS处理,在JS中只有字符串和标识符能用Unicode表示,'显然不行,JS执行失败
In string literals, regular expression literals, template literals and identifiers, any Unicode code point may also be expressed using Unicode escape sequences that explicitly express a code point's numeric value.
from https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-source-code (这个链接很卡)
标识符(identifiers)
代码中用来标识变量、函数、或属性的字符序列。
在JavaScript中,标识符只能包含字母或数字或下划线(“_”)或美元符号(“$”),且不能以数字开头。标识符与字符串不同之处在于字符串是数据,而标识符是代码的一部分。在 JavaScript 中,无法将标识符转换为字符串,但有时可以将字符串解析为标识符。from https://developer.mozilla.org/zh-CN/docs/Glossary/Identifier
9
<script>alert(9);</script>
Character entity encoded alert(9);
Answer: The javascript will NOT execute.
script属于原始文本元素(Raw text elements),只可以容纳文本,注意没有字符引用,于是直接由JS处理,JS也认不出来,执行失败
原始文本元素(Raw text elements)有<script>和<style>
10
<script>\u0061\u006c\u0065\u0072\u0074(10);</script>
Unicode Escape sequence encoded alert
Answer: The javascript will execute.
同8,函数名alert属于标识符,直接被JS执行
11
<script>\u0061\u006c\u0065\u0072\u0074\u0028\u0031\u0031\u0029</script>
Unicode Escape sequence encoded alert(11)
Answer: The javascript will NOT execute.
同8,不解释
12
<script>\u0061\u006c\u0065\u0072\u0074(\u0031\u0032)</script>
Unicode Escape sequence encoded alert and 12
Answer: The javascript will NOT execute.
这里看似将没毛病,但是这里\u0031\u0032在解码的时候会被解码为字符串12,注意是字符串,不是数字,文字显然是需要引号的,JS执行失败
13
<script>alert('13\u0027)</script>
Unicode escape sequence encoded '
Answer: The javascript will NOT execute.
同8
14
<script>alert('14\u000a')</script>
Unicode escape sequence encoded line feed.
Answer: The javascript will execute.
\u000a在JavaScript里是换行,就是\n,直接执行
Java菜鸡才知道在Java里\u000a是换行,相当于在源码里直接按一下回车键,后面的代码都换行了
ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence \u000A, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED (LF)) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence \u000A occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write \n instead of \u000A to cause a LINE FEED (LF) to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.
from https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-source-code
Bonus
15
<a href="javascript:%5c%75%30%30%36%31%5c%75%30%30%36%63%5c%75%30%30%36%35%5c%75%30%30%37%32%5c%75%30%30%37%34(15)"></a>
Answer: The javascript will execute.
先HTML解码,得到
<a href="javascript:%5c%75%30%30%36%31%5c%75%30%30%36%63%5c%75%30%30%36%35%5c%75%30%30%37%32%5c%75%30%30%37%34(15)"></a>
在href中由URL模块处理,解码得到
javascript:\u0061\u006c\u0065\u0072\u0074(15)
识别JS协议,然后由JS模块处理,解码得到
javascript:alert(15)
最后被执行
总结
-
<script>和<style>数据只能有文本,不会有HTML解码和URL解码操作 -
<textarea>和<title>里会有HTML解码操作,但不会有子元素 -
其他元素数据(如
div)和元素属性数据(如href)中会有HTML解码操作 -
部分属性(如
href)会有URL解码操作,但URL中的协议需为ASCII -
JavaScript会对字符串和标识符Unicode解码
根据浏览器的自动解码,反向构造 XSS Payload 即可
转载
分享
没有评论