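Picking up where the last post left off: after reading Y4tacker's articles, I realized there is more going on in the encoding handling.
Encoding quirks
Before the unquote function is reached, Tomcat also performs a parse step:
Stepping into this class:
Looking at the parse implementation, it contains a decodeText call: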
```java
public Map<String, String> parse(char[] charArray, int offset, int length, char separator) {
    if (charArray == null) {
        return new HashMap<>();
    }
    HashMap<String, String> params = new HashMap<>();
    this.chars = charArray.clone();
    this.pos = offset;
    this.len = length;

    while (this.hasChar()) {
        String paramName = this.parseToken(new char[]{'=', separator});
        String paramValue = null;
        if (this.hasChar() && charArray[this.pos] == '=') {
            ++this.pos; // skip '='
            paramValue = this.parseQuotedToken(new char[]{separator});
            if (paramValue != null) {
                try {
                    // names ending in '*' (e.g. filename*) are decoded per RFC 2231,
                    // all other values go through MimeUtility (RFC 2047)
                    paramValue = RFC2231Utility.hasEncodedValue(paramName)
                            ? RFC2231Utility.decodeText(paramValue)
                            : MimeUtility.decodeText(paramValue);
                } catch (UnsupportedEncodingException ignored) {
                    // decoding failed: the raw value is kept
                }
            }
        }

        if (this.hasChar() && charArray[this.pos] == separator) {
            ++this.pos; // skip the separator
        }

        if (paramName != null && !paramName.isEmpty()) {
            paramName = RFC2231Utility.stripDelimiter(paramName);
            if (this.lowerCaseNames) {
                paramName = paramName.toLowerCase(Locale.ENGLISH);
            }
            params.put(paramName, paramValue);
        }
    }
    return params;
}
```
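There is a check here: it tests whether the incoming paramName ends with "*" (i.e. filename*):
Depending on the result, decodeText is dispatched to one of two classes, RFC2231Utility or MimeUtility:
Before going further, some background on QP (quoted-printable) encoding:
Quoted-printable can encode any 8-bit byte value as three characters: an equals sign "=" followed by two hexadecimal digits (0-9 or A-F) giving the byte's value. For example, the ASCII form feed character (decimal 12) can be written as "=0C", the equals sign "=" itself (decimal 61) must be written as "=3D", and the character "中" in GB2312 is written as "=D6=D0".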
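A minimal sketch of that dispatch follows. It assumes, as described above, that hasEncodedValue only checks for a trailing '*' and that stripDelimiter only removes it. ParamNameDispatch and decoderFor are hypothetical names for illustration and are not Tomcat code.

```java
import java.util.Locale;

/** Hypothetical sketch of the filename / filename* dispatch; not Tomcat source. */
public class ParamNameDispatch {

    // True when the parameter name ends with '*', e.g. "filename*"
    public static boolean hasEncodedValue(String paramName) {
        return paramName != null && paramName.endsWith("*");
    }

    // Drops the trailing '*' so "filename*" is stored under "filename"
    public static String stripDelimiter(String paramName) {
        if (hasEncodedValue(paramName)) {
            return paramName.substring(0, paramName.length() - 1);
        }
        return paramName;
    }

    // Name of the decoder class that would be chosen for this parameter
    public static String decoderFor(String paramName) {
        return hasEncodedValue(paramName) ? "RFC2231Utility" : "MimeUtility";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"filename", "filename*"}) {
            System.out.println(name + " -> " + decoderFor(name)
                    + ", stored as " + stripDelimiter(name).toLowerCase(Locale.ENGLISH));
        }
    }
}
```

Going by the parse code above, both names end up under the same map key after stripDelimiter. If a header carries both filename and filename*, whichever appears later would therefore overwrite the earlier one in params. This relies on the assumption about stripDelimiter stated above.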
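Here is the rule above as a short sketch. Unlike a normal QP encoder, it escapes every byte, not only those that need escaping. QpEncode and encodeAll are made-up names, and the input is limited to ASCII for simplicity. It emits uppercase hex digits, which is what RFC 2045 prescribes for encoders.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: writes every ASCII byte as "=XX" in quoted-printable style. */
public class QpEncode {

    // Each byte becomes '=' followed by two uppercase hex digits
    public static String encodeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
            sb.append('=').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeAll("\f"));  // =0C (form feed, decimal 12)
        System.out.println(encodeAll("="));   // =3D (decimal 61)
        System.out.println(encodeAll("jsp")); // =6A=73=70
    }
}
```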
https://y4tacker.github.io/2022/02/25/year/2022/2/Java文件上传大杀器-绕waf(针对commons-fileupload组件)/#成功的绕waf点
The ASCII codes of the three characters in "jsp" are \u006a\u0073\u0070, so their QP encoding is "=6a=73=70".
RFC2231Utility:
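The RFC 2231 document gives examples of its syntax:
Its logic:
It splits the value on single quotes ('): the part before the first quote becomes mimeCharset, and the part after the second quote becomes encodedText. encodedText is then decoded into a string using that charset.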
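The splitting step can be sketched as follows. It assumes the charset'language'text layout from RFC 2231, for example UTF-8''shell.%6a%73%70. Rfc2231Split is a hypothetical helper. It only separates the three parts and leaves the percent-escapes untouched, because hex decoding happens in a separate step.

```java
/**
 * Hypothetical sketch: splits an RFC 2231 extended value
 * charset'language'percent-encoded-text into its three parts.
 */
public class Rfc2231Split {

    // Returns {charset, language, encodedText}, or null when a quote is missing
    public static String[] split(String value) {
        int first = value.indexOf('\'');
        if (first == -1) {
            return null; // no charset delimiter
        }
        int second = value.indexOf('\'', first + 1);
        if (second == -1) {
            return null; // no language delimiter
        }
        return new String[] {
            value.substring(0, first),          // mimeCharset, e.g. "UTF-8"
            value.substring(first + 1, second), // language tag, may be empty
            value.substring(second + 1)         // encodedText, still percent-encoded
        };
    }

    public static void main(String[] args) {
        String[] parts = split("UTF-8''shell.%6a%73%70");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```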
It also contains a fromHex step:
The logic is as follows:
```java
private static byte[] fromHex(String text) {
    final int shift = 4; // one hex digit carries 4 bits
    ByteArrayOutputStream out = new ByteArrayOutputStream(text.length());
    int i = 0;
    while (i < text.length()) {
        char c = text.charAt(i++);
        if (c == '%') {
            if (i > text.length() - 2) {
                break; // incomplete %X sequence at the end: stop decoding
            }
            // HEX_DECODE is a lookup table defined elsewhere in the class;
            // "& 127" masks the character to 7 bits before the lookup
            byte b1 = HEX_DECODE[text.charAt(i++) & 127];
            byte b2 = HEX_DECODE[text.charAt(i++) & 127];
            out.write(b1 << shift | b2);
        } else {
            out.write((byte) c); // any other character is written as a single byte
        }
    }
    return out.toByteArray();
}
```
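In other words, it is simply a hex-decoding step. Each %XX becomes one byte, and every other character is copied through as a single byte.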
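To try this behaviour out, here is a standalone, runnable version of the fromHex excerpt, followed by the final charset step. FromHexDemo is a made-up name.

The HEX_DECODE table is an assumption, because its initialisation is not shown above. Here it maps 0-9, A-F and a-f, and every other entry stays 0. Under that assumption a non-hex escape such as %zz quietly becomes a 0x00 byte instead of being rejected.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

/** Standalone sketch of fromHex plus the charset decoding step (HEX_DECODE contents assumed). */
public class FromHexDemo {

    // Assumed table: hex digits map to 0-15, every other entry stays 0
    private static final byte[] HEX_DECODE = new byte[128];

    static {
        String digits = "0123456789ABCDEF";
        for (int i = 0; i < digits.length(); i++) {
            HEX_DECODE[digits.charAt(i)] = (byte) i;
            HEX_DECODE[Character.toLowerCase(digits.charAt(i))] = (byte) i;
        }
    }

    public static byte[] fromHex(String text) {
        final int shift = 4;
        ByteArrayOutputStream out = new ByteArrayOutputStream(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i++);
            if (c == '%') {
                if (i > text.length() - 2) {
                    break; // incomplete escape at the end: the rest is dropped
                }
                byte b1 = HEX_DECODE[text.charAt(i++) & 127];
                byte b2 = HEX_DECODE[text.charAt(i++) & 127];
                out.write(b1 << shift | b2);
            } else {
                out.write((byte) c); // literal characters are truncated to one byte
            }
        }
        return out.toByteArray();
    }

    // Decodes the bytes with the charset named in the value, like the final RFC 2231 step
    public static String decode(String encodedText, String charsetName) {
        return new String(fromHex(encodedText), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(decode("shell.%6a%73%70", "UTF-8")); // shell.jsp
        System.out.println(decode("shell.j%73p", "UTF-8"));     // shell.jsp (mixed literal/escaped)
        System.out.println(decode("shell.jsp%7", "UTF-8"));     // shell.jsp (trailing %7 dropped)
    }
}
```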
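The RFC also describes some further formats:
However, Tomcat's implementation does not support them.
The charsets supported by the JDK 8 standard library are as follows: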
```
Big5, Big5-HKSCS, CESU-8, EUC-JP, EUC-KR, GB18030, GB2312, GBK,
IBM-Thai, IBM00858, IBM01140, IBM01141, IBM01142, IBM01143, IBM01144, IBM01145,
IBM01146, IBM01147, IBM01148, IBM01149, IBM037, IBM1026, IBM1047, IBM273,
IBM277, IBM278, IBM280, IBM284, IBM285, IBM290, IBM297, IBM420,
IBM424, IBM437, IBM500, IBM775, IBM850, IBM852, IBM855, IBM857,
IBM860, IBM861, IBM862, IBM863, IBM864, IBM865, IBM866, IBM868,
IBM869, IBM870, IBM871, IBM918,
ISO-2022-CN, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-KR,
ISO-8859-1, ISO-8859-13, ISO-8859-15, ISO-8859-2, ISO-8859-3, ISO-8859-4,
ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9,
JIS_X0201, JIS_X0212-1990, KOI8-R, KOI8-U, Shift_JIS, TIS-620, US-ASCII,
UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF-8,
windows-1250, windows-1251, windows-1252, windows-1253, windows-1254,
windows-1255, windows-1256, windows-1257, windows-1258, windows-31j,
x-Big5-HKSCS-2001, x-Big5-Solaris, x-COMPOUND_TEXT, x-euc-jp-linux, x-EUC-TW, x-eucJP-Open,
x-IBM1006, x-IBM1025, x-IBM1046, x-IBM1097, x-IBM1098, x-IBM1112, x-IBM1122, x-IBM1123,
x-IBM1124, x-IBM1166, x-IBM1364, x-IBM1381, x-IBM1383, x-IBM300, x-IBM33722, x-IBM737,
x-IBM833, x-IBM834, x-IBM856, x-IBM874, x-IBM875, x-IBM921, x-IBM922, x-IBM930,
x-IBM933, x-IBM935, x-IBM937, x-IBM939, x-IBM942, x-IBM942C, x-IBM943, x-IBM943C,
x-IBM948, x-IBM949, x-IBM949C, x-IBM950, x-IBM964, x-IBM970,
x-ISCII91, x-ISO-2022-CN-CNS, x-ISO-2022-CN-GB, x-iso-8859-11, x-JIS0208, x-JISAutoDetect, x-Johab,
x-MacArabic, x-MacCentralEurope, x-MacCroatian, x-MacCyrillic, x-MacDingbat, x-MacGreek, x-MacHebrew,
x-MacIceland, x-MacRoman, x-MacRomania, x-MacSymbol, x-MacThai, x-MacTurkish, x-MacUkraine,
x-MS932_0213, x-MS950-HKSCS, x-MS950-HKSCS-XP, x-mswin-936, x-PCK, x-SJIS_0213,
x-UTF-16LE-BOM, X-UTF-32BE-BOM, X-UTF-32LE-BOM,
x-windows-50220, x-windows-50221, x-windows-874, x-windows-949, x-windows-950, x-windows-iso2022jp
```
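The list above looks like the key set of Charset.availableCharsets() printed as "key = ...". That is an assumption about how it was generated, since the post does not say. The sketch below produces output in the same shape. The exact list depends on the JDK version and vendor, and ListCharsets is a made-up name.

```java
import java.nio.charset.Charset;
import java.util.Set;

/** Prints the canonical names of all charsets available to the running JVM. */
public class ListCharsets {

    // availableCharsets() returns a sorted map keyed by canonical charset name
    public static Set<String> names() {
        return Charset.availableCharsets().keySet();
    }

    public static void main(String[] args) {
        for (String key : names()) {
            System.out.println("key = " + key);
        }
    }
}
```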
Some of these charsets introduce hidden characters:
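The text does not say which charsets the author had in mind, so the following is only one plausible illustration. It shows how the declared charset changes both the bytes on the wire and the decoded result. Rfc2231Encode is a hypothetical helper that percent-encodes every byte of a string in a given charset and prefixes it with charset''.

- With UTF-16BE, the value for "jsp" carries NUL bytes between the letters.
- Java's UTF-16 encoder writes a byte-order mark. If those bytes are decoded as UTF-16BE, an invisible U+FEFF appears in front of the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for building RFC 2231 style values in different charsets. */
public class Rfc2231Encode {

    // Builds charset''%XX%XX... by percent-encoding every byte of s in the given charset
    public static String encode(String s, String charsetName) {
        StringBuilder sb = new StringBuilder(charsetName).append("''");
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes with "UTF-16" (which writes a byte-order mark) but decodes as UTF-16BE,
    // so the BOM comes back as the invisible character U+FEFF
    public static String bomAsBigEndian(String s) {
        byte[] withBom = s.getBytes(StandardCharsets.UTF_16);
        return new String(withBom, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(encode("jsp", "UTF-8"));          // UTF-8''%6A%73%70
        System.out.println(encode("jsp", "UTF-16BE"));       // UTF-16BE''%00%6A%00%73%00%70
        System.out.println(encode("jsp", "UTF-16"));         // UTF-16''%FE%FF%00%6A%00%73%00%70
        System.out.println(bomAsBigEndian("jsp").length());  // 4: U+FEFF followed by "jsp"
    }
}
```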
MimeUtility:
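RFC 2047 gives examples of its syntax, where an encoded-word has the form =?charset?encoding?encoded-text?=:
Going by my CET-6-level English, here is how I read the encoding field (the second field). 'Q' is for text that is mostly ASCII (it is essentially QP encoding), while 'B' is for Base64-encoded text: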
Q:
B:
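To make Q and B concrete, here is a minimal decoder for a single encoded-word of the form =?charset?encoding?encoded-text?=. It is a sketch written from RFC 2047, not Tomcat's MimeUtility.decodeWord, and the class name EncodedWord is hypothetical. For Q it follows the RFC: =XX is a hex byte and _ stands for a space.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;

/** Hypothetical RFC 2047 encoded-word decoder (Q and B only). */
public class EncodedWord {

    public static String decode(String word) {
        if (word.length() < 4 || !word.startsWith("=?") || !word.endsWith("?=")) {
            throw new IllegalArgumentException("not an encoded-word: " + word);
        }
        // charset ? encoding ? encoded-text
        String[] parts = word.substring(2, word.length() - 2).split("\\?", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed encoded-word: " + word);
        }
        Charset charset = Charset.forName(parts[0]);
        byte[] bytes;
        if (parts[1].equalsIgnoreCase("B")) {
            bytes = Base64.getDecoder().decode(parts[2]);
        } else if (parts[1].equalsIgnoreCase("Q")) {
            bytes = decodeQ(parts[2]);
        } else {
            throw new IllegalArgumentException("unknown encoding: " + parts[1]);
        }
        return new String(bytes, charset);
    }

    // Q encoding: "=XX" is a hex byte, "_" is a space, anything else is literal
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                out.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                out.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode("=?UTF-8?Q?shell.=6a=73=70?=")); // shell.jsp
        System.out.println(decode("=?UTF-8?B?c2hlbGwuanNw?="));   // shell.jsp
    }
}
```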
Section 5 of the RFC also describes some interesting formats:
The key logic is as follows:
```java
public static String decodeText(String text) throws UnsupportedEncodingException {
    // fast path: without "=?" there is no encoded-word to decode
    if (!text.contains("=?")) {
        return text;
    }
    int offset = 0;
    int endOffset = text.length();
    int startWhiteSpace = -1;
    int endWhiteSpace = -1;
    StringBuilder decodedText = new StringBuilder(text.length());
    boolean previousTokenEncoded = false;

    while (offset < endOffset) {
        char ch = text.charAt(offset);
        if (" \t\r\n".indexOf(ch) != -1) {
            // whitespace run: remember its bounds, it is appended later (or dropped)
            for (startWhiteSpace = offset; offset < endOffset; ++offset) {
                ch = text.charAt(offset);
                if (" \t\r\n".indexOf(ch) == -1) {
                    endWhiteSpace = offset;
                    break;
                }
            }
        } else {
            // word token: scan up to the next whitespace character
            int wordStart;
            for (wordStart = offset; offset < endOffset; ++offset) {
                ch = text.charAt(offset);
                if (" \t\r\n".indexOf(ch) != -1) {
                    break;
                }
            }
            String word = text.substring(wordStart, offset);
            if (word.startsWith("=?")) {
                try {
                    String decodedWord = decodeWord(word);
                    // whitespace before an encoded-word is kept only if the previous
                    // token was NOT encoded; between two encoded-words it is dropped
                    if (!previousTokenEncoded && startWhiteSpace != -1) {
                        decodedText.append(text, startWhiteSpace, endWhiteSpace);
                        startWhiteSpace = -1;
                    }
                    previousTokenEncoded = true;
                    decodedText.append(decodedWord);
                    continue;
                } catch (ParseException e) {
                    // not a valid encoded-word: fall through and treat it as plain text
                }
            }
            if (startWhiteSpace != -1) {
                decodedText.append(text, startWhiteSpace, endWhiteSpace);
                startWhiteSpace = -1;
            }
            previousTokenEncoded = false;
            decodedText.append(word);
        }
    }
    return decodedText.toString();
}
```
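One consequence of the loop above is easy to miss. Whitespace between two adjacent encoded-words is dropped (the !previousTokenEncoded check), which is what RFC 2047 section 6.2 asks for. Whitespace next to a plain word is kept.

The sketch below re-implements the loop in simplified form to show this. Its decodeWord is a hypothetical stand-in that only understands the B encoding and treats anything else as plain text, so it is narrower than the real decodeWord. Splitting shell.jsp across two encoded-words still yields one contiguous name.

```java
import java.nio.charset.Charset;
import java.util.Base64;

/** Simplified re-implementation of the decodeText loop (B encoding only; hypothetical). */
public class DecodeTextDemo {

    private static final String LWSP = " \t\r\n";

    // Stand-in for decodeWord: returns null when the word cannot be decoded
    static String decodeWord(String word) {
        if (word.length() < 4 || !word.startsWith("=?") || !word.endsWith("?=")) {
            return null;
        }
        String[] p = word.substring(2, word.length() - 2).split("\\?", 3);
        if (p.length != 3 || !p[1].equalsIgnoreCase("B")) {
            return null;
        }
        try {
            return new String(Base64.getDecoder().decode(p[2]), Charset.forName(p[0]));
        } catch (IllegalArgumentException e) { // bad Base64 or unknown charset
            return null;
        }
    }

    public static String decodeText(String text) {
        if (!text.contains("=?")) {
            return text;
        }
        int offset = 0;
        int startWs = -1;
        int endWs = -1;
        boolean previousEncoded = false;
        StringBuilder out = new StringBuilder(text.length());
        while (offset < text.length()) {
            if (LWSP.indexOf(text.charAt(offset)) != -1) {
                // whitespace run: only remember its bounds for now
                startWs = offset;
                while (offset < text.length() && LWSP.indexOf(text.charAt(offset)) != -1) {
                    offset++;
                }
                endWs = offset;
                continue;
            }
            int wordStart = offset;
            while (offset < text.length() && LWSP.indexOf(text.charAt(offset)) == -1) {
                offset++;
            }
            String word = text.substring(wordStart, offset);
            String decoded = word.startsWith("=?") ? decodeWord(word) : null;
            if (decoded != null) {
                // whitespace is emitted only if the previous token was not an encoded-word
                if (!previousEncoded && startWs != -1) {
                    out.append(text, startWs, endWs);
                    startWs = -1;
                }
                previousEncoded = true;
                out.append(decoded);
                continue;
            }
            if (startWs != -1) {
                out.append(text, startWs, endWs);
                startWs = -1;
            }
            previousEncoded = false;
            out.append(word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "shell.j" and "sp" as two Base64 encoded-words separated by a space
        System.out.println(decodeText("=?UTF-8?B?c2hlbGwuag==?= =?UTF-8?B?c3A=?=")); // shell.jsp
        System.out.println(decodeText("a =?UTF-8?B?c2hl?="));                        // a she
    }
}
```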
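Combined with the unquote function from the previous post, this payload format looks like it could enable all sorts of tricks, but I won't dig into that here.
Supported encodings:
Summary
Here is a rough diagram:
Spring's handling logic differs slightly; see Y4tacker's articles for details: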
https://y4tacker.github.io/2022/06/19/year/2022/6/探寻Tomcat文件上传流量层面绕waf新姿势/#深入
https://y4tacker.github.io/2022/02/25/year/2022/2/Java文件上传大杀器-绕waf(针对commons-fileupload组件)/#成功的绕waf点