前言

lab-1 是控制流分析入门。

lab-1-sample

实验纲要:通过所给php源码文件,利用PHPJoern进行程序分析,从文件的第一行或函数的开头开始分析,
给出到达函数 $DB->request($query)(lineno:72) 的控制流路径

Sample

1
2
3
4
5
6
7
8
9
<?php
$a=1;
if (isset($_GET['only_tasks'])) {
$only_tasks = explode(',', $_GET['only_tasks']);
} else {
$only_tasks = [];
}
$crontask = new Crontask();
?>

Sample Input:

script
1
(2,8)

Sample Output:

script
1
2
[2,3,4,8]
[2,3,6,8]

note

  • 到达该函数的控制流语句应当有多条。

为sample case生成CPG图,并导入neo4j数据库分析结点信息。

在CPG图中存在一个起始结点,该结点通常包含一些其他结点不存在的属性,如同时存在linenoendlineno,包含文件名,node的类型为AST_TOPLEVEL

该起始结点如下图所示,属性为:

  • lineno : 1
  • endlineno : 9
  • flags : TOPLEVEL_FILE
  • name : lab1_sample.php
  • id : 1
  • type : AST_TOPLEVEL

1

控制流路径如下图绿色部分实线所示:

2

当然我们也可以直接通过查询语句查看CFG,就可以很清晰地看到存在两条控制流路径:

1
MATCH p=()-[r:FLOWS_TO]->() RETURN p LIMIT 25

3

Python Script for Sample :

config.py:

1
2
3
4
5
6
7
8
# config.py
import py2neo

neo4j_host = "10.176.36.21"
neo4j_username = "neo4j"
neo4j_password = "123"
port = '36888'
graph = py2neo.Graph(host=neo4j_host, port=port, auth=(neo4j_username, neo4j_password))

main.py:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
# main.py
from config import graph
import py2neo


def get_CFG_FUNC_ENTRY_relationships(Node):
"""
寻找CFG的ENRTY结点,先得到relationship
:param Node:
:return:
"""
return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='ENTRY'))

def get_CFG_FUNC_ENTRY_node(Node):
"""
寻找CFG的入口结点,只有一个
:param Node:
:return:
"""
entry_relationship = get_CFG_FUNC_ENTRY_relationships(Node)
for rel in entry_relationship:
if rel.start_node == Node:
return rel.end_node
return None


def get_FLOWS_TO_relationships(Node):
"""
提供一个Node结点, 返回和该node相关的 FLOWS_TO Relationship list
:param Node:
:return:
"""
return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='FLOWS_TO'))

def get_FLOWS_TO_nodes(Node):
"""
提供一个Node结点,寻找其CFG图中的子结点
:param Node:
:return:
"""
cfg_child_nodes = []
flows_to_rels = get_FLOWS_TO_relationships(Node)
for node_item in flows_to_rels:
if node_item.start_node == Node:
if node_item not in cfg_child_nodes:
cfg_child_nodes.append(node_item.end_node)
cfg_child_nodes.sort(key=lambda node: node['id'], reverse=False)
return cfg_child_nodes


def get_CFG_paths(Node, path):
"""
通过深搜来遍历CFG路径
:param Node: CFG ENTRY Node
:param path: 控制流路径
:param path_num: 控制流条数
:return:
"""
if Node['type'] == 'NULL':
print("\t", path[:-1])

return
# path = []
child_nodes = get_FLOWS_TO_nodes(Node)
for child in child_nodes:
path.append(child['lineno'])
get_CFG_paths(child, path)
path.pop()


if __name__ == '__main__':
top_node = graph.nodes[1]
print("[*] TOPLEVEL Node = ", top_node)

cfg_entry_node = get_CFG_FUNC_ENTRY_node(top_node)
print("[*] CFG_FUNC_RNTRY_NODE's id = ", cfg_entry_node['id'])

path = []
print("[*] The Control Flow Paths is {")
get_CFG_paths(cfg_entry_node, path)
print("}")

Results for Sample :

1
2
3
4
5
6
[*] TOPLEVEL Node =  (_1:AST {endlineno: 9, flags: ['TOPLEVEL_FILE'], id: 1, lineno: 1, name: 'lab1_sample.php', type: 'AST_TOPLEVEL'})
[*] CFG_FUNC_RNTRY_NODE's id = 2
[*] The Control Flow Paths is {
[2, 3, 4, 8]
[2, 3, 6, 8]
}

lab-1-源码-unlock-tasks.php

接下来看lab-1给出题目的源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
<?php
/**
* ---------------------------------------------------------------------
* GLPI - Gestionnaire Libre de Parc Informatique
* Copyright (C) 2015-2018 Teclib' and contributors.
*
* http://glpi-project.org
*
* based on GLPI - Gestionnaire Libre de Parc Informatique
* Copyright (C) 2003-2014 by the INDEPNET Development Team.
*
* ---------------------------------------------------------------------
*
* LICENSE
*
* This file is part of GLPI.
*
* GLPI is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* GLPI is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with GLPI. If not, see <http://www.gnu.org/licenses/>.
* ---------------------------------------------------------------------
*/

// Ensure current directory when run from crontab
chdir(__DIR__);

use Glpi\Event;

include ('../inc/includes.php');

if (isset($_SERVER['argv'])) {
for ($i=1; $i<$_SERVER['argc']; $i++) {
$it = explode("=", $_SERVER['argv'][$i], 2);
$it[0] = preg_replace('/^--/', '', $it[0]);

$_GET[$it[0]] = (isset($it[1]) ? $it[1] : true);
}
}
if (isset($_GET['cycle'])) {
$cycle = (int)$_GET['cycle'];
} else {
$cycle = 25;
}

if (isset($_GET['only_tasks'])) {
$only_tasks = explode(',', $_GET['only_tasks']);
} else {
$only_tasks = [];
}

$crontask = new Crontask();
$query = "SELECT `id`, `name`
FROM `glpi_crontasks`
WHERE `state` = '".Crontask::STATE_RUNNING."'
AND unix_timestamp(`lastrun`) + $cycle * `frequency` < unix_timestamp(now())";

//Number of unlocked tasks by the script
$unlocked_tasks = 0;

echo "Date : ".Html::convDateTime($_SESSION['glpi_currenttime'])."\n";
echo "Start unlock script\n";

foreach ($DB->request($query) as $task) {
if (!empty($only_tasks) && !in_array($task['name'], $only_tasks)) {
echo $task['name']." is still running but not in the whitelist\n";
continue;
}

$tmp['state'] = Crontask::STATE_WAITING;
$tmp['id'] = $task['id'];
if ($crontask->update($tmp)) {
$unlocked_tasks++;
$message = "Task '".$task['name']."' unlocked";
echo $message."\n";
Event::log($task['id'], 'Crontask', 5, 'Configuration', $message);
}
}
echo "Number of unlocked tasks : ".$unlocked_tasks."\n";

对用的AST图:

AST

启动neo4j图数据库之后,在浏览器中访问,先了解下CFG图的基本情况。

1
MATCH p=()-[r:FLOWS_TO]->() RETURN p LIMIT 60

粗略地一看,可以看到存在很多条Control Flows:

4

我对其做了下简单的标记:

sample

但是该CFG图中存在多个回路,因此,对于这种情况我们需要特殊处理:

5

循环结构

for循环

for

写了一段测试代码:

1
2
3
4
5
<?php
for ($i=1; $i<4; $i++) {
echo $i * 2;
}
?>

AST

对应的CPG图为:

13

抽出其中的CFG图:

14

可以看到上面也存在回路(id:16 -> id:11 -> id:21),这是for循环CFG图的一个特点。

foreach循环

foreach

可以看到,foreach和for循环的不同就在于,foreach仅能用于数组和对象,当应用于其他数据类型的时候就会报错。

相同的,如果我们自己写个程序测试下,也能发现该结构会存在一条回路。

思路分析

首先,我们要找到目标行的结点,也就是确定72行所对应的Node结点。

但是,第72行对应的不仅仅是一个结点,我们进行简单的查询:

1
MATCH (n:AST {lineno: 72}) RETURN n

结果返回了12个结点,第72行代码:

1
foreach ($DB->request($query) as $task)

对应12个node结点,其中第72行的根节点为node: { id = '187' }

7

一开始查询到结果的时候,我以为在CFG图中的lineno为72的结点是node: { id = '187' },但是,实际上,node: { id = '188' }才是CFG中的目标结点:

8

重新查看lab的要求:

实验纲要:通过所给php源码文件,利用PHPJoern进行程序分析,从文件的第一行或函数的开头开始分析,
给出到达函数 $DB->request($query)(lineno:72) 的控制流路径

题目中的要求是,要给出到达函数$DB->request($query)的控制流路径,从图上可以看出,该部分属于下图中虚线框部分:

9

而对于第72行代码:

1
foreach ($DB->request($query) as $task)

其中的$DB->request($query)属于foreach结点的子结点

所以,确定$DB->request($query)结点的具体步骤为:

  1. 先找到72行的根结点,即foreach结点

  2. 然后寻找foreach结点的子结点中满足条件的结点Target,该Target结点的子结点中可以看到下面几个结点:

    • lineno: 72 code: DB id: 190 type: string
    • lineno: 72 code: request id: 191 type: string

确定72行的根结点:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def get_PARENT_OF_relationship(Node):
"""
提供一个Node结点, 返回和该node相关的 PARENT_OF Relationship list
:param Node:
:return:
"""
return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='PARENT_OF'))

def get_PARENT_OF_NODE(Node):
"""
提供一个Node结点, 返回和该node相关的 PARENT Node
:param Node:
:return:
"""
parent_of_rels_list = get_PARENT_OF_relationship(Node)
for rel in parent_of_rels_list:
if rel.end_node == Node:
return rel.start_node
return None

def get_Line_Root_Node(Node):
"""
提供一个结点,找到该节点所在行中的根节点
:param Node:
:return:
"""
if Node == None:
print("[*] The Node is None.")
return None
if Node['lineno'] == None:
print("[*] The attribute does not exists in Node.")
return
lineno = Node['lineno']
while True:
Parent_Node = get_PARENT_OF_NODE(Node)
if Parent_Node['lineno'] == None:
break
if Parent_Node['lineno'] == lineno:
Node = Parent_Node
else:
break
return Node

寻找foreach结点的子结点中满足条件的结点Target:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
def get_Child_Nodes(Node):
"""
提供一个结点,通过PARENT_OF relationships来寻找该结点的child 结点,有多个
:param Node:
:return:
"""
child_node_list = []
parent_of_rels_list = get_PARENT_OF_relationship(Node)
for rel in parent_of_rels_list:
if rel.start_node == Node:
if rel.end_node not in child_node_list:
child_node_list.append(rel.end_node)
child_node_list.sort(key=lambda node: node['id'], reverse=False)
return child_node_list

def traverse_Child_Node_DFS(Node):
"""
通过深搜DFS来确定当前Node结点或是子结点是否存在`$DB->request()`函数标志
:param Node:
:return:
"""
if Node['code'] == 'DB' or Node['code'] == 'request':
return True
child_node_list = get_Child_Nodes(Node)
for child_node in child_node_list:
flag = traverse_Child_Node_DFS(child_node)
if flag == True:
return True
return False

def traverse_Child_Node_BFS(Node):
"""
通过广搜BFS来确定当前Node结点或是子结点是否存在`$DB->request()`函数标志
:param Node:
:return:
"""
import queue

que = queue.Queue()
vis_nodes = []
que.put(Node)
while que.qsize() != 0:
top_node = que.get()
if top_node not in vis_nodes:
vis_nodes.append(top_node)
if top_node['code'] == 'DB' or top_node['code'] == 'request':
return True
child_node_list = get_Child_Nodes(top_node)
for child_node in child_node_list:
if child_node in vis_nodes:
continue
que.put(child_node)
vis_nodes.append(child_node)
return False


def get_Target_Child_Node(Node):
"""
根据提供的根节点(即foreach结点),确定$DB->request($query)函数的根结点
:param Node:
:return:
"""
child_node_list = get_Child_Nodes(Node)
for child_node in child_node_list:
flag = traverse_Child_Node_DFS(child_node)
# flag = traverse_Child_Node_BFS(child_node)
if flag == True:
return child_node
return None

但这里我还有一个疑惑,为什么是node: { id = '188' }在CFG图上,而不是node: { id = '187' }

我写了个程序验证下:

1
2
3
4
5
6
<?php
$arr = array(1, 2, 3, 4);
foreach ($arr as $value) {
echo $value * 2;
}
?>

对应的CPG图为:

10

同样的,在其中类型为AST_FOREACH类型的结点同样不在CFG图上:

11

对应的CFG图为:

12

同样可以看到存在环路。

完整程序

因为我们要找到所有可能的路径,所以通过深搜来遍历CFG图获得控制流路径:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
def get_FLOWS_TO_relationship(Node):
"""
提供一个Node结点, 返回和该node相关的 FLOWS_TO Relationship list
:param Node:
:return:
"""
return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='FLOWS_TO'))


def get_CFG_Child_Nodes_List(Node):
"""
提供一个Node结点,找到Node结点在CFG图中的子结点,应存在多个
:param Node:
:return:
"""
cfg_child_node_list = []
cfg_flows_to_rels = get_FLOWS_TO_relationship(Node)
for rel in cfg_flows_to_rels:
if rel.start_node == Node:
if rel.end_node not in cfg_child_node_list:
cfg_child_node_list.append(rel.end_node)
cfg_child_node_list.sort(key=lambda node: node['id'], reverse=False)
return cfg_child_node_list

def print_CFG_PathList(PathList, CycleDict):
"""
用于输出CFG路径,同时需要对循环涉及的路径进行特殊处理
:param PathList:
:param CycleDict:
:return:
: 存在循环的情况
: PathList = [34, 36, 38, 40, 41, 41, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
: CycleDict = {41: [42, 43, 45]}
要将后一个重复的41替换为[42, 43, 45]
但是要注意,当前的PathList = [..., 41, 41, 48, ......]也是正确的CFG路径
"""
print(PathList)

flag = True # 判断是否经过了循环路径
final_PathList = list(set(PathList))
if len(final_PathList) == len(PathList):
flag = False
final_PathList.sort(key=PathList.index)
for key in sorted(CycleDict.keys()):
if key in final_PathList:
idx = PathList.index(key) # 确定后一个重复的lineno的位置,进行替换
final_PathList[idx + 1:idx + 1] = iter(CycleDict[key])
if flag == True: # 如果没有经过循环,那么不用输出final_PathList,已经输出过PathList了
print(final_PathList)

def get_CFG_Path_DFS(StartNode, EndNode, PathList=[], CycleDict={}):
"""
通过深搜来寻找CFG路径,当遍历到结束结点的时候就进行一次路径输出
:param StartNode: 遍历的起始Node结点
:param EndNode: 遍历的结束Node结点
:param PathList: 存储CFG路径
:param CycleDict: CFG图中存在的循环路径,需要特殊处理
:return:
"""
PathList.append(StartNode['lineno'])
if StartNode == EndNode:
print_CFG_PathList(PathList, CycleDict)
else:
CFG_Child_Node_List = get_CFG_Child_Nodes_List(StartNode)
for CFG_Child_Node in CFG_Child_Node_List:
# 情况1:绝大多数情况下,CFG的当前结点lineno都不会小于其后续结点的lineno
if CFG_Child_Node['lineno'] >= StartNode['lineno']:
get_CFG_Path_DFS(CFG_Child_Node, EndNode, PathList, CycleDict)
# 情况2:在出现循环的情况下,当结点对应循环中的最后一行语句时,其后续子结点会回到循环体的初始行,所以会出现lineno变小的情况
else:
# CFG_Child_Node['lineno'] < StartNode['lineno']
# 举例:41 -> 42 -> 43 -> 45(child_lineno) -> 41
child_lineno = CFG_Child_Node['lineno'] # 相当于上面的41
i = 0
while PathList[i] != child_lineno:
i = i + 1
i = i + 1
CycleDict[child_lineno] = PathList[i + 1:]
PathList.pop()

将上述的分析流程串起来,最后得到的程序为:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
from config import graph
import py2neo

def get_AST_node_by_type(type):
"""
提供AST Type,获取AST结点
:param lineno:
:return: 返回第一个Node结点
"""
Node_list = list(graph.nodes.match("AST", type=type))
if len(Node_list) == 0:
return None
return Node_list[0]


def get_CFG_FUNC_ENTRY_relationships(Node):
"""
寻找CFG的ENTRY结点,先得到relationships
:param Node:
:return:
"""
return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='ENTRY'))

def get_CFG_FUNC_ENTRY_node(Node):
"""
寻找CFG的入口结点,只有一个
:param Node:
:return:
"""
entry_relationship = get_CFG_FUNC_ENTRY_relationships(Node)
for rel in entry_relationship:
if rel in entry_relationship:
if rel.start_node == Node:
return rel.end_node
return None

def get_PARENT_OF_relationship(Node):
"""
提供一个Node结点, 返回和该node相关的 PARENT_OF Relationship list
:param Node:
:return:
"""
return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='PARENT_OF'))

def get_PARENT_OF_NODE(Node):
"""
提供一个Node结点, 返回和该node相关的 PARENT_OF Relationship list
:param Node:
:return:
"""
parent_of_rels_list = get_PARENT_OF_relationship(Node)
for rel in parent_of_rels_list:
if rel.end_node == Node:
return rel.start_node
return None

def get_AST_node_by_lineno(lineno):
"""
提供lineno,返回AST 结点,有多个
:param lineno:
:return:
"""
Node_list = list(graph.nodes.match("AST", lineno=lineno))
return Node_list[0]

def get_Line_Root_Node(Node):
"""
提供一个结点,找到该节点所在行中的根节点
:param Node:
:return:
"""
if Node == None:
print("[*] The Node is None.")
return None
if Node['lineno'] == None:
print("[*] The attribute does not exists in Node.")
return
lineno = Node['lineno']
while True:
Parent_Node = get_PARENT_OF_NODE(Node)
if Parent_Node['lineno'] == None:
break
if Parent_Node['lineno'] == lineno:
Node = Parent_Node
else:
break
return Node

def get_Child_Nodes(Node):
"""
提供一个结点,通过PARENT_OF relationships来寻找该结点的child 结点,有多个
:param Node:
:return:
"""
child_node_list = []
parent_of_rels_list = get_PARENT_OF_relationship(Node)
for rel in parent_of_rels_list:
if rel.start_node == Node:
if rel.end_node not in child_node_list:
child_node_list.append(rel.end_node)
child_node_list.sort(key=lambda node: node['id'], reverse=False)
return child_node_list

def traverse_Child_Node_DFS(Node):
"""
通过深搜DFS来确定当前Node结点或是子结点是否存在`$DB->request()`函数标志
:param Node:
:return:
"""
if Node['code'] == 'DB' or Node['code'] == 'request':
return True
child_node_list = get_Child_Nodes(Node)
for child_node in child_node_list:
flag = traverse_Child_Node_DFS(child_node)
if flag == True:
return True
return False

def traverse_Child_Node_BFS(Node):
"""
通过广搜BFS来确定当前Node结点或是子结点是否存在`$DB->request()`函数标志
:param Node:
:return:
"""
import queue

que = queue.Queue()
vis_nodes = []
que.put(Node)
while que.qsize() != 0:
top_node = que.get()
if top_node not in vis_nodes:
vis_nodes.append(top_node)
if top_node['code'] == 'DB' or top_node['code'] == 'request':
return True
child_node_list = get_Child_Nodes(top_node)
for child_node in child_node_list:
if child_node in vis_nodes:
continue
que.put(child_node)
vis_nodes.append(child_node)
return False


def get_Target_Child_Node(Node):
"""
根据提供的根节点(即foreach结点),确定$DB->request($query)函数的根结点
:param Node:
:return:
"""
child_node_list = get_Child_Nodes(Node)
for child_node in child_node_list:
flag = traverse_Child_Node_BFS(child_node)
if flag == True:
return child_node
return None


def get_FLOWS_TO_relationship(Node):
"""
提供一个Node结点, 返回和该node相关的 FLOWS_TO Relationship list
:param Node:
:return:
"""
return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='FLOWS_TO'))


def get_CFG_Child_Nodes_List(Node):
"""
提供一个Node结点,找到Node结点在CFG图中的子结点,应存在多个
:param Node:
:return:
"""
cfg_child_node_list = []
cfg_flows_to_rels = get_FLOWS_TO_relationship(Node)
for rel in cfg_flows_to_rels:
if rel.start_node == Node:
if rel.end_node not in cfg_child_node_list:
cfg_child_node_list.append(rel.end_node)
cfg_child_node_list.sort(key=lambda node: node['id'], reverse=False)
return cfg_child_node_list

def print_CFG_PathList(PathList, CycleDict):
"""
用于输出CFG路径,同时需要对循环涉及的路径进行特殊处理
:param PathList:
:param CycleDict:
:return:
: 存在循环的情况
: PathList = [34, 36, 38, 40, 41, 41, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
: CycleDict = {41: [42, 43, 45]}
要将后一个重复的41替换为[42, 43, 45]
但是要注意,当前的PathList = [..., 41, 41, 48, ......]也是正确的CFG路径
"""
print(PathList)

flag = True # 判断是否经过了循环路径
final_PathList = list(set(PathList))
if len(final_PathList) == len(PathList):
flag = False
final_PathList.sort(key=PathList.index)
for key in sorted(CycleDict.keys()):
if key in final_PathList:
idx = PathList.index(key) # 确定后一个重复的lineno的位置,进行替换
final_PathList[idx + 1:idx + 1] = iter(CycleDict[key])
if flag == True: # 如果没有经过循环,那么不用输出final_PathList,已经输出过PathList了
print(final_PathList)

def get_CFG_Path_DFS(StartNode, EndNode, PathList=[], CycleDict={}):
"""
通过深搜来寻找CFG路径,当遍历到结束结点的时候就进行一次路径输出
:param StartNode: 遍历的起始Node结点
:param EndNode: 遍历的结束Node结点
:param PathList: 存储CFG路径
:param CycleDict: CFG图中存在的循环路径,需要特殊处理
:return:
"""
PathList.append(StartNode['lineno'])
if StartNode == EndNode:
print_CFG_PathList(PathList, CycleDict)
else:
CFG_Child_Node_List = get_CFG_Child_Nodes_List(StartNode)
for CFG_Child_Node in CFG_Child_Node_List:
# 情况1:绝大多数情况下,CFG的当前结点lineno都不会小于其后续结点的lineno
if CFG_Child_Node['lineno'] >= StartNode['lineno']:
get_CFG_Path_DFS(CFG_Child_Node, EndNode, PathList, CycleDict)
# 情况2:在出现循环的情况下,当结点对应循环中的最后一行语句时,其后续子结点会回到循环体的初始行,所以会出现lineno变小的情况
else:
# CFG_Child_Node['lineno'] < StartNode['lineno']
# 举例:41 -> 42 -> 43 -> 45(child_lineno) -> 41
child_lineno = CFG_Child_Node['lineno'] # 相当于上面的41
i = 0
while PathList[i] != child_lineno:
i = i + 1
i = i + 1
CycleDict[child_lineno] = PathList[i + 1:]
PathList.pop()

if __name__ == "__main__":
start_lineno = 1
end_lineno = 72

# 从文件的起始行,也就是第一个Node开始遍历
start_node_type = "AST_TOPLEVEL" # 文件第一行的AST Type为 AST_TOPLEVEL
start_node = get_AST_node_by_type(start_node_type)

# 确定CFG的入口结点
cfg_entry_node = get_CFG_FUNC_ENTRY_node(start_node)
# cfg_start_node = get_Child_Nodes(cfg_entry_node)[0]
cfg_start_node = graph.nodes[5]
print("[*] The Entry Node = ", cfg_start_node)


# 确定第72行对应的Node结点
temp_node = get_AST_node_by_lineno(end_lineno)
temp_root_node = get_Line_Root_Node(temp_node)
cfg_target_node = get_Target_Child_Node(temp_root_node)
print("[*] The Target Node = ", cfg_target_node)

get_CFG_Path_DFS(cfg_start_node, cfg_target_node)

输出结果为:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[*] The Entry Node =  (_5:AST {childnum: 0, funcid: 1, id: 5, lineno: 34, type: 'AST_CALL'})
[*] The Target Node = (_188:AST {childnum: 0, funcid: 1, id: 188, lineno: 72, type: 'AST_METHOD_CALL'})
[34, 36, 38, 40, 41, 41, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 42, 43, 45, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 41, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 42, 43, 45, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 41, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 42, 43, 45, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 41, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 42, 43, 45, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]

sink flow结果

最后一共找到了8条sink flow,依次如下图所示。

sink flow 1:[34, 36, 38, 40, 41, 41, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-1

sink flow 2:[34, 36, 38, 40, 41, 42, 43, 45, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-1

sink flow 3:[34, 36, 38, 40, 41, 41, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-2

sink flow 4:[34, 36, 38, 40, 41, 42, 43, 45, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-2

sink flow 5:[34, 36, 38, 40, 41, 41, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-5

sink flow 6:[34, 36, 38, 40, 41, 42, 43, 45, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-3

sink flow 7:[34, 36, 38, 40, 41, 41, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-8

sink flow 8:[34, 36, 38, 40, 41, 42, 43, 45, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-8

sink flow 9:[34, 36, 38, 40, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-5

sink flow 10:[34, 36, 38, 40, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-6

sink flow 11:[34, 36, 38, 40, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-7

sink flow 12:[34, 36, 38, 40, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-8

一个小问题

上面找到的sink flow中,其实还存在一些不太完美的地方,就是使用一条数据流边替代了控制流边。即上面id:74 -> id:97的边,实际上的准确控制流的路径应该是id:74 -> id:41 -> id:41 -> id:97,这是因为在遍历的时候存储的是lineno,而不是直接存储的结点,如果是存储的结点,那么其实是可以找到id:74 -> id:97的路径的。不过这其实没有什么特别大的关系,这里只是记录下存在这么个问题。