PHP程序分析学习-控制流分析入门

前言

lab-1 是控制流分析入门。

lab-1-sample

实验纲要：通过所给php源码文件，利用PHPJoern进行程序分析，从文件的第一行或函数的开头开始分析，
给出到达函数 $DB->request($query)(lineno:72) 的控制流路径

Sample

<?php
$a=1;
if (isset($_GET['only_tasks'])) {
   $only_tasks = explode(',', $_GET['only_tasks']);
} else {
   $only_tasks = [];
}
$crontask = new Crontask();
?>

Sample Input:

script

(2,8)

Sample Output:

script

1 2	[2,3,4,8] [2,3,6,8]

note

到达该函数的控制流语句应当有多条。

为sample case生成CPG图，并导入neo4j数据库分析结点信息。

在CPG图中存在一个起始结点，该结点通常包含一些其他结点不存在的属性，如同时存在lineno和endlineno，包含文件名，node的类型为AST_TOPLEVEL。

该起始结点如下图所示，属性为：

lineno : 1

endlineno : 9

flags : TOPLEVEL_FILE

name : lab1_sample.php

id : 1

type : AST_TOPLEVEL

控制流路径如下图绿色部分实线所示：

当然我们也可以直接通过查询语句查看CFG，就可以很清晰地看到存在两条控制流路径：

1	MATCH p=()-[r:FLOWS_TO]->() RETURN p LIMIT 25

Python Script for Sample :

config.py：

# config.py
import py2neo

neo4j_host = "10.176.36.21"
neo4j_username = "neo4j"
neo4j_password = "123"
port = '36888'
graph = py2neo.Graph(host=neo4j_host, port=port, auth=(neo4j_username, neo4j_password))

main.py：

# main.py
from config import graph
import py2neo


def get_CFG_FUNC_ENTRY_relationships(Node):
    """
    寻找CFG的ENRTY结点，先得到relationship
    :param Node:
    :return:
    """
    return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='ENTRY'))

def get_CFG_FUNC_ENTRY_node(Node):
    """
    寻找CFG的入口结点，只有一个
    :param Node:
    :return:
    """
    entry_relationship = get_CFG_FUNC_ENTRY_relationships(Node)
    for rel in entry_relationship:
        if rel.start_node == Node:
            return rel.end_node
    return None


def get_FLOWS_TO_relationships(Node):
    """
    提供一个Node结点, 返回和该node相关的 FLOWS_TO Relationship list
    :param Node:
    :return:
    """
    return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='FLOWS_TO'))

def get_FLOWS_TO_nodes(Node):
    """
    提供一个Node结点，寻找其CFG图中的子结点
    :param Node:
    :return:
    """
    cfg_child_nodes = []
    flows_to_rels = get_FLOWS_TO_relationships(Node)
    for node_item in flows_to_rels:
        if node_item.start_node == Node:
            if node_item not in cfg_child_nodes:
                cfg_child_nodes.append(node_item.end_node)
    cfg_child_nodes.sort(key=lambda node: node['id'], reverse=False)
    return cfg_child_nodes


def get_CFG_paths(Node, path):
    """
    通过深搜来遍历CFG路径
    :param Node: CFG ENTRY Node
    :param path: 控制流路径
    :param path_num:    控制流条数
    :return:
    """
    if Node['type'] == 'NULL':
        print("\t", path[:-1])

        return
    # path = []
    child_nodes = get_FLOWS_TO_nodes(Node)
    for child in child_nodes:
        path.append(child['lineno'])
        get_CFG_paths(child, path)
        path.pop()


if __name__ == '__main__':
    top_node = graph.nodes[1]
    print("[*] TOPLEVEL Node = ", top_node)

    cfg_entry_node = get_CFG_FUNC_ENTRY_node(top_node)
    print("[*] CFG_FUNC_RNTRY_NODE's id = ", cfg_entry_node['id'])

    path = []
    print("[*] The Control Flow Paths is {")
    get_CFG_paths(cfg_entry_node, path)
    print("}")

Results for Sample :

[*] TOPLEVEL Node =  (_1:AST {endlineno: 9, flags: ['TOPLEVEL_FILE'], id: 1, lineno: 1, name: 'lab1_sample.php', type: 'AST_TOPLEVEL'})
[*] CFG_FUNC_RNTRY_NODE's id =  2
[*] The Control Flow Paths is {
	 [2, 3, 4, 8]
	 [2, 3, 6, 8]
}

lab-1-源码-unlock-tasks.php

接下来看lab-1给出题目的源码：

<?php
/**
 * ---------------------------------------------------------------------
 * GLPI - Gestionnaire Libre de Parc Informatique
 * Copyright (C) 2015-2018 Teclib' and contributors.
 *
 * http://glpi-project.org
 *
 * based on GLPI - Gestionnaire Libre de Parc Informatique
 * Copyright (C) 2003-2014 by the INDEPNET Development Team.
 *
 * ---------------------------------------------------------------------
 *
 * LICENSE
 *
 * This file is part of GLPI.
 *
 * GLPI is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * GLPI is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with GLPI. If not, see <http://www.gnu.org/licenses/>.
 * ---------------------------------------------------------------------
 */

// Ensure current directory when run from crontab
chdir(__DIR__);

use Glpi\Event;

include ('../inc/includes.php');

if (isset($_SERVER['argv'])) {
   for ($i=1; $i<$_SERVER['argc']; $i++) {
      $it    = explode("=", $_SERVER['argv'][$i], 2);
      $it[0] = preg_replace('/^--/', '', $it[0]);

      $_GET[$it[0]] = (isset($it[1]) ? $it[1] : true);
   }
}
if (isset($_GET['cycle'])) {
   $cycle = (int)$_GET['cycle'];
} else {
   $cycle = 25;
}

if (isset($_GET['only_tasks'])) {
   $only_tasks = explode(',', $_GET['only_tasks']);
} else {
   $only_tasks = [];
}

$crontask = new Crontask();
$query    = "SELECT `id`, `name`
             FROM `glpi_crontasks`
             WHERE `state` = '".Crontask::STATE_RUNNING."'
                   AND unix_timestamp(`lastrun`) + $cycle * `frequency` < unix_timestamp(now())";

//Number of unlocked tasks by the script
$unlocked_tasks = 0;

echo "Date : ".Html::convDateTime($_SESSION['glpi_currenttime'])."\n";
echo "Start unlock script\n";

foreach ($DB->request($query) as $task) {
   if (!empty($only_tasks) && !in_array($task['name'], $only_tasks)) {
      echo $task['name']." is still running but not in the whitelist\n";
      continue;
   }

   $tmp['state'] = Crontask::STATE_WAITING;
   $tmp['id']    = $task['id'];
   if ($crontask->update($tmp)) {
      $unlocked_tasks++;
      $message = "Task '".$task['name']."' unlocked";
      echo $message."\n";
      Event::log($task['id'], 'Crontask', 5, 'Configuration', $message);
   }
}
echo "Number of unlocked tasks : ".$unlocked_tasks."\n";

对用的AST图：

AST

启动neo4j图数据库之后，在浏览器中访问，先了解下CFG图的基本情况。

1	MATCH p=()-[r:FLOWS_TO]->() RETURN p LIMIT 60

粗略地一看，可以看到存在很多条Control Flows：

我对其做了下简单的标记：

sample

但是该CFG图中存在多个回路，因此，对于这种情况我们需要特殊处理：

循环结构

for循环

for

写了一段测试代码：

<?php
for ($i=1; $i<4; $i++) {
        echo $i * 2;
}
?>

AST

对应的CPG图为：

抽出其中的CFG图：

可以看到上面也存在回路（id:16 -> id:11 -> id:21），这是for循环CFG图的一个特点。

foreach循环

foreach

可以看到，foreach和for循环的不同就在于，foreach仅能用于数组和对象，当应用于其他数据类型的时候就会报错。

相同的，如果我们自己写个程序测试下，也能发现该结构会存在一条回路。

思路分析

首先，我们要找到目标行的结点，也就是确定72行所对应的Node结点。

但是，第72行对应的不仅仅是一个结点，我们进行简单的查询：

1	MATCH (n:AST {lineno: 72}) RETURN n

结果返回了12个结点，第72行代码：

1	foreach ($DB->request($query) as $task)

对应12个node结点，其中第72行的根节点为node: { id = '187' }：

一开始查询到结果的时候，我以为在CFG图中的lineno为72的结点是node: { id = '187' }，但是，实际上，node: { id = '188' }才是CFG中的目标结点：

重新查看lab的要求：

实验纲要：通过所给php源码文件，利用PHPJoern进行程序分析，从文件的第一行或函数的开头开始分析，
给出到达函数 $DB->request($query)(lineno:72) 的控制流路径

题目中的要求是，要给出到达函数$DB->request($query)的控制流路径，从图上可以看出，该部分属于下图中虚线框部分：

而对于第72行代码：

1	foreach ($DB->request($query) as $task)

其中的$DB->request($query)属于foreach结点的子结点。

所以，确定$DB->request($query)结点的具体步骤为：

先找到72行的根结点，即foreach结点
然后寻找foreach结点的子结点中满足条件的结点Target，该Target结点的子结点中可以看到下面几个结点：
- lineno: 72 code: DB id: 190 type: string
- lineno: 72 code: request id: 191 type: string

确定72行的根结点：

def get_PARENT_OF_relationship(Node):
    """
    提供一个Node结点, 返回和该node相关的 PARENT_OF Relationship list
    :param Node:
    :return:
    """
    return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='PARENT_OF'))

def get_PARENT_OF_NODE(Node):
    """
    提供一个Node结点, 返回和该node相关的 PARENT Node
    :param Node:
    :return:
    """
    parent_of_rels_list = get_PARENT_OF_relationship(Node)
    for rel in parent_of_rels_list:
        if rel.end_node == Node:
            return rel.start_node
    return None

def get_Line_Root_Node(Node):
    """
    提供一个结点，找到该节点所在行中的根节点
    :param Node:
    :return:
    """
    if Node == None:
        print("[*] The Node is None.")
        return None
    if Node['lineno'] == None:
        print("[*] The attribute does not exists in Node.")
        return
    lineno = Node['lineno']
    while True:
        Parent_Node = get_PARENT_OF_NODE(Node)
        if Parent_Node['lineno'] == None:
            break
        if Parent_Node['lineno'] == lineno:
            Node = Parent_Node
        else:
            break
    return Node

寻找foreach结点的子结点中满足条件的结点Target：

def get_Child_Nodes(Node):
    """
    提供一个结点，通过PARENT_OF relationships来寻找该结点的child 结点，有多个
    :param Node:
    :return:
    """
    child_node_list = []
    parent_of_rels_list = get_PARENT_OF_relationship(Node)
    for rel in parent_of_rels_list:
        if rel.start_node == Node:
            if rel.end_node not in child_node_list:
                child_node_list.append(rel.end_node)
    child_node_list.sort(key=lambda node: node['id'], reverse=False)
    return child_node_list

def traverse_Child_Node_DFS(Node):
    """
    通过深搜DFS来确定当前Node结点或是子结点是否存在`$DB->request()`函数标志
    :param Node:
    :return:
    """
    if Node['code'] == 'DB' or Node['code'] == 'request':
        return True
    child_node_list = get_Child_Nodes(Node)
    for child_node in child_node_list:
        flag = traverse_Child_Node_DFS(child_node)
        if flag == True:
            return True
    return False

def traverse_Child_Node_BFS(Node):
    """
    通过广搜BFS来确定当前Node结点或是子结点是否存在`$DB->request()`函数标志
    :param Node:
    :return:
    """
    import queue

    que = queue.Queue()
    vis_nodes = []
    que.put(Node)
    while que.qsize() != 0:
        top_node = que.get()
        if top_node not in vis_nodes:
          vis_nodes.append(top_node)
        if top_node['code'] == 'DB' or top_node['code'] == 'request':
            return True
        child_node_list = get_Child_Nodes(top_node)
        for child_node in child_node_list:
            if child_node in vis_nodes:
                continue
            que.put(child_node)
            vis_nodes.append(child_node)
    return False


def get_Target_Child_Node(Node):
    """
    根据提供的根节点（即foreach结点），确定$DB->request($query)函数的根结点
    :param Node:
    :return:
    """
    child_node_list = get_Child_Nodes(Node)
    for child_node in child_node_list:
        flag = traverse_Child_Node_DFS(child_node)
        # flag = traverse_Child_Node_BFS(child_node)
        if flag == True:
            return child_node
    return None

但这里我还有一个疑惑，为什么是node: { id = '188' }在CFG图上，而不是node: { id = '187' }？

我写了个程序验证下：

<?php
$arr = array(1, 2, 3, 4);
foreach ($arr as $value) {
	echo $value * 2;
}
?>

对应的CPG图为：

同样的，在其中类型为AST_FOREACH类型的结点同样不在CFG图上：

对应的CFG图为：

同样可以看到存在环路。

完整程序

因为我们要找到所有可能的路径，所以通过深搜来遍历CFG图获得控制流路径：

def get_FLOWS_TO_relationship(Node):
    """
    提供一个Node结点, 返回和该node相关的 FLOWS_TO Relationship list
    :param Node:
    :return:
    """
    return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='FLOWS_TO'))


def get_CFG_Child_Nodes_List(Node):
    """
    提供一个Node结点，找到Node结点在CFG图中的子结点，应存在多个
    :param Node:
    :return:
    """
    cfg_child_node_list = []
    cfg_flows_to_rels = get_FLOWS_TO_relationship(Node)
    for rel in cfg_flows_to_rels:
        if rel.start_node == Node:
            if rel.end_node not in cfg_child_node_list:
                cfg_child_node_list.append(rel.end_node)
    cfg_child_node_list.sort(key=lambda node: node['id'], reverse=False)
    return cfg_child_node_list

def print_CFG_PathList(PathList, CycleDict):
    """
    用于输出CFG路径，同时需要对循环涉及的路径进行特殊处理
    :param PathList:
    :param CycleDict:
    :return:
    : 存在循环的情况
    : PathList = [34, 36, 38, 40, 41, 41, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
    : CycleDict = {41: [42, 43, 45]}
    要将后一个重复的41替换为[42, 43, 45]
    但是要注意，当前的PathList = [..., 41, 41, 48, ......]也是正确的CFG路径
    """
    print(PathList)

    flag = True # 判断是否经过了循环路径
    final_PathList = list(set(PathList))
    if len(final_PathList) == len(PathList):
        flag = False
    final_PathList.sort(key=PathList.index)
    for key in sorted(CycleDict.keys()):
        if key in final_PathList:
            idx = PathList.index(key)   # 确定后一个重复的lineno的位置，进行替换
            final_PathList[idx + 1:idx + 1] = iter(CycleDict[key])
    if flag == True:    # 如果没有经过循环，那么不用输出final_PathList，已经输出过PathList了
        print(final_PathList)

def get_CFG_Path_DFS(StartNode, EndNode, PathList=[], CycleDict={}):
    """
    通过深搜来寻找CFG路径，当遍历到结束结点的时候就进行一次路径输出
    :param StartNode: 遍历的起始Node结点
    :param EndNode: 遍历的结束Node结点
    :param PathList: 存储CFG路径
    :param CycleDict: CFG图中存在的循环路径，需要特殊处理
    :return:
    """
    PathList.append(StartNode['lineno'])
    if StartNode == EndNode:
        print_CFG_PathList(PathList, CycleDict)
    else:
        CFG_Child_Node_List = get_CFG_Child_Nodes_List(StartNode)
        for CFG_Child_Node in CFG_Child_Node_List:
            # 情况1：绝大多数情况下，CFG的当前结点lineno都不会小于其后续结点的lineno
            if CFG_Child_Node['lineno'] >= StartNode['lineno']:
                get_CFG_Path_DFS(CFG_Child_Node, EndNode, PathList, CycleDict)
            # 情况2：在出现循环的情况下，当结点对应循环中的最后一行语句时，其后续子结点会回到循环体的初始行，所以会出现lineno变小的情况
            else:
                # CFG_Child_Node['lineno'] < StartNode['lineno']
                # 举例：41 -> 42 -> 43 -> 45(child_lineno) -> 41
                child_lineno = CFG_Child_Node['lineno'] # 相当于上面的41
                i = 0
                while PathList[i] != child_lineno:
                    i = i + 1
                i = i + 1
                CycleDict[child_lineno] = PathList[i + 1:]
    PathList.pop()

将上述的分析流程串起来，最后得到的程序为：

from config import graph
import py2neo

def get_AST_node_by_type(type):
    """
    提供AST Type，获取AST结点
    :param lineno:
    :return: 返回第一个Node结点
    """
    Node_list = list(graph.nodes.match("AST", type=type))
    if len(Node_list) == 0:
        return None
    return Node_list[0]


def get_CFG_FUNC_ENTRY_relationships(Node):
    """
    寻找CFG的ENTRY结点，先得到relationships
    :param Node:
    :return:
    """
    return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='ENTRY'))

def get_CFG_FUNC_ENTRY_node(Node):
    """
    寻找CFG的入口结点，只有一个
    :param Node:
    :return:
    """
    entry_relationship = get_CFG_FUNC_ENTRY_relationships(Node)
    for rel in entry_relationship:
        if rel in entry_relationship:
            if rel.start_node == Node:
                return rel.end_node
    return None

def get_PARENT_OF_relationship(Node):
    """
    提供一个Node结点, 返回和该node相关的 PARENT_OF Relationship list
    :param Node:
    :return:
    """
    return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='PARENT_OF'))

def get_PARENT_OF_NODE(Node):
    """
    提供一个Node结点, 返回和该node相关的 PARENT_OF Relationship list
    :param Node:
    :return:
    """
    parent_of_rels_list = get_PARENT_OF_relationship(Node)
    for rel in parent_of_rels_list:
        if rel.end_node == Node:
            return rel.start_node
    return None

def get_AST_node_by_lineno(lineno):
    """
    提供lineno，返回AST 结点，有多个
    :param lineno:
    :return:
    """
    Node_list = list(graph.nodes.match("AST", lineno=lineno))
    return Node_list[0]

def get_Line_Root_Node(Node):
    """
    提供一个结点，找到该节点所在行中的根节点
    :param Node:
    :return:
    """
    if Node == None:
        print("[*] The Node is None.")
        return None
    if Node['lineno'] == None:
        print("[*] The attribute does not exists in Node.")
        return
    lineno = Node['lineno']
    while True:
        Parent_Node = get_PARENT_OF_NODE(Node)
        if Parent_Node['lineno'] == None:
            break
        if Parent_Node['lineno'] == lineno:
            Node = Parent_Node
        else:
            break
    return Node

def get_Child_Nodes(Node):
    """
    提供一个结点，通过PARENT_OF relationships来寻找该结点的child 结点，有多个
    :param Node:
    :return:
    """
    child_node_list = []
    parent_of_rels_list = get_PARENT_OF_relationship(Node)
    for rel in parent_of_rels_list:
        if rel.start_node == Node:
            if rel.end_node not in child_node_list:
                child_node_list.append(rel.end_node)
    child_node_list.sort(key=lambda node: node['id'], reverse=False)
    return child_node_list

def traverse_Child_Node_DFS(Node):
    """
    通过深搜DFS来确定当前Node结点或是子结点是否存在`$DB->request()`函数标志
    :param Node:
    :return:
    """
    if Node['code'] == 'DB' or Node['code'] == 'request':
        return True
    child_node_list = get_Child_Nodes(Node)
    for child_node in child_node_list:
        flag = traverse_Child_Node_DFS(child_node)
        if flag == True:
            return True
    return False

def traverse_Child_Node_BFS(Node):
    """
    通过广搜BFS来确定当前Node结点或是子结点是否存在`$DB->request()`函数标志
    :param Node:
    :return:
    """
    import queue

    que = queue.Queue()
    vis_nodes = []
    que.put(Node)
    while que.qsize() != 0:
        top_node = que.get()
        if top_node not in vis_nodes:
          vis_nodes.append(top_node)
        if top_node['code'] == 'DB' or top_node['code'] == 'request':
            return True
        child_node_list = get_Child_Nodes(top_node)
        for child_node in child_node_list:
            if child_node in vis_nodes:
                continue
            que.put(child_node)
            vis_nodes.append(child_node)
    return False


def get_Target_Child_Node(Node):
    """
    根据提供的根节点（即foreach结点），确定$DB->request($query)函数的根结点
    :param Node:
    :return:
    """
    child_node_list = get_Child_Nodes(Node)
    for child_node in child_node_list:
        flag = traverse_Child_Node_BFS(child_node)
        if flag == True:
            return child_node
    return None


def get_FLOWS_TO_relationship(Node):
    """
    提供一个Node结点, 返回和该node相关的 FLOWS_TO Relationship list
    :param Node:
    :return:
    """
    return list(py2neo.RelationshipMatch(graph, nodes={Node}, r_type='FLOWS_TO'))


def get_CFG_Child_Nodes_List(Node):
    """
    提供一个Node结点，找到Node结点在CFG图中的子结点，应存在多个
    :param Node:
    :return:
    """
    cfg_child_node_list = []
    cfg_flows_to_rels = get_FLOWS_TO_relationship(Node)
    for rel in cfg_flows_to_rels:
        if rel.start_node == Node:
            if rel.end_node not in cfg_child_node_list:
                cfg_child_node_list.append(rel.end_node)
    cfg_child_node_list.sort(key=lambda node: node['id'], reverse=False)
    return cfg_child_node_list

def print_CFG_PathList(PathList, CycleDict):
    """
    用于输出CFG路径，同时需要对循环涉及的路径进行特殊处理
    :param PathList:
    :param CycleDict:
    :return:
    : 存在循环的情况
    : PathList = [34, 36, 38, 40, 41, 41, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
    : CycleDict = {41: [42, 43, 45]}
    要将后一个重复的41替换为[42, 43, 45]
    但是要注意，当前的PathList = [..., 41, 41, 48, ......]也是正确的CFG路径
    """
    print(PathList)

    flag = True # 判断是否经过了循环路径
    final_PathList = list(set(PathList))
    if len(final_PathList) == len(PathList):
        flag = False
    final_PathList.sort(key=PathList.index)
    for key in sorted(CycleDict.keys()):
        if key in final_PathList:
            idx = PathList.index(key)   # 确定后一个重复的lineno的位置，进行替换
            final_PathList[idx + 1:idx + 1] = iter(CycleDict[key])
    if flag == True:    # 如果没有经过循环，那么不用输出final_PathList，已经输出过PathList了
        print(final_PathList)

def get_CFG_Path_DFS(StartNode, EndNode, PathList=[], CycleDict={}):
    """
    通过深搜来寻找CFG路径，当遍历到结束结点的时候就进行一次路径输出
    :param StartNode: 遍历的起始Node结点
    :param EndNode: 遍历的结束Node结点
    :param PathList: 存储CFG路径
    :param CycleDict: CFG图中存在的循环路径，需要特殊处理
    :return:
    """
    PathList.append(StartNode['lineno'])
    if StartNode == EndNode:
        print_CFG_PathList(PathList, CycleDict)
    else:
        CFG_Child_Node_List = get_CFG_Child_Nodes_List(StartNode)
        for CFG_Child_Node in CFG_Child_Node_List:
            # 情况1：绝大多数情况下，CFG的当前结点lineno都不会小于其后续结点的lineno
            if CFG_Child_Node['lineno'] >= StartNode['lineno']:
                get_CFG_Path_DFS(CFG_Child_Node, EndNode, PathList, CycleDict)
            # 情况2：在出现循环的情况下，当结点对应循环中的最后一行语句时，其后续子结点会回到循环体的初始行，所以会出现lineno变小的情况
            else:
                # CFG_Child_Node['lineno'] < StartNode['lineno']
                # 举例：41 -> 42 -> 43 -> 45(child_lineno) -> 41
                child_lineno = CFG_Child_Node['lineno'] # 相当于上面的41
                i = 0
                while PathList[i] != child_lineno:
                    i = i + 1
                i = i + 1
                CycleDict[child_lineno] = PathList[i + 1:]
    PathList.pop()

if __name__ == "__main__":
    start_lineno = 1
    end_lineno = 72

    # 从文件的起始行，也就是第一个Node开始遍历
    start_node_type = "AST_TOPLEVEL"    # 文件第一行的AST Type为 AST_TOPLEVEL
    start_node = get_AST_node_by_type(start_node_type)

    # 确定CFG的入口结点
    cfg_entry_node = get_CFG_FUNC_ENTRY_node(start_node)
    # cfg_start_node = get_Child_Nodes(cfg_entry_node)[0]
    cfg_start_node = graph.nodes[5]
    print("[*] The Entry Node = ", cfg_start_node)


    # 确定第72行对应的Node结点
    temp_node = get_AST_node_by_lineno(end_lineno)
    temp_root_node = get_Line_Root_Node(temp_node)
    cfg_target_node = get_Target_Child_Node(temp_root_node)
    print("[*] The Target Node = ", cfg_target_node)

    get_CFG_Path_DFS(cfg_start_node, cfg_target_node)

输出结果为：

[*] The Entry Node =  (_5:AST {childnum: 0, funcid: 1, id: 5, lineno: 34, type: 'AST_CALL'})
[*] The Target Node =  (_188:AST {childnum: 0, funcid: 1, id: 188, lineno: 72, type: 'AST_METHOD_CALL'})
[34, 36, 38, 40, 41, 41, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 42, 43, 45, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 41, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 42, 43, 45, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 41, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 42, 43, 45, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 41, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 41, 42, 43, 45, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]
[34, 36, 38, 40, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]

sink flow结果

最后一共找到了8条sink flow，依次如下图所示。

sink flow 1：[34, 36, 38, 40, 41, 41, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-1

sink flow 2：[34, 36, 38, 40, 41, 42, 43, 45, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-1

sink flow 3：[34, 36, 38, 40, 41, 41, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-2

sink flow 4：[34, 36, 38, 40, 41, 42, 43, 45, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-2

sink flow 5：[34, 36, 38, 40, 41, 41, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-5

sink flow 6：[34, 36, 38, 40, 41, 42, 43, 45, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-3

sink flow 7：[34, 36, 38, 40, 41, 41, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-8

sink flow 8：[34, 36, 38, 40, 41, 42, 43, 45, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-8

sink flow 9：[34, 36, 38, 40, 48, 49, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-5

sink flow 10：[34, 36, 38, 40, 48, 49, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-6

sink flow 11：[34, 36, 38, 40, 48, 51, 54, 55, 60, 61, 67, 69, 70, 72]

sink-flow-7

sink flow 12：[34, 36, 38, 40, 48, 51, 54, 57, 60, 61, 67, 69, 70, 72]

sink-flow-8

一个小问题

上面找到的sink flow中，其实还存在一些不太完美的地方，就是使用一条数据流边替代了控制流边。即上面id:74 -> id:97的边，实际上的准确控制流的路径应该是id:74 -> id:41 -> id:41 -> id:97，这是因为在遍历的时候存储的是lineno，而不是直接存储的结点，如果是存储的结点，那么其实是可以找到id:74 -> id:97的路径的。不过这其实没有什么特别大的关系，这里只是记录下存在这么个问题。