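Preface

phpjoern saves the abstract syntax tree generated by php-ast in nodes.csv and rels.csv. When Joern reads these two files to build the code property graph, it first restores the AST from them. It then builds the control flow graph (CFG) from the AST, builds the DefUseGraph from the CFG, uses an iterative algorithm over the DefUseGraph and the CFG to build the data dependence graph (DDG), and finally builds the call graph. Restoring the AST from nodes.csv and rels.csv is therefore a key step. This post records some of the key points in that process.

While stepping through the code in a debugger, I came across several CSV-handling classes. To understand how Joern restores the AST from the CSV files, we first need to understand these classes.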

The KeyedCSVRow class

/joern/projects/extensions/jpanlib/src/main/java/inputModules/csv/KeyedCSV/KeyedCSVRow.java

public class KeyedCSVRow
{
private CSVKey[] keys;
private Map<CSVKey,String> values = new LinkedHashMap<CSVKey,String>();

public KeyedCSVRow(CSVKey[] keys)
{
this.keys = keys;
}

// read the fields of one CSV record into key-value pairs
public void initFromCSVRecord(CSVRecord record)
{
int i = 0;
Iterator<String> recIt = record.iterator();
while (recIt.hasNext())
{
String r = recIt.next();
values.put(keys[i], r);
i++;
}
}

// look up the value for a key (empty string if the key is absent)
public String getFieldForKey(CSVKey key)
{
String val = values.get(key);
return (null == val) ? "" : val;
}

// string representation
@Override
public String toString()
{
return this.values.toString();
}
}

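The figure below shows an instantiated KeyedCSVRow object:

[Figure: an instantiated KeyedCSVRow object]

It corresponds to line 2 of nodes.csv:

[Figure: line 2 of nodes.csv]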
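The pairing of header keys with record fields can be sketched in isolation. This is a minimal sketch, not Joern's code: it uses plain strings instead of CSVKey and CSVRecord, the class name KeyedRowSketch is hypothetical, and the sample header and record values are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyedRowSketch {
    // Pair each header name with the field at the same position.
    static Map<String, String> toKeyedRow(List<String> header, List<String> record) {
        Map<String, String> values = new LinkedHashMap<>();
        // Stop at the shorter of the two so a long record cannot overrun the header.
        int n = Math.min(header.size(), record.size());
        for (int i = 0; i < n; i++) {
            values.put(header.get(i), record.get(i));
        }
        return values;
    }

    // Same convention as getFieldForKey(): a missing key yields "" rather than null.
    static String field(Map<String, String> row, String key) {
        String val = row.get(key);
        return (val == null) ? "" : val;
    }

    public static void main(String[] args) {
        // Hypothetical header and record, for illustration only.
        List<String> header = Arrays.asList("id", "labels", "type", "lineno");
        List<String> record = Arrays.asList("1", "AST", "AST_TOPLEVEL", "1");
        Map<String, String> row = toKeyedRow(header, record);
        System.out.println(row);
        System.out.println("[" + field(row, "name") + "]");
    }
}
```

Unlike initFromCSVRecord() as quoted above, the sketch bounds the loop by the header length. That guard is a design choice of the sketch, not something the original class does.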

The PHPCSVNodeTypes class

joern/projects/extensions/joern-php/src/main/java/inputModules/csv/PHPCSVNodeTypes.java

The PHPCSVNodeTypes class defines the different AST node kinds (that is, the type values) and the flags that qualify those types:

public class PHPCSVNodeTypes
{
/* node row keys */

public static final CSVKey NODE_ID = new CSVKey("id","int");
// node labels (either Filesystem, AST or Artificial)
public static final CSVKey LABEL = new CSVKey("labels","label");
// node properties shared by all nodes (cf. ast\Node specification
// in {@link https://github.com/nikic/php-ast})
public static final CSVKey TYPE = new CSVKey("type");
public static final CSVKey FLAGS = new CSVKey("flags","string_array");
public static final CSVKey LINENO = new CSVKey("lineno","int");
// node properties for declaration nodes (cf. ast\Node\Decl specification
// in {@link https://github.com/nikic/php-ast}
public static final CSVKey ENDLINENO = new CSVKey("endlineno","int");
public static final CSVKey NAME = new CSVKey("name");
public static final CSVKey DOCCOMMENT = new CSVKey("doccomment");
// meta-properties
public static final CSVKey CODE = new CSVKey("code");
public static final CSVKey CHILDNUM = new CSVKey("childnum","int");
public static final CSVKey FUNCID = new CSVKey("funcid","int");
public static final CSVKey CLASSNAME = new CSVKey("classname");
public static final CSVKey NAMESPACE = new CSVKey("namespace");
//public static final CSVKey FILEID = new CSVKey("fileid", "int");


/* node labels */
public static final String LABEL_FS = "Filesystem";
public static final String LABEL_AST = "AST";
public static final String LABEL_ART = "Artificial";


/* node types */

// directory/file types
public static final String TYPE_FILE = "File";
public static final String TYPE_DIRECTORY = "Directory";

// null nodes (leafs)
// used as dummy child for nodes with a fixed number of children
// that do not need a certain child in a given context, to keep
// the number of their children constant
// (e.g., a function node that does not specify its return type in
// its declaration; see TestPHPCSVASTBuilderMinimal for more examples.)
public static final String TYPE_NULL = "NULL";

// primary expressions (leafs)
public static final String TYPE_INTEGER = "integer";
public static final String TYPE_DOUBLE = "double";

[SNIP ... SNIP]

public static final String TYPE_NAME_LIST = "AST_NAME_LIST";
public static final String TYPE_TRAIT_ADAPTATIONS = "AST_TRAIT_ADAPTATIONS";
public static final String TYPE_USE = "AST_USE";


/* node flags */

// flags for TYPE_ARRAY_ELEM and TYPE_CLOSURE_VAR (exclusive)
public static final String FLAG_BY_REFERENCE = "BY_REFERENCE"; // custom, see phpjoern commit 95cdc6b6de1c4b973775a97b90e8bf41c90f629b

// flags for TYPE_NAME nodes (exclusive)
public static final String FLAG_NAME_FQ = "NAME_FQ";
public static final String FLAG_NAME_NOT_FQ = "NAME_NOT_FQ";
public static final String FLAG_NAME_RELATIVE = "NAME_RELATIVE";

[SNIP ... SNIP]

// flags for TYPE_INCLUDE_OR_EVAL nodes (exclusive)
public static final String FLAG_EXEC_EVAL = "EXEC_EVAL";
public static final String FLAG_EXEC_INCLUDE = "EXEC_INCLUDE";
public static final String FLAG_EXEC_INCLUDE_ONCE = "EXEC_INCLUDE_ONCE";
public static final String FLAG_EXEC_REQUIRE = "EXEC_REQUIRE";
public static final String FLAG_EXEC_REQUIRE_ONCE = "EXEC_REQUIRE_ONCE";
}

The PHPCSVEdgeTypes class

joern/projects/extensions/joern-php/src/main/java/inputModules/csv/PHPCSVEdgeTypes.java

Similar to PHPCSVNodeTypes, this class defines the different types of edges between nodes:

public class PHPCSVEdgeTypes
{
/* edge row keys */
public static final CSVKey START_ID = new CSVKey("start");
public static final CSVKey END_ID = new CSVKey("end");
public static final CSVKey TYPE = new CSVKey("type");

/* edge types */
public static final String TYPE_FILE_OF = "FILE_OF";
public static final String TYPE_DIRECTORY_OF = "DIRECTORY_OF";
public static final String TYPE_AST_PARENT_OF = "PARENT_OF";
public static final String TYPE_CFG_ENTRY = "ENTRY";
public static final String TYPE_CFG_EXIT = "EXIT";

}

The CSVFunctionExtractor class

joern/projects/extensions/joern-php/src/main/java/inputModules/csv/csvFuncExtractor/CSVFunctionExtractor.java

The CSVFunctionExtractor class is mainly responsible for reading records from the CSV files row by row.

package inputModules.csv.csvFuncExtractor;

import ...

public class CSVFunctionExtractor
{

KeyedCSVReader nodeReader;
KeyedCSVReader edgeReader;
// a last-in, first-out stack
Stack<CSVAST> csvStack = new Stack<CSVAST>();
Stack<String> funcIdStack = new Stack<String>();
// a first-in, first-out queue of finished CSVASTs, which hold the node rows from nodes.csv and the edge rows from rels.csv
Queue<CSVAST> csvFifoQueue = new LinkedList<CSVAST>();
Map<CSVAST,Set<String>> csvNodeIds = new HashMap<CSVAST,Set<String>>();
CSV2AST csv2ast = new PHPCSV2AST();

......

/**
* Returns a function node by reading from the node and edge
* readers and extracting and converting the next function.
*
* @return The next function node, or null if there are none.
*/
public FunctionDefBase getNextFunction()
throws IOException, InvalidCSVFile
{
if( csvFifoQueue.isEmpty()) {

// there are no functions in the queue, let's get some
assert csvStack.empty() : "There are unfinished CSVASTs on the stack and they are not going to be converted.";
// read nodes.csv one PHP file at a time (not the whole nodes.csv at once; there is only one nodes.csv for all files)
addNodeRowsUntilNextFile();
// read rels.csv, likewise one PHP file at a time
addEdgeRowsUntilNextFile();
}

FunctionDefBase function = null;

if( !csvFifoQueue.isEmpty()) {

CSVAST csvAST = csvFifoQueue.remove();
// csv2ast.convert() turns the CSVAST into an abstract syntax tree
function = csv2ast.convert(csvAST);
}

return function;
}

......
}
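The pull-style pattern in getNextFunction() (refill the queue only when it is empty, then hand out one item per call) can be sketched on its own. This is a simplified sketch under stated assumptions: the class name RefillQueueSketch is hypothetical, functions are represented by plain strings, and each "file" is a pre-computed list rather than rows read from nodes.csv and rels.csv.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

class RefillQueueSketch {
    // Hypothetical stand-in for the CSV readers: the functions found in each file.
    private final Iterator<List<String>> files;
    private final Queue<String> fifo = new LinkedList<>();

    RefillQueueSketch(List<List<String>> functionsPerFile) {
        this.files = functionsPerFile.iterator();
    }

    // Returns the next function, or null when there is nothing left to hand out.
    String next() {
        if (fifo.isEmpty() && files.hasNext()) {
            // Queue is drained: load all functions of the next file.
            fifo.addAll(files.next());
        }
        return fifo.poll();
    }

    public static void main(String[] args) {
        RefillQueueSketch q = new RefillQueueSketch(Arrays.asList(
                Arrays.asList("<a.php>", "foo"),
                Arrays.asList("<b.php>")));
        // Typical caller loop: keep pulling until null is returned.
        for (String f = q.next(); f != null; f = q.next()) {
            System.out.println(f);
        }
    }
}
```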

Stepping into addNodeRowsUntilNextFile():

public class CSVFunctionExtractor
{
/**
* This function reads lines from nodes.csv through the nodeReader, line by line and file by file.
*
* 1. It first continuously adds lines that have the same funcid as the
* funcid currently on top of the funcIdStack to the CSVAST on top of
* the csvStack.
* 2. Upon finding a function declaration:
* [<outdated>
* a) It adds the line to the CSVAST on top of the csvStack, since the
* declaration as such is indeed part of the outer scope and belongs there.
* </outdated>]
* UPDATE: 2a) does not hold anymore! We now only add function declarations
* once, as root node of the CSVAST that corresponds to this function itself.
* b) It pushes a new CSVAST on top of the csvStack and that function's id
* on top of the funcIdStack. The line is also added to this new CSVAST.
* [<outdated>
* Do note that this means that we intentionally duplicate a line by adding it
* to two separate CSVAST instances. This second addition is needed for
* technical reasons, because the line contains meta-information about the
* function (e.g., its name) that we will need when converting the CSVAST to an
* ast.functionDef.FunctionDef node using the CSV2AST class.
* </outdated>]
* UPDATE: So now, this line is not duplicated anymore. The line is *only*
* added to new new CSVAST.
* 3. Upon finding a funcId different from the one on top of the funcIdStack,
* it looks for that funcId within the stack. In a valid CSV file, this
* funcId must have been previously declared by a function declaration.
* a) If it is not found, an exception is thrown.
* b) If it is found, we know that we finished scanning at least one function (since
* we got back to the "outer" scope). The distance from the top of the stack to
* the csvAST that corresponds to the current funcId is the number of functions
* that we finished scanning. We pop the csvStack (and the funcIdStack) that many
* times and put the popped CSVAST's in the csvFifoQueue.
*/
private void addNodeRowsUntilNextFile() throws InvalidCSVFile
{
while( nodeReader.hasNextRow())
{

// a KeyedCSVRow holds the fields of one nodes.csv line as key-value pairs
KeyedCSVRow currNodeRow = nodeReader.getNextRow();
//System.out.println(currNodeRow);
// type of the node on the current line, e.g. Directory, File, AST_CALL
String currType = currNodeRow.getFieldForKey(PHPCSVNodeTypes.TYPE);

// ignore dir nodes: Directory rows are skipped
if( currType.equals(PHPCSVNodeTypes.TYPE_DIRECTORY))
continue;

// If we met a file node and we finished some new functions, break.
// There should always be some new functions except at the very beginning
// when this function is called for the first time.
if( currType.equals(PHPCSVNodeTypes.TYPE_FILE)) {
if( !csvStack.isEmpty())
break;
else
continue;
}

// ignore artificial CFG entry and exit nodes; they will be generated
// by the CFG factory and their node ids will be computed using a fixed offset
// from the id of the considered function
if( currType.equals(PHPCSVNodeTypes.TYPE_CFG_ENTRY) || currType.equals(PHPCSVNodeTypes.TYPE_CFG_EXIT))
continue;

// if we met a toplevel node of a file, then make sure the csvStack is
// empty, put a new CSVAST on the stack, add current row, and continue
if( currType.equals(PHPCSVNodeTypes.TYPE_TOPLEVEL) &&
currNodeRow.getFieldForKey(PHPCSVNodeTypes.FLAGS).contains(PHPCSVNodeTypes.FLAG_TOPLEVEL_FILE)) {

// make sure stack is empty
if( !csvStack.empty())
throw new InvalidCSVFile( "nodeReader, line " + nodeReader.getCurrentLineNumber() + ": "
+ " A toplevel node of a file was found when the toplevel function"
+ " of the previous file was not finished scanning.");

// create a new top-level function at the bottom of the stack and add current row
String topLevelFuncId = currNodeRow.getFieldForKey(PHPCSVNodeTypes.NODE_ID);
initCSVAST(topLevelFuncId);
addRowToTopCSVAST(currNodeRow);

continue;
}

// we are looking neither at a dir node, file node, nor toplevel node of a file
// make sure stack is not empty at this point
if( csvStack.empty())
throw new InvalidCSVFile( "nodeReader, line " + nodeReader.getCurrentLineNumber() + ": "
+ "No toplevel node of a file to initialize top-level code was found.");

String currFuncId = currNodeRow.getFieldForKey(PHPCSVNodeTypes.FUNCID);

// how many functions did we just finish?
// (0 = currFuncId corresponds to CSVAST currently on top of stack)
int finishedFunctions = funcIdStack.search(currFuncId) - 1;
// if currFuncId is not in the stack, fail; this should never happen with a valid CSV file
if( finishedFunctions < 0)
throw new InvalidCSVFile( "nodeReader, line " + nodeReader.getCurrentLineNumber() + ": "
+ "funcid " + currFuncId + " has never been initialized by a function declaration.");
// put finished functions into the finished functions queue
for( int i = 0; i < finishedFunctions; i++) {
csvFifoQueue.add( csvStack.pop());
funcIdStack.pop();
}
// put the current line on the correct CSVAST's
addRowAndInitASTForFuncType(currNodeRow);
}

// If we are here, it means one of two things:
// - We broke out of the loop because we finished scanning a file;
// - The nodeReader does not have any more rows to read.
// In both cases, we just push the remaining (now finished) functions
// on the csvStack onto the csvFifoQueue.
while( !csvStack.empty()) {
csvFifoQueue.add( csvStack.pop());
funcIdStack.pop();
}
}
}
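The stack-and-queue bookkeeping described in the Javadoc above can be sketched with plain data. This is a simplified sketch, not Joern's implementation: rows are reduced to {id, type, funcid} string triples, the class name FuncGroupingSketch and the sample rows are hypothetical, the type strings are assumed values, and the sketch assumes that a declaration row carries the funcid of its enclosing function. File, directory, and CFG rows are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Stack;

class FuncGroupingSketch {
    // Assumed type strings for rows that open a new function scope.
    static final Set<String> FUNC_TYPES = new HashSet<>(
            Arrays.asList("AST_TOPLEVEL", "AST_FUNC_DECL", "AST_METHOD", "AST_CLOSURE"));

    // Each row is {id, type, funcid}. Returns the node ids of each function,
    // in the order in which the functions are finished.
    static List<List<String>> group(List<String[]> rows) {
        Stack<List<String>> groupStack = new Stack<>();
        Stack<String> funcIdStack = new Stack<>();
        List<List<String>> finished = new ArrayList<>();

        for (String[] row : rows) {
            String id = row[0], type = row[1], funcId = row[2];
            if (!groupStack.empty()) {
                // Stack.search() is 1-based from the top and returns -1 if absent,
                // so (search - 1) is the number of functions we just left.
                int finishedFunctions = funcIdStack.search(funcId) - 1;
                if (finishedFunctions < 0)
                    throw new IllegalStateException("funcid " + funcId + " was never declared");
                for (int i = 0; i < finishedFunctions; i++) {
                    finished.add(groupStack.pop());
                    funcIdStack.pop();
                }
            }
            // The first row, and every declaration row, opens a new group of its own.
            if (groupStack.empty() || FUNC_TYPES.contains(type)) {
                groupStack.push(new ArrayList<>());
                funcIdStack.push(id);
            }
            groupStack.peek().add(id);
        }
        // End of input: everything still on the stack is finished too.
        while (!groupStack.empty()) {
            finished.add(groupStack.pop());
            funcIdStack.pop();
        }
        return finished;
    }

    // Hypothetical rows: a top-level scope (1) containing a function (3).
    static List<String[]> sampleRows() {
        return Arrays.asList(
                new String[] {"1", "AST_TOPLEVEL", ""},
                new String[] {"2", "AST_STMT_LIST", "1"},
                new String[] {"3", "AST_FUNC_DECL", "1"},
                new String[] {"4", "AST_PARAM_LIST", "3"},
                new String[] {"5", "AST_CALL", "1"});
    }

    public static void main(String[] args) {
        System.out.println(group(sampleRows()));
    }
}
```

With the sample rows, row 5 returns to funcid 1 while function 3 is on top of the stack, so function 3 is popped and queued first and the top-level group follows at the end.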

The ASTUnderConstruction class

joern/projects/extensions/jpanlib/src/main/java/inputModules/csv/csv2ast/ASTUnderConstruction.java

ASTUnderConstruction is a helper class used while the AST is being built:

package inputModules.csv.csv2ast;

import java.util.Arrays;
import java.util.HashMap;

import ast.ASTNode;
import ast.functionDef.FunctionDefBase;

public class ASTUnderConstruction
{
HashMap<Long, ASTNode> idToNode = new HashMap<Long, ASTNode>();
// root node of the AST
FunctionDefBase rootNode;

/**
* @return The AST's root node, or null if none is set.
* @see setRootNode(FunctionDef)
* (Returns the root node of an AST.)
*/
public FunctionDefBase getRootNode()
{
return rootNode;
}

/**
* For ASTUnderConstruction instances representing single functions,
* the function's root node may be explicitly set.
*
* @param node The node to be considered as the AST's root node.
* (Sets the root node of the AST.)
*/
public void setRootNode(FunctionDefBase node)
{
rootNode = node;
}

// TODO:
// - Make ASTUnderConstruction implement Map.
// - Accordingly, rename addNodeWithId() to put() and getNodeById() to get();
// this makes the class more familiar to use for Java programmers anyhow.
// - Throw an exception if trying to put() a Node that already exists
// in the map but with a different id.
// - The previous point makes the map bijective. Implement a method getIdForNode()
// that gives us the unique id of a given node, or -1 if it is not contained.
public void addNodeWithId(ASTNode newNode, Long id)
{
idToNode.put(id, newNode);
}

// return the ASTNode stored under the given id
public ASTNode getNodeById(Long id)
{
return idToNode.get(id);
}

public ASTNode getNodeWithLowestId()
{
Object[] array = idToNode.keySet().toArray();
Arrays.sort(array);
return idToNode.get(array[0]);
}

public boolean containsValue(ASTNode node) {
return idToNode.containsValue(node);
}
}
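getNodeWithLowestId() above sorts all keys and takes the first one. As a small illustration of the same lookup, the sketch below uses Collections.min instead of a full sort and returns null for an empty map. It is an alternative formulation under my own assumptions (the class name LowestIdSketch is hypothetical, and values are plain strings rather than ASTNode objects), not a change found in Joern.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class LowestIdSketch {
    // Returns the value stored under the smallest key, or null for an empty map.
    static <V> V valueWithLowestId(Map<Long, V> idToNode) {
        if (idToNode.isEmpty()) {
            return null;
        }
        return idToNode.get(Collections.min(idToNode.keySet()));
    }

    public static void main(String[] args) {
        Map<Long, String> idToNode = new HashMap<>();
        idToNode.put(42L, "AST_CALL");
        idToNode.put(7L, "AST_TOPLEVEL");
        System.out.println(valueWithLowestId(idToNode));
    }
}
```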

The CSV2AST class

joern/projects/extensions/jpanlib/src/main/java/inputModules/csv/csv2ast/CSV2AST.java

joern/projects/extensions/joern-php/src/main/java/inputModules/csv/csv2ast/PHPCSV2AST.java

convert(CSVAST csvAST)

public FunctionDefBase convert(CSVAST csvAST)
throws IOException, InvalidCSVFile
{
ASTUnderConstruction ast = new ASTUnderConstruction();

// turn each node row in csvAST.nodeRows into an ASTNode and store it in ast
createASTNodes(csvAST, ast);
// use csvAST.edgeRows to record the relationships between those nodes in ast
createASTEdges(csvAST, ast);

// return the root node of the AST
return ast.getRootNode();
}

createASTNodes

The createASTNodes() function creates the nodes, which are of type ASTNode:

protected void createASTNodes(CSVAST csvAST, ASTUnderConstruction ast) throws InvalidCSVFile
{
Iterator<KeyedCSVRow> nodeRows = csvAST.nodeIterator();
// take the first record out of nodeRows and keep it as keyedRow
KeyedCSVRow keyedRow = getFirstKeyedRow(nodeRows);
// create the ASTNodes;
// keyedRow is passed separately so that the root node can be handled on its own
createASTForFunction(ast, nodeRows, keyedRow);
}

This method is overridden in the PHPCSV2AST class (joern/projects/extensions/joern-php/src/main/java/inputModules/csv/csv2ast/PHPCSV2AST.java), which adds a requirement on the first record in nodeRows: it must be of a function type:

public class PHPCSV2AST extends CSV2AST {

@Override
protected void createASTNodes(CSVAST csvAST, ASTUnderConstruction ast) throws InvalidCSVFile
{
Iterator<KeyedCSVRow> nodeRows = csvAST.nodeIterator();
KeyedCSVRow keyedRow = getFirstKeyedRow(nodeRows);

// first row must be a function type;
// otherwise we cannot create a function node
// added check: the first record in nodeRows must be of a function type
if( !PHPCSVNodeTypes.funcTypes.contains(keyedRow.getFieldForKey(PHPCSVNodeTypes.TYPE)))
throw new InvalidCSVFile( "Type of first row is not a function declaration.");

createASTForFunction(ast, nodeRows, keyedRow);
}
}

PHPCSVNodeTypes.funcTypes is a fairly broad notion of function type. It contains:

public class PHPCSVNodeTypes
{
public static final List<String> funcTypes =
Arrays.asList(TYPE_TOPLEVEL, TYPE_FUNC_DECL, TYPE_METHOD, TYPE_CLOSURE);
}

createASTForFunction

The createASTForFunction() function creates the ASTNodes for the range of statements covered by one function-type node:

protected void createASTForFunction(ASTUnderConstruction ast, Iterator<KeyedCSVRow> nodeRows, KeyedCSVRow keyedRow)
throws InvalidCSVFile
{
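// set the root node first:
// nodeInterpreter.handle(keyedRow, ast) reads keyedRow, converts it into an ASTNode, stores it in ast, and returns the node's id;
// ast.getNodeById() then looks up the node stored under that id
FunctionDefBase root = (FunctionDefBase) ast.getNodeById( nodeInterpreter.handle(keyedRow, ast));
// make it the root node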
ast.setRootNode(root);

while (nodeRows.hasNext())
{
keyedRow = nodeRows.next();
// keep reading the remaining rows, converting each into an ASTNode stored in ast
nodeInterpreter.handle(keyedRow, ast);
}
}

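The figure below shows a simple example of an ast object:

[Figure: example ast object]

As the example shows, the ASTNode objects already exist, but after createASTForFunction alone they are not yet connected by any node-to-node relationships.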

createASTEdges

private void createASTEdges(CSVAST csvAST, ASTUnderConstruction ast) throws InvalidCSVFile
{
Iterator<KeyedCSVRow> edgeRows = csvAST.edgeIterator();
KeyedCSVRow keyedRow;

while (edgeRows.hasNext())
{
keyedRow = edgeRows.next();
// for each edge, record the relationship by adding the end node to the children of the corresponding ASTNode in ast
edgeInterpreter.handle(keyedRow, ast);
}
}
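The two passes of convert() (create all nodes indexed by id, then link them using the edge rows) can be sketched with a toy node type. This is a minimal sketch under stated assumptions: Node, TwoPassAstSketch, and the sample rows are hypothetical, every edge is treated as a parent-to-child edge, and the first node row is taken as the root, as in createASTForFunction().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPassAstSketch {
    // Toy stand-in for ASTNode: an id, a type, and an ordered child list.
    static class Node {
        final long id;
        final String type;
        final List<Node> children = new ArrayList<>();

        Node(long id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    // nodeRows: {id, type}; edgeRows: {parentId, childId}.
    static Node build(String[][] nodeRows, long[][] edgeRows) {
        Map<Long, Node> idToNode = new HashMap<>();
        Node root = null;
        // Pass 1: create every node and index it by id; the first row is the root.
        for (String[] row : nodeRows) {
            Node n = new Node(Long.parseLong(row[0]), row[1]);
            idToNode.put(n.id, n);
            if (root == null) {
                root = n;
            }
        }
        // Pass 2: connect the nodes that already exist.
        for (long[] edge : edgeRows) {
            Node parent = idToNode.get(edge[0]);
            Node child = idToNode.get(edge[1]);
            if (parent == null || child == null) {
                throw new IllegalArgumentException("edge references an unknown node id");
            }
            parent.children.add(child);
        }
        return root;
    }

    // Render the tree as type(child, child, ...) for inspection.
    static String render(Node n) {
        StringBuilder sb = new StringBuilder(n.type);
        if (!n.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(render(n.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] nodes = {{"1", "AST_TOPLEVEL"}, {"2", "AST_STMT_LIST"}, {"3", "AST_CALL"}};
        long[][] edges = {{1, 2}, {2, 3}};
        System.out.println(render(build(nodes, edges)));
    }
}
```

Splitting the work this way means an edge row never has to create a node: by the time edges are processed, both endpoints are expected to be in the id map.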

setInterpreters

Sets the interpreters:

public void setInterpreters(CSVRowInterpreter nodeInterpreter, CSVRowInterpreter edgeInterpreter)
{
this.nodeInterpreter = nodeInterpreter;
this.edgeInterpreter = edgeInterpreter;
}

The PHPCSVNodeInterpreter class

joern/projects/extensions/joern-php/src/main/java/tools/php/ast2cpg/PHPCSVNodeInterpreter.java

joern/projects/extensions/jpanlib/src/main/java/inputModules/csv/csv2ast/CSVRowInterpreter.java

Through its handle(KeyedCSVRow row, ASTUnderConstruction ast) method, PHPCSVNodeInterpreter converts one node record into an AST node and stores it in ast.

PHPCSVNodeInterpreter implements the CSVRowInterpreter interface, which has a single method:

public interface CSVRowInterpreter
{
public long handle(KeyedCSVRow row, ASTUnderConstruction ast) throws InvalidCSVFile;
}

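PHPCSVNodeInterpreter implements handle by first reading the node type from the KeyedCSVRow row parameter and then dispatching on that type. The class defines many handle****() functions, one for each node type. Each of them returns the id of the node it created, which is the id already given in row.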

public class PHPCSVNodeInterpreter implements CSVRowInterpreter
{

@Override
public long handle(KeyedCSVRow row, ASTUnderConstruction ast)
throws InvalidCSVFile
{
long retval = -1;
// get the type of the current node; each type is handled differently
String type = row.getFieldForKey(PHPCSVNodeTypes.TYPE);
switch (type)
{
// null nodes (leafs)
case PHPCSVNodeTypes.TYPE_NULL:
retval = handleNull(row, ast);
break;

// primary expressions (leafs)
case PHPCSVNodeTypes.TYPE_INTEGER:
retval = handleInteger(row, ast);
break;
case PHPCSVNodeTypes.TYPE_DOUBLE:
retval = handleDouble(row, ast);
break;
case PHPCSVNodeTypes.TYPE_STRING:
retval = handleString(row, ast);
break;

// special nodes
case PHPCSVNodeTypes.TYPE_NAME:
retval = handleName(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CLOSURE_VAR:
retval = handleClosureVar(row, ast);
break;

// declaration nodes
case PHPCSVNodeTypes.TYPE_TOPLEVEL:
retval = handleTopLevelFunction(row, ast);
break;
case PHPCSVNodeTypes.TYPE_FUNC_DECL:
retval = handleFunction(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CLOSURE:
retval = handleClosure(row, ast);
break;
case PHPCSVNodeTypes.TYPE_METHOD:
retval = handleMethod(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CLASS:
retval = handleClass(row, ast);
break;

// nodes without children (leafs)
// expressions
case PHPCSVNodeTypes.TYPE_MAGIC_CONST:
retval = handleMagicConst(row, ast);
break;
case PHPCSVNodeTypes.TYPE_TYPE:
retval = handleTypeHint(row, ast);
break;

// nodes with exactly 1 child
// expressions
case PHPCSVNodeTypes.TYPE_VAR:
retval = handleVariable(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CONST:
retval = handleConstant(row, ast);
break;
case PHPCSVNodeTypes.TYPE_UNPACK:
retval = handleUnpack(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CAST:
retval = handleCast(row, ast);
break;
case PHPCSVNodeTypes.TYPE_EMPTY:
retval = handleEmpty(row, ast);
break;
case PHPCSVNodeTypes.TYPE_ISSET:
retval = handleIsset(row, ast);
break;
case PHPCSVNodeTypes.TYPE_SHELL_EXEC:
retval = handleShellExec(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CLONE:
retval = handleClone(row, ast);
break;
case PHPCSVNodeTypes.TYPE_EXIT:
retval = handleExit(row, ast);
break;
case PHPCSVNodeTypes.TYPE_PRINT:
retval = handlePrint(row, ast);
break;
case PHPCSVNodeTypes.TYPE_INCLUDE_OR_EVAL:
retval = handleIncludeOrEval(row, ast);
break;
case PHPCSVNodeTypes.TYPE_UNARY_OP:
retval = handleUnaryOperation(row, ast);
break;
case PHPCSVNodeTypes.TYPE_PRE_INC:
retval = handlePreInc(row, ast);
break;
case PHPCSVNodeTypes.TYPE_PRE_DEC:
retval = handlePreDec(row, ast);
break;
case PHPCSVNodeTypes.TYPE_POST_INC:
retval = handlePostInc(row, ast);
break;
case PHPCSVNodeTypes.TYPE_POST_DEC:
retval = handlePostDec(row, ast);
break;
case PHPCSVNodeTypes.TYPE_YIELD_FROM:
retval = handleYieldFrom(row, ast);
break;

// statements
case PHPCSVNodeTypes.TYPE_GLOBAL:
retval = handleGlobal(row, ast);
break;
case PHPCSVNodeTypes.TYPE_UNSET:
retval = handleUnset(row, ast);
break;
case PHPCSVNodeTypes.TYPE_RETURN:
retval = handleReturn(row, ast);
break;
case PHPCSVNodeTypes.TYPE_LABEL:
retval = handleLabel(row, ast);
break;
case PHPCSVNodeTypes.TYPE_REF:
retval = handleReference(row, ast);
break;
case PHPCSVNodeTypes.TYPE_HALT_COMPILER:
retval = handleHaltCompiler(row, ast);
break;
case PHPCSVNodeTypes.TYPE_ECHO:
retval = handleEcho(row, ast);
break;
case PHPCSVNodeTypes.TYPE_THROW:
retval = handleThrow(row, ast);
break;
case PHPCSVNodeTypes.TYPE_GOTO:
retval = handleGoto(row, ast);
break;
case PHPCSVNodeTypes.TYPE_BREAK:
retval = handleBreak(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CONTINUE:
retval = handleContinue(row, ast);
break;

// nodes with exactly 2 children
// expressions
case PHPCSVNodeTypes.TYPE_DIM:
retval = handleArrayIndexing(row, ast);
break;
case PHPCSVNodeTypes.TYPE_PROP:
retval = handleProperty(row, ast);
break;
case PHPCSVNodeTypes.TYPE_STATIC_PROP:
retval = handleStaticProperty(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CALL:
retval = handleCall(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CLASS_CONST:
retval = handleClassConstant(row, ast);
break;
case PHPCSVNodeTypes.TYPE_ASSIGN:
retval = handleAssign(row, ast);
break;
case PHPCSVNodeTypes.TYPE_ASSIGN_REF:
retval = handleAssignByRef(row, ast);
break;
case PHPCSVNodeTypes.TYPE_ASSIGN_OP:
retval = handleAssignWithOp(row, ast);
break;
case PHPCSVNodeTypes.TYPE_BINARY_OP:
retval = handleBinaryOperation(row, ast);
break;
case PHPCSVNodeTypes.TYPE_ARRAY_ELEM:
retval = handleArrayElement(row, ast);
break;
case PHPCSVNodeTypes.TYPE_NEW:
retval = handleNew(row, ast);
break;
case PHPCSVNodeTypes.TYPE_INSTANCEOF:
retval = handleInstanceof(row, ast);
break;
case PHPCSVNodeTypes.TYPE_YIELD:
retval = handleYield(row, ast);
break;
case PHPCSVNodeTypes.TYPE_COALESCE:
retval = handleCoalesce(row, ast);
break;

// statements
case PHPCSVNodeTypes.TYPE_STATIC:
retval = handleStaticVariable(row, ast);
break;
case PHPCSVNodeTypes.TYPE_WHILE:
retval = handleWhile(row, ast);
break;
case PHPCSVNodeTypes.TYPE_DO_WHILE:
retval = handleDo(row, ast);
break;
case PHPCSVNodeTypes.TYPE_IF_ELEM:
retval = handleIfElement(row, ast);
break;
case PHPCSVNodeTypes.TYPE_SWITCH:
retval = handleSwitch(row, ast);
break;
case PHPCSVNodeTypes.TYPE_SWITCH_CASE:
retval = handleSwitchCase(row, ast);
break;
case PHPCSVNodeTypes.TYPE_DECLARE:
retval = handleDeclare(row, ast);
break;
case PHPCSVNodeTypes.TYPE_PROP_ELEM:
retval = handlePropertyElement(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CONST_ELEM:
retval = handleConstantElement(row, ast);
break;
case PHPCSVNodeTypes.TYPE_USE_TRAIT:
retval = handleUseTrait(row, ast);
break;
case PHPCSVNodeTypes.TYPE_TRAIT_PRECEDENCE:
retval = handleTraitPrecedence(row, ast);
break;
case PHPCSVNodeTypes.TYPE_METHOD_REFERENCE:
retval = handleMethodReference(row, ast);
break;
case PHPCSVNodeTypes.TYPE_NAMESPACE:
retval = handleNamespace(row, ast);
break;
case PHPCSVNodeTypes.TYPE_USE_ELEM:
retval = handleUseElement(row, ast);
break;
case PHPCSVNodeTypes.TYPE_TRAIT_ALIAS:
retval = handleTraitAlias(row, ast);
break;
case PHPCSVNodeTypes.TYPE_GROUP_USE:
retval = handleGroupUse(row, ast);
break;

// nodes with exactly 3 children
// expressions
case PHPCSVNodeTypes.TYPE_METHOD_CALL:
retval = handleMethodCall(row, ast);
break;
case PHPCSVNodeTypes.TYPE_STATIC_CALL:
retval = handleStaticCall(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CONDITIONAL:
retval = handleConditional(row, ast);
break;

// statements
case PHPCSVNodeTypes.TYPE_TRY:
retval = handleTry(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CATCH:
retval = handleCatch(row, ast);
break;
case PHPCSVNodeTypes.TYPE_PARAM:
retval = handleParameter(row, ast);
break;

// nodes with exactly 4 children
// statements
case PHPCSVNodeTypes.TYPE_FOR:
retval = handleFor(row, ast);
break;
case PHPCSVNodeTypes.TYPE_FOREACH:
retval = handleForEach(row, ast);
break;

// nodes with an arbitrary number of children
case PHPCSVNodeTypes.TYPE_ARG_LIST:
retval = handleArgumentList(row, ast);
break;
case PHPCSVNodeTypes.TYPE_LIST:
retval = handleList(row, ast);
break;
case PHPCSVNodeTypes.TYPE_ARRAY:
retval = handleArray(row, ast);
break;
case PHPCSVNodeTypes.TYPE_ENCAPS_LIST:
retval = handleEncapsList(row, ast);
break;
case PHPCSVNodeTypes.TYPE_EXPR_LIST:
retval = handleExpressionList(row, ast);
break;
case PHPCSVNodeTypes.TYPE_STMT_LIST:
retval = handleCompound(row, ast);
break;
case PHPCSVNodeTypes.TYPE_IF:
retval = handleIf(row, ast);
break;
case PHPCSVNodeTypes.TYPE_SWITCH_LIST:
retval = handleSwitchList(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CATCH_LIST:
retval = handleCatchList(row, ast);
break;
case PHPCSVNodeTypes.TYPE_PARAM_LIST:
retval = handleParameterList(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CLOSURE_USES:
retval = handleClosureUses(row, ast);
break;
case PHPCSVNodeTypes.TYPE_PROP_DECL:
retval = handlePropertyDeclaration(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CONST_DECL:
retval = handleConstantDeclaration(row, ast);
break;
case PHPCSVNodeTypes.TYPE_CLASS_CONST_DECL:
retval = handleClassConstantDeclaration(row, ast);
break;
case PHPCSVNodeTypes.TYPE_NAME_LIST:
retval = handleIdentifierList(row, ast);
break;
case PHPCSVNodeTypes.TYPE_TRAIT_ADAPTATIONS:
retval = handleTraitAdaptations(row, ast);
break;
case PHPCSVNodeTypes.TYPE_USE:
retval = handleUseStatement(row, ast);
break;

default:
retval = defaultHandler(row, ast);
}

return retval;
}
}

Take handleTopLevelFunction as an example:

private static long handleTopLevelFunction(KeyedCSVRow row,
ASTUnderConstruction ast) throws InvalidCSVFile
{
// instantiate a TopLevelFunctionDef object; TopLevelFunctionDef inherits from ASTNode
TopLevelFunctionDef newNode = new TopLevelFunctionDef();

// read type, flags, lineno, childnum, etc. from row
String type = row.getFieldForKey(PHPCSVNodeTypes.TYPE);
String flags = row.getFieldForKey(PHPCSVNodeTypes.FLAGS);
String lineno = row.getFieldForKey(PHPCSVNodeTypes.LINENO);
String childnum = row.getFieldForKey(PHPCSVNodeTypes.CHILDNUM);
String endlineno = row.getFieldForKey(PHPCSVNodeTypes.ENDLINENO);
String name = row.getFieldForKey(PHPCSVNodeTypes.NAME);

// store type, flags, lineno, childnum, etc. in the TopLevelFunctionDef object
newNode.setProperty(PHPCSVNodeTypes.TYPE.getName(), type);
newNode.setFlags(flags);
CodeLocation codeloc = new CodeLocation();
codeloc.startLine = Integer.parseInt(lineno);
codeloc.endLine = Integer.parseInt(endlineno);

newNode.setLocation(codeloc);
newNode.setProperty(PHPCSVNodeTypes.CHILDNUM.getName(), childnum);
if (flags.contains(PHPCSVNodeTypes.FLAG_TOPLEVEL_FILE))
newNode.setName("<" + name + ">");
else if (flags.contains(PHPCSVNodeTypes.FLAG_TOPLEVEL_CLASS))
newNode.setName("[" + name + "]");
else
throw new InvalidCSVFile("While trying to handle row "
+ row.toString() + ": " + "Invalid toplevel flags " + flags
+ ".");

long id = Long.parseLong(row.getFieldForKey(PHPCSVNodeTypes.NODE_ID));
// store the TopLevelFunctionDef object in ast
ast.addNodeWithId(newNode, id);
// store the id in the TopLevelFunctionDef object
newNode.setNodeId(id);

// the return value is the node id
return id;
}
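The naming rule in handleTopLevelFunction (angle brackets for a file's top-level code, square brackets for a class's top-level code) can be isolated in a few lines. This is a sketch under stated assumptions: the class name TopLevelNameSketch is hypothetical, and the two flag strings are assumed values, since the excerpt of PHPCSVNodeTypes above shows neither constant.

```java
class TopLevelNameSketch {
    // Assumed flag values; the article's excerpt does not show these constants.
    static final String FLAG_TOPLEVEL_FILE = "TOPLEVEL_FILE";
    static final String FLAG_TOPLEVEL_CLASS = "TOPLEVEL_CLASS";

    // Wrap the name according to which kind of top-level scope the flags describe.
    static String displayName(String flags, String name) {
        if (flags.contains(FLAG_TOPLEVEL_FILE)) {
            return "<" + name + ">";
        }
        if (flags.contains(FLAG_TOPLEVEL_CLASS)) {
            return "[" + name + "]";
        }
        throw new IllegalArgumentException("Invalid toplevel flags " + flags + ".");
    }

    public static void main(String[] args) {
        System.out.println(displayName(FLAG_TOPLEVEL_FILE, "index.php"));
        System.out.println(displayName(FLAG_TOPLEVEL_CLASS, "Foo"));
    }
}
```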

A few other methods are special in that they handle nodes related to function calls. There are four of them:

handleCall (KeyedCSVRow row, ASTUnderConstruction ast)

handleNew (KeyedCSVRow row, ASTUnderConstruction ast)

handleMethodCall (KeyedCSVRow row, ASTUnderConstruction ast)

handleStaticCall (KeyedCSVRow row, ASTUnderConstruction ast)

Take handleCall as an example:

private long handleCall(KeyedCSVRow row, ASTUnderConstruction ast)
{
// instantiate a CallExpressionBase object newNode; CallExpressionBase inherits from ASTNode
CallExpressionBase newNode = new CallExpressionBase();

// read type, flags, lineno, etc.
String type = row.getFieldForKey(PHPCSVNodeTypes.TYPE);
String flags = row.getFieldForKey(PHPCSVNodeTypes.FLAGS);
String lineno = row.getFieldForKey(PHPCSVNodeTypes.LINENO);
String childnum = row.getFieldForKey(PHPCSVNodeTypes.CHILDNUM);

// store type, flags, lineno, etc. in newNode
newNode.setProperty(PHPCSVNodeTypes.TYPE.getName(), type);
newNode.setFlags(flags);
CodeLocation codeloc = new CodeLocation();
codeloc.startLine = Integer.parseInt(lineno);

newNode.setLocation(codeloc);
newNode.setProperty(PHPCSVNodeTypes.CHILDNUM.getName(), childnum);

long id = Long.parseLong(row.getFieldForKey(PHPCSVNodeTypes.NODE_ID));
// store newNode in ast
ast.addNodeWithId(newNode, id);
// set the id on newNode
newNode.setNodeId(id);

// special in the case of function calls:
// we add the created node to the PHP call graph factory's list of function calls;
// hence we get a list of all function calls without any additional traversals of
// the final AST
// unlike for other node types, newNode is additionally added to PHPCGFactory's LinkedList of function calls
PHPCGFactory.addFunctionCall(newNode);

return id;
}

Compared with the other handlers, it additionally calls PHPCGFactory.addFunctionCall(newNode);.

PHPCGFactory maintains four LinkedLists that store the four different kinds of CallExpression nodes:

public class PHPCGFactory {
// maintains a list of function calls
private static LinkedList<CallExpressionBase> functionCalls = new LinkedList<CallExpressionBase>();
// maintains a list of static method calls
private static LinkedList<StaticCallExpression> staticMethodCalls = new LinkedList<StaticCallExpression>();
// maintains a list of constructor calls
private static LinkedList<NewExpression> constructorCalls = new LinkedList<NewExpression>();
// maintains a list of non-static method calls
private static LinkedList<MethodCallExpression> nonStaticMethodCalls = new LinkedList<MethodCallExpression>();

/**
* Adds a new function call.
*
* @param functionCall A PHP function/method/constructor call. An arbitrary number of
* distinguished calls to the same function/method/constructor can
* be added.
*/
public static boolean addFunctionCall( CallExpressionBase callExpression) {

// Note: we cannot access any of the CallExpression's getter methods here
// because this method is called from the PHPCSVNodeInterpreter at the point
// where it constructs the CallExpression. That is, this method is called for each
// CallExpression *immediately* after its construction. At that point, the PHPCSVNodeInterpreter
// has not called the CallExpression's setter methods (as it has not yet interpreted the
// corresponding CSV lines).
// Hence, we only store the references to the CallExpression objects themselves.

if( callExpression instanceof StaticCallExpression)
return staticMethodCalls.add( (StaticCallExpression)callExpression);
else if( callExpression instanceof NewExpression)
return constructorCalls.add( (NewExpression)callExpression);
else if( callExpression instanceof MethodCallExpression)
return nonStaticMethodCalls.add( (MethodCallExpression)callExpression);
else
return functionCalls.add( callExpression);
}

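PHPCGFactory.addFunctionCall() is called from PHPCSVNodeInterpreter while it constructs a CallExpression. In other words, the method runs right after each CallExpression object is created, so at that point it only stores a reference to the object.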
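Why storing only the reference is enough can be shown with a small sketch. The classes below are hypothetical stand-ins (CallBase, StaticCall, NewCall, MethodCall and CallRegistrySketch are not Joern classes, and making the three subclasses siblings is an assumption): a call is registered first, its name is set afterwards, and the registry still sees the name because it holds the same object.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-ins for the call-expression hierarchy.
class CallBase {
    String name; // filled in later, after registration
}

class StaticCall extends CallBase {}

class NewCall extends CallBase {}

class MethodCall extends CallBase {}

class CallRegistrySketch {
    final List<CallBase> functionCalls = new LinkedList<>();
    final List<StaticCall> staticCalls = new LinkedList<>();
    final List<NewCall> constructorCalls = new LinkedList<>();
    final List<MethodCall> methodCalls = new LinkedList<>();

    // Route the call to the list that matches its runtime class;
    // anything that is not one of the subclasses is a plain function call.
    boolean add(CallBase call) {
        if (call instanceof StaticCall)
            return staticCalls.add((StaticCall) call);
        else if (call instanceof NewCall)
            return constructorCalls.add((NewCall) call);
        else if (call instanceof MethodCall)
            return methodCalls.add((MethodCall) call);
        else
            return functionCalls.add(call);
    }

    // Register a call before its name is known, then set the name.
    static String registerThenName() {
        CallRegistrySketch registry = new CallRegistrySketch();
        CallBase call = new CallBase();
        registry.add(call);  // only the reference is stored here
        call.name = "foo";   // set after registration
        return registry.functionCalls.get(0).name;
    }
}
```

registerThenName() returns "foo": the list entry and the later-modified object are the same instance, which is why the call lists can be filled during node construction and read once the AST is complete.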

PHPCSVEdgeInterpreter class

joern/projects/extensions/joern-php/src/main/java/tools/php/ast2cpg/PHPCSVEdgeInterpreter.java

The handle(KeyedCSVRow row, ASTUnderConstruction ast) method of PHPCSVEdgeInterpreter turns one row of rels.csv into an edge of the AST and stores the result in ast.

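Like PHPCSVNodeInterpreter, PHPCSVEdgeInterpreter implements a handle method. It reads the start and end node ids from the KeyedCSVRow row argument, looks up both nodes in ast, and then dispatches on the type of the start node. As in the node interpreter, the class defines many handle****() functions for the different node types; each of them returns an errno that tells the caller whether an error occurred.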

public class PHPCSVEdgeInterpreter implements CSVRowInterpreter
{

@Override
public long handle(KeyedCSVRow row, ASTUnderConstruction ast)
throws InvalidCSVFile
{
long startId = Long.parseLong(row.getFieldForKey(PHPCSVEdgeTypes.START_ID));
long endId = Long.parseLong(row.getFieldForKey(PHPCSVEdgeTypes.END_ID));

ASTNode startNode = ast.getNodeById(startId);
ASTNode endNode = ast.getNodeById(endId);

// TODO put childnum property into edges file instead of nodes file,
// then do not add the childnum property to ASTNodes in node interpreter any longer,
// then introduce some NumberFormatException handling here.
//int childnum = Integer.parseInt(row.getFieldForKey(PHPCSVEdgeTypes.CHILDNUM));
int childnum = Integer.parseInt(endNode.getProperty(PHPCSVNodeTypes.CHILDNUM.getName()));

// Special treatment for closures: they are expressions, so we create a ClosureExpression to hold them
// We cannot do this in the PHPCSVNodeInterpreter, since CSV2AST expects an instance of PHPFunctionDef
// for the first row of the CSVAST that it converts. (Closure is an instance of PHPFunctionDef and thus
// cannot be an instance of Expression.)
// Closure represents an anonymous function (closure),
// so it gets special treatment here
if( endNode instanceof Closure)
endNode = createClosureExpression((Closure)endNode);

// errno = 0: success
// errno = 1: the childnum is not valid for the start node type
// errno = 2: an attempt was made to add a child to a leaf node
int errno = 0;
String type = startNode.getProperty(PHPCSVNodeTypes.TYPE.getName());
switch (type)
{
// These node types are leaves and cannot take children, so adding one is an error:
// - null nodes (leafs)
// - primary expressions (leafs)
case PHPCSVNodeTypes.TYPE_NULL:
case PHPCSVNodeTypes.TYPE_INTEGER:
case PHPCSVNodeTypes.TYPE_DOUBLE:
case PHPCSVNodeTypes.TYPE_STRING:
errno = 2;
break;

// special nodes
case PHPCSVNodeTypes.TYPE_NAME:
errno = handleName((Identifier)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CLOSURE_VAR:
errno = handleClosureVar((ClosureVar)startNode, endNode, childnum);
break;

// declaration nodes
case PHPCSVNodeTypes.TYPE_TOPLEVEL:
errno = handleTopLevelFunction((TopLevelFunctionDef)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_FUNC_DECL:
errno = handleFunction((FunctionDef)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CLOSURE:
errno = handleClosure((Closure)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_METHOD:
errno = handleMethod((Method)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CLASS:
errno = handleClass((ClassDef)startNode, endNode, childnum);
break;

// nodes without children (leafs)
// expressions
case PHPCSVNodeTypes.TYPE_MAGIC_CONST:
case PHPCSVNodeTypes.TYPE_TYPE:
errno = 2;
break;

// nodes with exactly 1 child
// expressions
case PHPCSVNodeTypes.TYPE_VAR:
errno = handleVariable((Variable)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CONST:
errno = handleConstant((Constant)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_UNPACK:
errno = handleUnpack((UnpackExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CAST:
errno = handleCast((CastExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_EMPTY:
errno = handleEmpty((EmptyExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_ISSET:
errno = handleIsset((IssetExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_SHELL_EXEC:
errno = handleShellExec((ShellExecExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CLONE:
errno = handleClone((CloneExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_EXIT:
errno = handleExit((ExitExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_PRINT:
errno = handlePrint((PrintExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_INCLUDE_OR_EVAL:
errno = handleIncludeOrEval((IncludeOrEvalExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_UNARY_OP:
errno = handleUnaryOperation((UnaryOperationExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_PRE_INC:
errno = handlePreInc((PreIncOperationExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_PRE_DEC:
errno = handlePreDec((PreDecOperationExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_POST_INC:
errno = handlePostInc((PostIncOperationExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_POST_DEC:
errno = handlePostDec((PostDecOperationExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_YIELD_FROM:
errno = handleYieldFrom((YieldFromExpression)startNode, endNode, childnum);
break;

// statements
case PHPCSVNodeTypes.TYPE_GLOBAL:
errno = handleGlobal((GlobalStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_UNSET:
errno = handleUnset((UnsetStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_RETURN:
errno = handleReturn((ReturnStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_LABEL:
errno = handleLabel((Label)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_REF:
errno = handleReference((ReferenceExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_HALT_COMPILER:
errno = handleHaltCompiler((HaltCompilerStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_ECHO:
errno = handleEcho((EchoStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_THROW:
errno = handleThrow((ThrowStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_GOTO:
errno = handleGoto((GotoStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_BREAK:
errno = handleBreak((BreakStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CONTINUE:
errno = handleContinue((ContinueStatement)startNode, endNode, childnum);
break;

// nodes with exactly 2 children
// expressions
case PHPCSVNodeTypes.TYPE_DIM:
errno = handleArrayIndexing((ArrayIndexing)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_PROP:
errno = handleProperty((PropertyExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_STATIC_PROP:
errno = handleStaticProperty((StaticPropertyExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CALL:
errno = handleCall((CallExpressionBase)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CLASS_CONST:
errno = handleClassConstant((ClassConstantExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_ASSIGN:
errno = handleAssign((AssignmentExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_ASSIGN_REF:
errno = handleAssignByRef((AssignmentByRefExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_ASSIGN_OP:
errno = handleAssignWithOp((AssignmentWithOpExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_BINARY_OP:
errno = handleBinaryOperation((BinaryOperationExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_ARRAY_ELEM:
errno = handleArrayElement((ArrayElement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_NEW:
errno = handleNew((NewExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_INSTANCEOF:
errno = handleInstanceof((InstanceofExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_YIELD:
errno = handleYield((YieldExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_COALESCE:
errno = handleCoalesce((CoalesceExpression)startNode, endNode, childnum);
break;

// statements
case PHPCSVNodeTypes.TYPE_STATIC:
errno = handleStaticVariable((StaticVariableDeclaration)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_WHILE:
errno = handleWhile((WhileStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_DO_WHILE:
errno = handleDo((DoStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_IF_ELEM:
errno = handleIfElement((IfElement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_SWITCH:
errno = handleSwitch((SwitchStatementPHP)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_SWITCH_CASE:
errno = handleSwitchCase((SwitchCase)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_PROP_ELEM:
errno = handlePropertyElement((PropertyElement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_DECLARE:
errno = handleDeclare((DeclareStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CONST_ELEM:
errno = handleConstantElement((ConstantElement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_USE_TRAIT:
errno = handleUseTrait((UseTrait)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_TRAIT_PRECEDENCE:
errno = handleTraitPrecedence((TraitPrecedence)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_METHOD_REFERENCE:
errno = handleMethodReference((MethodReference)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_NAMESPACE:
errno = handleNamespace((NamespaceStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_USE_ELEM:
errno = handleUseElement((UseElement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_TRAIT_ALIAS:
errno = handleTraitAlias((TraitAlias)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_GROUP_USE:
errno = handleGroupUse((GroupUseStatement)startNode, endNode, childnum);
break;

// nodes with exactly 3 children
// expressions
case PHPCSVNodeTypes.TYPE_METHOD_CALL:
errno = handleMethodCall((MethodCallExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_STATIC_CALL:
errno = handleStaticCall((StaticCallExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CONDITIONAL:
errno = handleConditional((ConditionalExpression)startNode, endNode, childnum);
break;

// statements
case PHPCSVNodeTypes.TYPE_TRY:
errno = handleTry((TryStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CATCH:
errno = handleCatch((CatchStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_PARAM:
errno = handleParameter((Parameter)startNode, endNode, childnum);
break;

// nodes with exactly 4 children
// statements
case PHPCSVNodeTypes.TYPE_FOR:
errno = handleFor((ForStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_FOREACH:
errno = handleForEach((ForEachStatement)startNode, endNode, childnum);
break;

// nodes with an arbitrary number of children
case PHPCSVNodeTypes.TYPE_ARG_LIST:
errno = handleArgumentList((ArgumentList)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_LIST:
errno = handleList((ListExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_ARRAY:
errno = handleArray((ArrayExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_ENCAPS_LIST:
errno = handleEncapsList((EncapsListExpression)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_EXPR_LIST:
errno = handleExpressionList((ExpressionList)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_STMT_LIST:
errno = handleCompound((CompoundStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_IF:
errno = handleIf((IfStatement)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_SWITCH_LIST:
errno = handleSwitchList((SwitchList)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CATCH_LIST:
errno = handleCatchList((CatchList)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_PARAM_LIST:
errno = handleParameterList((ParameterList)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CLOSURE_USES:
errno = handleClosureUses((ClosureUses)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_PROP_DECL:
errno = handlePropertyDeclaration((PropertyDeclaration)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CONST_DECL:
errno = handleConstantDeclaration((ConstantDeclaration)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_CLASS_CONST_DECL:
errno = handleClassConstantDeclaration((ClassConstantDeclaration)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_NAME_LIST:
errno = handleIdentifierList((IdentifierList)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_TRAIT_ADAPTATIONS:
errno = handleTraitAdaptations((TraitAdaptations)startNode, endNode, childnum);
break;
case PHPCSVNodeTypes.TYPE_USE:
errno = handleUseStatement((UseStatement)startNode, endNode, childnum);
break;

default:
errno = defaultHandler(startNode, endNode, childnum);
}

if( 1 == errno)
throw new InvalidCSVFile("While trying to handle row "
+ row.toString() + ": Invalid childnum " + childnum
+ " for start node type " + type + ".");
else if( 2 == errno)
throw new InvalidCSVFile("While trying to handle row "
+ row.toString() + ": Cannot add child to leaf node "
+ type + ".");

return startId;
}
}
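The errno convention used by handle can be sketched on a much smaller scale. The code below is an illustration under stated assumptions, not Joern's implementation: EdgeNode and EdgeSketch are hypothetical, only two node types get dedicated cases, and the type strings "string" and "AST_ECHO" are used as illustrative values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical node: a type, the childnum read from its node row, and its children.
class EdgeNode {
    final String type;
    final int childnum;
    final List<EdgeNode> children = new ArrayList<>();

    EdgeNode(String type, int childnum) {
        this.type = type;
        this.childnum = childnum;
    }
}

class EdgeSketch {
    static final int OK = 0;            // edge added
    static final int BAD_CHILDNUM = 1;  // childnum not valid for the start node type
    static final int CHILD_OF_LEAF = 2; // start node is a leaf

    // Look up both ends of the edge, then dispatch on the START node's type.
    static int addEdge(Map<Long, EdgeNode> nodes, long startId, long endId) {
        EdgeNode start = nodes.get(startId);
        EdgeNode end = nodes.get(endId);

        switch (start.type) {
            case "string":   // leaf: must not receive children
                return CHILD_OF_LEAF;
            case "AST_ECHO": // treated here as a node with exactly one child
                if (end.childnum != 0)
                    return BAD_CHILDNUM;
                start.children.add(end);
                return OK;
            default:         // any number of children
                start.children.add(end);
                return OK;
        }
    }
}
```

A caller would turn a non-zero return value into an exception, just as handle throws InvalidCSVFile for errno 1 and 2.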

Tracing an example

Using a small demo, let's step through in the debugger how Joern reassembles nodes.csv and rels.csv into an AST.

case1_optimize/1.php

<?php

function foo(){

echo "foo in 1.php";

}

foo();

case1_optimize/2.php

<?php

function foo(){

echo "foo in 2.php";

}

case1_optimize/3.php

<?php

function foo(){

echo "foo in 3.php";

}

The first breakpoint is set in Main.java:

[Figure 5]

Step into CSVFunctionExtractor.getNextFunction:

[Figure 6]

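The addNodeRowsUntilNextFile and AddEdgeRowsUntilNextFile methods read nodes.csv and rels.csv one PHP file at a time and buffer the rows in the csvFifoQueue queue; from the queue, the rows are then stored in the csvAST object:

[Figure 7]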

PHPCSV2AST.convert is then called to turn csvAST into an AST. The call chain is shown in the figure below:

[Figure 8]

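Next, nodeInterpreter (a PHPCSVNodeInterpreter) processes each node row through its handle method. There is a separate handle**** method for each node type, and each one produces a different kind of ASTNode:

[Figure 9]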

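Once the nodes have been processed they are still unconnected, because the edge information has not yet been added to the AST. Control returns to CSV2AST.convert, which goes on to run createASTEdges; this adds each edge to the AST by attaching the end node as a child of the start node:

[Figure 10]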

For example, in the demo, the AST rooted at the first node of type function looks like this:

[Figure 11]

Nodes are linked to one another through their children field.
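The two phases (create all nodes first, then connect them through children) can be sketched as follows. This is a simplified illustration, not the Joern code: TreeNode and TwoPhaseSketch are hypothetical, the node ids, type names and childnum values are made up and do not reproduce the demo's actual CSV rows, and ordering children by sorting on childnum is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node: a type, its position among siblings, and its children.
class TreeNode {
    final String type;
    final int childnum;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String type, int childnum) {
        this.type = type;
        this.childnum = childnum;
    }
}

class TwoPhaseSketch {
    // Phase 2: each edge {startId, endId} makes the end node a child of the start node.
    static void createEdges(Map<Long, TreeNode> nodes, long[][] edges) {
        for (long[] edge : edges) {
            TreeNode parent = nodes.get(edge[0]);
            parent.children.add(nodes.get(edge[1]));
            // keep siblings ordered by childnum even if edges arrive out of order
            parent.children.sort(Comparator.comparingInt(c -> c.childnum));
        }
    }

    // Render the tree with two spaces of indentation per level.
    static String render(TreeNode node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++)
            sb.append("  ");
        sb.append(node.type).append('\n');
        for (TreeNode child : node.children)
            sb.append(render(child, depth + 1));
        return sb.toString();
    }

    static TreeNode buildExample() {
        // Phase 1: nodes exist but are not connected yet.
        Map<Long, TreeNode> nodes = new HashMap<>();
        nodes.put(1L, new TreeNode("AST_FUNC_DECL", 0));
        nodes.put(2L, new TreeNode("AST_PARAM_LIST", 0));
        nodes.put(3L, new TreeNode("AST_STMT_LIST", 1));
        nodes.put(4L, new TreeNode("AST_ECHO", 0));
        nodes.put(5L, new TreeNode("string", 0));

        // Phase 2: connect them; the first two edges are deliberately out of order.
        createEdges(nodes, new long[][] {{1, 3}, {1, 2}, {3, 4}, {4, 5}});
        return nodes.get(1L);
    }
}
```

TwoPhaseSketch.render(TwoPhaseSketch.buildExample(), 0) lists AST_FUNC_DECL at the top with AST_PARAM_LIST and AST_STMT_LIST beneath it, and the echo statement with its string argument nested under the statement list.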