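Preface

As we know, Joern generates a code property graph (CPG) for PHP source code in the following steps:

[Figure: the steps by which Joern generates a CPG from PHP source code]

This post analyzes and documents how phpjoern parses PHP source into an AST.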

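Setting Up the Debugging Environment

To get through PHPJoern's parsing flow more quickly, I debug it with PhpStorm. Before debugging, make sure Xdebug is installed and configured. On Ubuntu it can be installed directly with apt-get: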

sudo apt-get install php-xdebug

After installation, edit php.ini (on my test machine that is /etc/php/7.0/cli/php.ini) and append the following lines:

[xdebug]
xdebug.remote_port=9000
xdebug.remote_enable = on
xdebug.remote_host=127.0.0.1
xdebug.idekey = PHPSTORM
xdebug.remote_autostart=1
xdebug.remote_mode=req

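Next, open the project directory in PhpStorm. In the figure below, the red box marks the main files that parse PHP, and the white arrow points to the file that launches the parser:

[Figure: PhpStorm project view of phpjoern, with the parsing files and the launcher file highlighted]

Looking at these files, src/Parser.php turns out to be the main program, so we can run Parser.php directly instead of the php2ast executable.

Add a command-line run configuration for src/Parser.php: choose Edit Configurations in the upper-right corner, then fill in the Arguments field according to where the file to be parsed sits relative to src/Parser.php:

[Figure: PhpStorm Edit Configurations dialog with the Arguments field filled in]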

Source Code Analysis

Directory Structure

Use the tree command to view the directory structure of the phpjoern project:

$ tree
.
├── AUTHORS
├── conf
│   └── batch.properties
├── LICENSE
├── php2ast
├── README.md
└── src
    ├── CSVExporter.php
    ├── Exporter.php
    ├── GraphMLExporter.php
    ├── Parser.php
    └── util.php

2 directories, 10 files

There are two directories, src/ and conf/. The conf/batch.properties file appears to hold the configuration used when importing into Neo4j:

# conf/batch.properties
cache_type = none
use_memory_mapped_buffers = true
neostore.nodestore.db.mapped_memory = 1G
neostore.relationshipstore.db.mapped_memory = 3G
neostore.propertystore.db.mapped_memory = 1G
neostore.propertystore.db.strings.mapped_memory = 500M
neostore.propertystore.db.index.keys.mapped_memory = 5M
neostore.propertystore.db.index.mapped_memory = 5M

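The ./php2ast file is the executable entry point. As the output of file shows, it is a Bash shell script rather than a compiled binary. It parses PHP files into abstract syntax trees: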

$ file php2ast 
php2ast: Bourne-Again shell script, ASCII text executable

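The src/ directory contains five PHP files, which hold all the logic for parsing PHP.

src/Parser.php: the main program; the entry point lives in this file

src/Exporter.php: the abstract parent class of the two exporters

src/CSVExporter.php: exports nodes and relationships as CSV files

src/GraphMLExporter.php: exports the whole graph as a GraphML file

src/util.php: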

src/Parser.php

Let's first look at the main program, src/Parser.php, to which I have added some annotations of my own:

<?php declare( strict_types = 1);

// report on errors, except notices
error_reporting( E_ALL & ~E_NOTICE);

/**
* This program looks for PHP files in a given directory and dumps ASTs.
*
* @author Malte Skoruppa <skoruppa@cs.uni-saarland.de>
*/

require_once 'Exporter.php';
require_once 'CSVExporter.php';
require_once 'GraphMLExporter.php';

// The file or directory to be parsed
$path = null; // file/folder to be parsed
$format = Exporter::JEXP_FORMAT; // format to use for export (default: jexp)
$nodefile = CSVExporter::NODE_FILE; // name of node file when using CSV format (default: nodes.csv)
$relfile = CSVExporter::REL_FILE; // name of relationship file when using CSV format (default: rels.csv)
$outfile = GraphMLExporter::GRAPHML_FILE; // name of output file when using GraphML format (default: graph.xml)
$scriptname = null; // this script's name
$startcount = 0; // the start count for numbering nodes

/**
 * Parses the cli arguments.
 * This function mainly checks that the command-line arguments are valid,
 * for example that the right number of arguments was given.
 *
 * @return Boolean that indicates whether the given arguments are
 *         fine.
 */
function parse_arguments() {

  global $argv;

  if( !isset( $argv)) {
    if( false === (boolean) ini_get( 'register_argc_argv')) {
      error_log( '[ERROR] Please enable register_argc_argv in your php.ini.');
    }
    else {
      error_log( '[ERROR] No $argv array available.');
    }
    echo PHP_EOL; // PHP_EOL is the newline of the current platform (\n on Linux)
    return false;
  }

  // Remove the script name (first argument)
  global $scriptname;
  // [My Annotation]: array_shift() removes the first element of the array and returns it.
  // The first element of $argv is the script name, so $scriptname receives the script name
  // and $argv keeps only the remaining arguments.
  $scriptname = array_shift( $argv);

  // If $argv is empty after array_shift(), the file argument is missing
  if( count( $argv) === 0) {
    error_log( '[ERROR] Missing argument.');
    return false;
  }

  // Set the path and remove from command line (last argument)
  global $path;
  // array_pop() is the counterpart of array_shift(): it removes and returns the last element
  // of the array, so array_pop() yields the path of the file or directory to parse
  $path = (string) array_pop( $argv);

  // Parse options
  $longopts = ["help", "version", "format:", "nodes:", "relationships:", "out:", "count:"];
  $options = getopt( "hvf:n:r:o:c:", $longopts);
  if( $options === FALSE) {
    error_log( '[ERROR] Could not parse command line arguments.');
    return false;
  }

  // Help?
  if( isset( $options['help']) || isset( $options['h'])) {
    print_version();
    echo PHP_EOL;
    print_usage();
    echo PHP_EOL;
    print_help();
    exit( 0);
  }

  // Version?
  if( isset( $options['version']) || isset( $options['v'])) {
    print_version();
    exit( 0);
  }

  // Format?
  if( isset( $options['format']) || isset( $options['f'])) {
    global $format;
    switch( $options['format'] ?? $options['f']) {
    case "jexp":
      $format = Exporter::JEXP_FORMAT;
      break;
    case "neo4j":
      $format = Exporter::NEO4J_FORMAT;
      break;
    case "graphml":
      $format = Exporter::GRAPHML_FORMAT;
      break;
    default:
      error_log( "[WARNING] Unknown format '{$options['f']}', using jexp format.");
      $format = Exporter::JEXP_FORMAT;
      break;
    }
  }

  // Nodes file? (for CSV output)
  if( isset( $options['nodes']) || isset( $options['n'])) {
    global $nodefile;
    $nodefile = $options['nodes'] ?? $options['n'];
  }

  // Relationships file? (for CSV output)
  if( isset( $options['relationships']) || isset( $options['r'])) {
    global $relfile;
    $relfile = $options['relationships'] ?? $options['r'];
  }

  // Output file? (for XML output)
  if( isset( $options['out']) || isset( $options['o'])) {
    global $outfile;
    $outfile = $options['out'] ?? $options['o'];
  }

  // Start count?
  if( isset( $options['count']) || isset( $options['c'])) {
    global $startcount;
    $startcount = (int)($options['count'] ?? $options['c']);
  }

  return true;
}

/**
 * Prints a version message.
 */
function print_version() {

  $version = 'UNKNOWN';

  // Note: Only works on Unix :-p
  if( file_exists( ".git/HEAD"))
    if( preg_match( '/^ref: (.+)$/', file_get_contents( ".git/HEAD"), $matches))
      if( file_exists( ".git/{$matches[1]}"))
        $version = substr( file_get_contents( ".git/{$matches[1]}"), 0, 7);

  echo "PHPJoern parser utility, commit {$version}", PHP_EOL;
}

/**
 * Prints a usage message.
 */
function print_usage() {

  global $scriptname;
  echo 'Usage: php '.$scriptname.' [options] <file|folder>', PHP_EOL;
}

/**
 * Prints a help message.
 */
function print_help() {

  echo 'Options:', PHP_EOL;
  echo ' -h, --help Display help message', PHP_EOL;
  echo ' -v, --version Display version information', PHP_EOL;
  echo ' -f, --format <format> Format to use for the output files: "jexp" (default), "neo4j", or "graphml"', PHP_EOL;
  echo ' -n, --nodes <file> Output file for nodes (for CSV output, i.e., neo4j or jexp modes)', PHP_EOL;
  echo ' -r, --relationships <file> Output file for relationships (for CSV output, i.e., jexp or neo4j modes)', PHP_EOL;
  echo ' -o, --out <file> Output file for entire graph (for XML output, i.e., graphml mode)', PHP_EOL;
  echo ' -c, --count <number> Initial value of node counter (defaults to 0)', PHP_EOL;
}

/**
 * Parses and generates an AST for a single file.
 * Builds the AST for a single PHP file.
 *
 * @param $path     Path to the file
 * @param $exporter An Exporter instance to use for exporting
 *                  the AST of the parsed file.
 *
 * @return The node index of the exported file node, or -1 if there
 *         was an error.
 */
function parse_file( $path, $exporter) : int {

  $finfo = new SplFileInfo( $path);
  echo "Parsing file ", $finfo->getPathname(), PHP_EOL;

  try {
    // ast\parse_file() comes from the php-ast extension and returns the AST of the file
    $ast = ast\parse_file( $path, $version = 30);

    // The above may throw a ParseError. We only export to the output
    // file(s) if that didn't happen.
    /**
     * store_filenode() stores the file node and returns the id of that
     * Filesystem(type=File) node, so $fnode holds the id of the file node.
     */
    $fnode = $exporter->store_filenode( $finfo->getFilename());
    // Store the AST(type=AST_TOPLEVEL) node
    $tnode = $exporter->store_toplevelnode( Exporter::TOPLEVEL_FILE, $path, 1, count(file($path)));
    // Export the remaining nodes of the AST
    $astroot = $exporter->export( $ast, $tnode);
    // Store the relationship edges
    $exporter->store_rel( $tnode, $astroot, "PARENT_OF");
    $exporter->store_rel( $fnode, $tnode, "FILE_OF");
    //echo ast_dump( $ast), PHP_EOL;
  }
  catch( ParseError $e) {
    $fnode = -1;
    error_log( "[ERROR] In $path: ".$e->getMessage());
  }

  // $fnode is an integer: the id of the Filesystem(type=File) node, or -1 after a parse error
  return $fnode;
}

/**
 * Parses and generates ASTs for all PHP files buried within a
 * directory.
 * parse_dir() still relies on parse_file() to parse each individual file.
 *
 * @param $path     Path to the directory
 * @param $exporter An Exporter instance to use for exporting
 *                  the ASTs of all parsed files.
 * @param $exporter Determines the format in which the AST is exported
 * @param $top      Boolean indicating whether this call
 *                  corresponds to the top-level call of the
 *                  function. We wouldn't need this if I didn't
 *                  insist on the root directory of a project
 *                  getting node index 0. But, I do insist.
 * @param $top      Tells whether the current directory is the outermost one
 *
 * @return If the directory corresponding to the function call finds
 *         itself interesting, it stores a directory node for itself
 *         and this function returns the index of that
 *         node. Otherwise, returns -1. A directory finds itself
 *         interesting if it contains PHP files, or if one of its
 *         child directories finds itself interesting. -- As a special
 *         case, the root directory of a project (corresponding to the
 *         top-level call) always finds itself interesting and always
 *         stores a directory node for itself.
 * @return If nothing fails, the id of the directory node for the directory being traversed
 */
function parse_dir( $path, $exporter, $top = true) : int {

  // save any interesting directory/file indices in the current folder
  $found = [];
  // if the current folder finds itself interesting, we will create a
  // directory node for it and return its index
  // The top-level directory always gets a directory node (label Filesystem, type=Directory);
  // its id is returned, and with the default start count that id is 0
  $dirnode = $top ? $exporter->store_dirnode( basename( $path)) : -1;

  // opendir() opens a directory handle
  $dhandle = opendir( $path);

  // iterate over everything in the current folder
  /**
   * readdir(): returns the name of the next entry in the directory, in the order
   * in which the entries are stored by the filesystem.
   * @return string|false the filename on success or false on failure.
   */
  // Loop over the contents of the directory
  while( false !== ($filename = readdir( $dhandle))) {
    /**
     * SplFileInfo:
     * The SplFileInfo class offers a high-level object oriented interface to information for an individual file.
     * It can be used to query detailed information about a file.
     */
    $finfo = new SplFileInfo( build_path( $path, $filename));

    /**
     * SplFileInfo::isFile ( void ) : bool         tells whether the object references a regular file
     * SplFileInfo::isReadable ( void ) : bool     tells whether the file is readable
     * SplFileInfo::getExtension ( void ) : string gets the file extension
     * SplFileInfo::getPathname ( void ) : string  gets the path to the file
     */
    if( $finfo->isFile() && $finfo->isReadable() && in_array( strtolower( $finfo->getExtension()), ['php','inc','phar']))
      // Parse the single file with parse_file() and remember its node id in $found
      $found[] = parse_file( $finfo->getPathname(), $exporter);
    else if( $finfo->isDir() && $finfo->isReadable() && $filename !== '.' && $filename !== '..')
      // Nested directories are handled by calling parse_dir() recursively
      if( -1 !== ($childdir = parse_dir( $finfo->getPathname(), $exporter, false)))
        $found[] = $childdir;
  }

  // if the current folder finds itself interesting...
  if( !empty( $found)) {
    if( !$top)
      // A directory that is not the top-level one gets its id only now
      $dirnode = $exporter->store_dirnode( basename( $path));
    foreach( $found as $i => $nodeindex)
      $exporter->store_rel( $dirnode, $nodeindex, "DIRECTORY_OF");
  }

  closedir( $dhandle);

  /**
   * The value returned belongs to the directory currently being traversed. For a/b/, while the
   * traversal is in b/, the returned $dirnode is the id of the Filesystem(type=Directory) node
   * for b/ (or -1 if b/ contains nothing interesting).
   * Once the recursion has finished, the id returned for the outermost directory is the start
   * count, which is 0 by default.
   */
  return $dirnode;
}

/**
 * Builds a file path with the appropriate directory separator.
 *
 * @param ...$segments Unlimited number of path segments.
 *
 * @return The file path built from the path segments.
 */
function build_path( ...$segments) {

  return join( DIRECTORY_SEPARATOR, $segments);
}

/*
 * Main script
 * Main program, the entry point
 */
// parse_arguments() first checks that the arguments passed in are valid
if( parse_arguments() === false) {
  print_usage();
  echo PHP_EOL;
  print_help();
  exit( 1);
}

// Check that source exists and is readable
// If the path does not exist or is not readable, exit
if( !file_exists( $path) || !is_readable( $path)) {
  error_log( '[ERROR] The given path does not exist or cannot be read.');
  exit( 1);
}

$exporter = null;
// Determine whether source is a file or a directory
if( is_file( $path)) {
  try {
    if( $format === Exporter::GRAPHML_FORMAT)
      // Choose the export format; GraphML is supported by many tools
      $exporter = new GraphMLExporter( $outfile, $startcount);
    else // either NEO4J_FORMAT or JEXP_FORMAT
      $exporter = new CSVExporter( $format, $nodefile, $relfile, $startcount);
    /**
     * NEO4J_FORMAT output can be consumed by the neo4j-import tool
     * JEXP_FORMAT output can be consumed by the batch-import tool
     */
  }
  catch( IOError $e) {
    error_log( "[ERROR] ".$e->getMessage());
    exit( 1);
  }
  parse_file( $path, $exporter);
}
elseif( is_dir( $path)) {
  try {
    if( $format === Exporter::GRAPHML_FORMAT)
      $exporter = new GraphMLExporter( $outfile, $startcount);
    else // either NEO4J_FORMAT or JEXP_FORMAT
      $exporter = new CSVExporter( $format, $nodefile, $relfile, $startcount);
  }
  catch( IOError $e) {
    error_log( "[ERROR] ".$e->getMessage());
    exit( 1);
  }
  // Parse the directory
  parse_dir( $path, $exporter);
}
else {
  error_log( '[ERROR] The given path is neither a regular file nor a directory.');
  exit( 1);
}

echo "Done.", PHP_EOL;

Walkthrough: the main program

Let's first sort out the program's main logic.

We start with the entry point:

/*
 * Main script
 * Main program, the entry point
 */
// parse_arguments() first checks that the arguments passed in are valid
if( parse_arguments() === false) {
  print_usage();
  echo PHP_EOL;
  print_help();
  exit( 1);
}

// Check that source exists and is readable
// If the path does not exist or is not readable, exit
if( !file_exists( $path) || !is_readable( $path)) {
  error_log( '[ERROR] The given path does not exist or cannot be read.');
  exit( 1);
}

$exporter = null;
// Determine whether source is a file or a directory
if( is_file( $path)) {
  try {
    if( $format === Exporter::GRAPHML_FORMAT)
      $exporter = new GraphMLExporter( $outfile, $startcount);
    else // either NEO4J_FORMAT or JEXP_FORMAT
      $exporter = new CSVExporter( $format, $nodefile, $relfile, $startcount);
  }
  catch( IOError $e) {
    error_log( "[ERROR] ".$e->getMessage());
    exit( 1);
  }
  parse_file( $path, $exporter);
}
elseif( is_dir( $path)) {
  try {
    if( $format === Exporter::GRAPHML_FORMAT)
      $exporter = new GraphMLExporter( $outfile, $startcount);
    else // either NEO4J_FORMAT or JEXP_FORMAT
      $exporter = new CSVExporter( $format, $nodefile, $relfile, $startcount);
  }
  catch( IOError $e) {
    error_log( "[ERROR] ".$e->getMessage());
    exit( 1);
  }
  // Parse the directory
  parse_dir( $path, $exporter);
}
else {
  error_log( '[ERROR] The given path is neither a regular file nor a directory.');
  exit( 1);
}

echo "Done.", PHP_EOL;

As a flowchart:

[Figure: flowchart of the main program in Parser.php]

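  1. First, parse_arguments() is called to check that the arguments passed in are valid.

  2. If they are, the program checks that the target file or directory exists and is readable.

  3. The following if-else chain then handles a single file or a directory:

    if( is_file( $path)) {
        set $exporter
        parse_file( $path, $exporter);
    } else if( is_dir( $path)) {
        set $exporter
        parse_dir( $path, $exporter);
    } else {
        log the error and exit( 1);
    }
  4. After parse_file() or parse_dir() has finished without problems, the program prints Done. and ends.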
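The four steps above can be modelled in a few lines. The following is only a sketch in Python (not phpjoern code): the function names choose_exporter and dispatch are made up for illustration, and the sketch returns the names of the exporter class and parse routine instead of running them.

```python
import os


def choose_exporter(fmt):
    """GraphML gets its own exporter; both CSV formats (jexp, neo4j) share one."""
    return "GraphMLExporter" if fmt == "graphml" else "CSVExporter"


def dispatch(path, fmt="jexp"):
    """Return (exporter name, parse routine name) for a path, mirroring steps 2 and 3."""
    # Step 2: the path must exist and be readable.
    if not os.path.exists(path) or not os.access(path, os.R_OK):
        raise FileNotFoundError("The given path does not exist or cannot be read.")
    exporter = choose_exporter(fmt)
    # Step 3: a regular file goes to parse_file, a directory to parse_dir.
    if os.path.isfile(path):
        return exporter, "parse_file"
    if os.path.isdir(path):
        return exporter, "parse_dir"
    raise ValueError("The given path is neither a regular file nor a directory.")
```

Calling dispatch() on a directory with fmt="graphml" yields ("GraphMLExporter", "parse_dir"), while a single file with the default format yields ("CSVExporter", "parse_file").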

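The SplFileInfo file-handling class

Both parse_dir() and parse_file() use this file-handling class, so let's look at what it does first. According to the official manual (https://www.php.net/manual/en/class.splfileinfo.php), the main methods of SplFileInfo and their purposes are: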

SplFileInfo {
    /* Methods */
    public __construct ( string $file_name )
    public getATime ( void ) : int // gets the last access time of the file
    public getBasename ([ string $suffix ] ) : string // gets the base name of the file
    public getCTime ( void ) : int // gets the inode change time of the file
    public getExtension ( void ) : string // gets the file extension
    public getFileInfo ([ string $class_name ] ) : SplFileInfo // gets an SplFileInfo object for the file
    public getFilename ( void ) : string // gets the file name
    public getGroup ( void ) : int // gets the file group
    public getInode ( void ) : int // gets the inode of the file
    public getLinkTarget ( void ) : string // gets the target of a link
    public getMTime ( void ) : int // gets the last modification time
    public getOwner ( void ) : int // gets the owner of the file
    public getPath ( void ) : string // gets the path without the file name
    public getPathInfo ([ string $class_name ] ) : SplFileInfo // gets an SplFileInfo object for the path
    public getPathname ( void ) : string // gets the path to the file
    public getPerms ( void ) : int // gets the file permissions
    public getRealPath ( void ) : string // gets the absolute path to the file
    public getSize ( void ) : int // gets the file size
    public getType ( void ) : string // gets the file type
    public isDir ( void ) : bool // tells whether the file is a directory
    public isExecutable ( void ) : bool // tells whether the file is executable
    public isFile ( void ) : bool // tells whether the object references a regular file
    public isLink ( void ) : bool // tells whether the file is a link
    public isReadable ( void ) : bool // tells whether the file is readable
    public isWritable ( void ) : bool // tells whether the entry is writable
    public openFile ([ string $open_mode = "r" [, bool $use_include_path = FALSE [, resource $context = NULL ]]] ) : SplFileObject // gets an SplFileObject object for the file
    public setFileClass ([ string $class_name = "SplFileObject" ] ) : void // sets the class used with SplFileInfo::openFile
    public setInfoClass ([ string $class_name = "SplFileInfo" ] ) : void // sets the class used with SplFileInfo::getFileInfo and SplFileInfo::getPathInfo
    public __toString ( void ) : string // returns the path to the file as a string
}
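Of these methods, parse_dir() only needs isFile(), isReadable(), getExtension() and getPathname() to decide whether an entry is a PHP source file. As a rough analogue, here is the same test written in Python with pathlib. This is a sketch, not phpjoern code, and the helper name is_php_source is made up for illustration.

```python
import os
from pathlib import Path

# The same extension whitelist that parse_dir() uses.
PHP_EXTENSIONS = {"php", "inc", "phar"}


def is_php_source(path):
    """True if path is a readable regular file whose extension marks it as PHP."""
    p = Path(path)
    # Lower-case the extension, like strtolower( $finfo->getExtension()).
    extension = p.suffix.lstrip(".").lower()
    return p.is_file() and os.access(p, os.R_OK) and extension in PHP_EXTENSIONS
```

Because the extension is lower-cased before the comparison, a file named INDEX.PHP is accepted, while a directory that happens to be called lib.php is rejected by the regular-file check.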

Walkthrough: parse_file()

parse_file() generates the AST for a single PHP file. It takes two parameters:

$path: the path of the file to parse

$exporter: determines the format in which the AST is exported

function parse_file( $path, $exporter) : int {

  $finfo = new SplFileInfo( $path);
  echo "Parsing file ", $finfo->getPathname(), PHP_EOL;

  try {
    // ast\parse_file() comes from the php-ast extension and returns the AST of the file
    $ast = ast\parse_file( $path, $version = 30);

    // The above may throw a ParseError. We only export to the output
    // file(s) if that didn't happen.
    /**
     * store_filenode() stores the file node and returns the id of that
     * Filesystem(type=File) node, so $fnode holds the id of the file node.
     */
    $fnode = $exporter->store_filenode( $finfo->getFilename());
    // Store the AST(type=AST_TOPLEVEL) node
    $tnode = $exporter->store_toplevelnode( Exporter::TOPLEVEL_FILE, $path, 1, count(file($path)));
    // Export the remaining nodes of the AST
    $astroot = $exporter->export( $ast, $tnode);
    // Store the relationship edges
    $exporter->store_rel( $tnode, $astroot, "PARENT_OF");
    $exporter->store_rel( $fnode, $tnode, "FILE_OF");
    //echo ast_dump( $ast), PHP_EOL;
  }
  catch( ParseError $e) {
    $fnode = -1;
    error_log( "[ERROR] In $path: ".$e->getMessage());
  }

  // $fnode is an integer: the id of the Filesystem(type=File) node, or -1 after a parse error
  return $fnode;
}

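Its main processing steps are:

  1. It first parses the PHP file with ast\parse_file() from the php-ast extension, producing the AST.
  2. It then stores the file node Filesystem(type=File) and the toplevel node AST(type=AST_TOPLEVEL) separately, because these two nodes are special: the set of fields stored for them in the database differs from that of the other nodes.
  3. It then processes the remaining nodes recursively.
  4. Finally it calls $exporter->store_rel() to store the relationship edges.
  5. It returns $fnode, the id of the file node Filesystem(type=File), or -1 if the file could not be parsed.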
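To make the order of these steps concrete, here is a small model in Python (a sketch, not phpjoern code). RecordingExporter is a hypothetical in-memory stand-in for the exporter, and a single placeholder node stands in for the whole subtree that export() would store. It assumes the node counter starts at 0 and that the AST root returned by php-ast is an AST_STMT_LIST.

```python
import os


class RecordingExporter:
    """Hypothetical stand-in for an Exporter: records nodes and edges in memory."""

    def __init__(self, start=0):
        self.count = start
        self.nodes = []  # (id, label, type, name)
        self.rels = []   # (start id, end id, relationship type)

    def store_node(self, label, type_, name=None):
        node_id = self.count
        self.nodes.append((node_id, label, type_, name))
        self.count += 1
        return node_id

    def store_rel(self, start, end, type_):
        self.rels.append((start, end, type_))


def parse_file(path, exporter):
    # Step 2a: the file node comes first.
    fnode = exporter.store_node("Filesystem", "File", os.path.basename(path))
    # Step 2b: the toplevel node, plus the artificial CFG entry/exit nodes
    # that store_toplevelnode() adds right after it.
    tnode = exporter.store_node("AST", "AST_TOPLEVEL", path)
    for type_, rel in (("CFG_FUNC_ENTRY", "ENTRY"), ("CFG_FUNC_EXIT", "EXIT")):
        exporter.store_rel(tnode, exporter.store_node("Artificial", type_, path), rel)
    # Step 3: placeholder for export(); only the AST root is stored here.
    astroot = exporter.store_node("AST", "AST_STMT_LIST")
    # Step 4: relationship edges.
    exporter.store_rel(tnode, astroot, "PARENT_OF")
    exporter.store_rel(fnode, tnode, "FILE_OF")
    # Step 5: the id of the file node.
    return fnode
```

With a fresh exporter, the file node gets id 0, the toplevel node 1, the entry and exit nodes 2 and 3, and the AST root 4. The edges are recorded in the order ENTRY, EXIT, PARENT_OF, FILE_OF.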

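Walkthrough: parse_dir()

parse_dir() takes three parameters:

$path: the path of the directory to parse

$exporter: determines the format in which the AST is exported

$top: marks whether this is the outermost directory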

function parse_dir( $path, $exporter, $top = true) : int {

  // save any interesting directory/file indices in the current folder
  $found = [];
  // if the current folder finds itself interesting, we will create a
  // directory node for it and return its index
  // The top-level directory always gets a directory node (label Filesystem, type=Directory);
  // its id is returned, and with the default start count that id is 0
  $dirnode = $top ? $exporter->store_dirnode( basename( $path)) : -1;

  // opendir() opens a directory handle
  $dhandle = opendir( $path);

  // iterate over everything in the current folder
  /**
   * readdir(): returns the name of the next entry in the directory, in the order
   * in which the entries are stored by the filesystem.
   * @return string|false the filename on success or false on failure.
   */
  // Loop over the contents of the directory
  while( false !== ($filename = readdir( $dhandle))) {
    /**
     * SplFileInfo:
     * The SplFileInfo class offers a high-level object oriented interface to information for an individual file.
     * It can be used to query detailed information about a file.
     */
    $finfo = new SplFileInfo( build_path( $path, $filename));

    /**
     * SplFileInfo::isFile ( void ) : bool         tells whether the object references a regular file
     * SplFileInfo::isReadable ( void ) : bool     tells whether the file is readable
     * SplFileInfo::getExtension ( void ) : string gets the file extension
     * SplFileInfo::getPathname ( void ) : string  gets the path to the file
     */
    if( $finfo->isFile() && $finfo->isReadable() && in_array( strtolower( $finfo->getExtension()), ['php','inc','phar']))
      // Parse the single file with parse_file() and remember its node id in $found
      $found[] = parse_file( $finfo->getPathname(), $exporter);
    else if( $finfo->isDir() && $finfo->isReadable() && $filename !== '.' && $filename !== '..')
      // Nested directories are handled by calling parse_dir() recursively
      if( -1 !== ($childdir = parse_dir( $finfo->getPathname(), $exporter, false)))
        $found[] = $childdir;
  }

  // if the current folder finds itself interesting...
  if( !empty( $found)) {
    if( !$top)
      // A directory that is not the top-level one gets its id only now
      $dirnode = $exporter->store_dirnode( basename( $path));
    foreach( $found as $i => $nodeindex)
      $exporter->store_rel( $dirnode, $nodeindex, "DIRECTORY_OF");
  }

  closedir( $dhandle);

  /**
   * The value returned belongs to the directory currently being traversed. For a/b/, while the
   * traversal is in b/, the returned $dirnode is the id of the Filesystem(type=Directory) node
   * for b/ (or -1 if b/ contains nothing interesting).
   * Once the recursion has finished, the id returned for the outermost directory is the start
   * count, which is 0 by default.
   */
  return $dirnode;
}

The processing logic of parse_dir() as a flowchart (drawn casually, not at all to standard):

[Figure: flowchart of parse_dir()]
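The numbering behaviour of parse_dir() is easier to see in a runnable model. The following Python sketch (not phpjoern code) reproduces the traversal under a few stated simplifications: Counter is a hypothetical stand-in for the exporter, a single id stands in for everything parse_file() would store, readability checks are omitted, and entries are visited in sorted order so that the result is deterministic, whereas readdir() follows filesystem order.

```python
import os
import tempfile

PHP_EXTENSIONS = {"php", "inc", "phar"}


class Counter:
    """Hypothetical stand-in for the exporter: hands out ids and records edges."""

    def __init__(self, start=0):
        self.next_id = start
        self.names = {}  # node id -> name
        self.rels = []   # (directory id, child id), i.e. DIRECTORY_OF edges

    def store(self, name):
        node_id = self.next_id
        self.names[node_id] = name
        self.next_id += 1
        return node_id


def parse_dir(path, exporter, top=True):
    found = []
    name = os.path.basename(os.path.normpath(path))
    # The top-level directory is stored before anything else, so it gets the start count.
    dirnode = exporter.store(name) if top else -1
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        ext = os.path.splitext(entry)[1].lstrip(".").lower()
        if os.path.isfile(full) and ext in PHP_EXTENSIONS:
            found.append(exporter.store(entry))  # stands in for parse_file()
        elif os.path.isdir(full):
            child = parse_dir(full, exporter, top=False)
            if child != -1:
                found.append(child)
    if found:
        if not top:
            # A nested directory is stored only after its contents proved interesting.
            dirnode = exporter.store(name)
        for node in found:
            exporter.rels.append((dirnode, node))
    return dirnode


def build_demo_tree():
    """Create root/a.php, root/lib/b.php and root/docs/readme.txt in a temp directory."""
    root = tempfile.mkdtemp()
    for rel in ("a.php", "lib/b.php", "docs/readme.txt"):
        target = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        open(target, "w").close()
    return root
```

For the demo tree, the root directory gets id 0, a.php gets 1, b.php gets 2 and lib/ gets 3, because a nested directory is numbered only after its files. docs/ contains no PHP files, so it gets no node at all and its call returns -1.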

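For the top-level call, the following line creates a directory node; the return value is an int that uniquely identifies this directory node (for nested calls it is -1 until the directory turns out to be interesting):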

$dirnode = $top ? $exporter->store_dirnode( basename( $path)) : -1;

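In the figure below, the central node is the topmost node of the whole CPG. Its type is Filesystem, and it has three child nodes, one for each file in its directory:

[Figure: graph view of the top-level Filesystem directory node with its three file child nodes]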

src/Exporter.php

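The Exporter class is the parent class of CSVExporter and GraphMLExporter. Their inheritance relationship as shown in PhpStorm:

[Figure: class hierarchy of Exporter, CSVExporter and GraphMLExporter in PhpStorm]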

Class constants and class variables

src/Exporter.php defines a number of constants, mostly related to the export format or to node properties:

/** Constant for Neo4J format (to be used with neo4j-import) */
const NEO4J_FORMAT = 0;
/** Constant for jexp format (to be used with batch-import) */
const JEXP_FORMAT = 1;
/** Constant for GraphML format (supported by many tools) */
const GRAPHML_FORMAT = 2;

/** Labels */
const LABEL_FS = "Filesystem";
const LABEL_AST = "AST";
const LABEL_ART = "Artificial";

/** Type of directory nodes */
const DIR = "Directory";
/** Type of file nodes */
const FILE = "File";
/** Type of toplevel nodes */
const TOPLEVEL = "AST_TOPLEVEL";
/** Flags for toplevel nodes */
const TOPLEVEL_FILE = "TOPLEVEL_FILE";
const TOPLEVEL_CLASS = "TOPLEVEL_CLASS";

/** Type of entry and exit nodes (for CFG construction) */
const FUNC_ENTRY = "CFG_FUNC_ENTRY";
const FUNC_EXIT = "CFG_FUNC_EXIT";

/** Delimiter for arrays, used by format_flags() */
protected $array_delim = ";";

There is also a variable, $nodecount, that serves as the id counter:

/** Node counter */
protected $nodecount = 0;

store_filenode()

store_filenode() takes a file name, $filename, and stores a File node. It calls the store_node() method of the same class, which returns an int id that uniquely identifies the File node:

public function store_filenode( $filename) : int {

  return $this->store_node( self::LABEL_FS, self::FILE, null, null, null, null, null, null, null, null, $this->quote_and_escape( $filename), null);
}

store_toplevelnode()

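After storing the toplevel node, store_toplevelnode() also stores two special nodes: the entry and exit nodes of the CFG. These two nodes do not appear in the AST produced by php-ast; they are added by store_toplevelnode() itself. It then calls store_rel() to store the relationship edges: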

public function store_toplevelnode( $flag, $name, $lineno, $endlineno, $childnum = null, $funcid = null, $namespace = null) : int {

  // Store the toplevel node first
  $tnode = $this->store_node( self::LABEL_AST, self::TOPLEVEL, $flag, $lineno, null, $childnum, $funcid, null, $this->quote_and_escape( $namespace), $endlineno, $this->quote_and_escape( $name), null);

  // For toplevel nodes, we create artificial entry and exit nodes (like file and dir nodes,
  // they are not actually part of the AST).
  // For the entry and exit nodes, we only set
  // (1) the funcid (to the id of the toplevel node), and
  // (2) the name (to that of the file or class)
  /**
   * After the toplevel node (which has the AST label), two Artificial nodes are stored:
   * the entry node and the exit node.
   * An Artificial node stores information such as: name: case1_optimize/1.php id: 4 type: CFG_FUNC_EXIT funcid: 2
   * type is self::FUNC_ENTRY or self::FUNC_EXIT respectively
   * name is the file name
   * funcid is special: it is the id of the corresponding toplevel node
   */
  $entrynode = $this->store_node( self::LABEL_ART, self::FUNC_ENTRY, null, null, null, null, $tnode, null, null, null, $this->quote_and_escape( $name), null);
  $exitnode = $this->store_node( self::LABEL_ART, self::FUNC_EXIT, null, null, null, null, $tnode, null, null, null, $this->quote_and_escape( $name), null);
  // Connect the toplevel node to the entry node with an ENTRY edge
  $this->store_rel( $tnode, $entrynode, "ENTRY");
  // Connect the toplevel node to the exit node with an EXIT edge
  $this->store_rel( $tnode, $exitnode, "EXIT");

  return $tnode;
}

store_node()

In the Exporter class, store_node() is an abstract method.

abstract protected function store_node( $label, $type, $flags, $lineno, $code = null, $childnum = null, $funcid = null, $classname = null, $namespace = null, $endlineno = null, $name = null, $doccomment = null, $fileid = null) : int;

In src/CSVExporter.php, store_node() is implemented as:

protected function store_node( $label, $type, $flags, $lineno, $code = null, $childnum = null, $funcid = null, $classname = null, $namespace = null, $endlineno = null, $name = null, $doccomment = null) : int {

  fwrite( $this->nhandle, "{$this->nodecount}{$this->csv_delim}{$label}{$this->csv_delim}{$type}{$this->csv_delim}{$flags}{$this->csv_delim}{$lineno}{$this->csv_delim}{$code}{$this->csv_delim}{$childnum}{$this->csv_delim}{$funcid}{$this->csv_delim}{$classname}{$this->csv_delim}{$namespace}{$this->csv_delim}{$endlineno}{$this->csv_delim}{$name}{$this->csv_delim}{$doccomment}\n");

  // return the current node index, *then* increment it
  // $this->nodecount holds the current id; an id uniquely identifies a node
  return $this->nodecount++;
}

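It implements the abstract store_node() method of the parent class Exporter and writes the node's fields to the CSV file. It returns the current value of the id counter nodecount, which is the id of the node just written, and then increments the counter by one.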

store_rel()

Likewise, Exporter::store_rel() is an abstract method, implemented differently in the GraphMLExporter and CSVExporter classes.

abstract public function store_rel( $start, $end, $type);

CSVExporter::store_rel() is implemented as:

public function store_rel( $start, $end, $type) {

  fwrite( $this->rhandle, "{$start}{$this->csv_delim}{$end}{$this->csv_delim}{$type}\n");
}

It stores a relationship between two nodes. The $type parameter is the relationship type, for example PARENT_OF, ENTRY or EXIT.
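The interplay of the counter, store_node() and store_rel() can be tried out with a trimmed-down model. This is a Python sketch, not the real CSVExporter: the class name CsvSketchExporter is made up, only four of the node columns are kept, the output goes to in-memory buffers instead of files, and the tab delimiter is an assumption, since the value of csv_delim is not shown in this post.

```python
import io


class CsvSketchExporter:
    """Hypothetical, trimmed-down analogue of CSVExporter writing to memory."""

    def __init__(self, start_count=0, delim="\t"):
        self.nodecount = start_count
        self.delim = delim           # assumption: the real delimiter is not shown here
        self.nodes = io.StringIO()   # stands in for the nodes file handle
        self.rels = io.StringIO()    # stands in for the relationships file handle

    def store_node(self, label, type_, name=""):
        row = [str(self.nodecount), label, type_, name]
        self.nodes.write(self.delim.join(row) + "\n")
        node_id = self.nodecount     # the id that was just written is returned ...
        self.nodecount += 1          # ... and the counter advances afterwards
        return node_id

    def store_rel(self, start, end, type_):
        self.rels.write(self.delim.join([str(start), str(end), type_]) + "\n")
```

Storing a File node and then an AST_TOPLEVEL node returns 0 and 1 and leaves the counter at 2. Passing those two ids to store_rel() with type FILE_OF appends one line with the start id, the end id and the type to the relationships buffer.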

export()

phpjoern handles the file node and the toplevel node in store_filenode() and store_toplevelnode() respectively; the rest of the AST, its main body, is handled in export():

public function export( $ast, $funcid, $nodeline = 0, $childname = "", $childnum = 0, $namespace = "", $uses = [], $classname = "") : int {

// (1) if $ast is an AST node, print info and recurse
// An instance of ast\Node declares:
// $kind (integer, name can be retrieved using ast\get_kind_name())
// $flags (integer, corresponding to a set of flags for the current node)
// $lineno (integer, starting line number)
// $children (array of child nodes)
// Additionally, an instance of the subclass ast\Node\Decl declares:
// $endLineno (integer, end line number of the declaration)
// $name (string, the name of the declared function/class)
// $docComment (string, the preceding doc comment)
// First handle the most common case: the current node is an AST node
if( $ast instanceof ast\Node) {

/**
* function get_kind_name(int $kind): string {}
* @param int $kind AST_* constant value defining the kind of an AST node
* @return string String representation of AST kind value
* The function returns a string for the given $kind; the string names the node type, i.e. each $kind maps to one type
*/
$nodetype = ast\get_kind_name( $ast->kind);
$nodeline = $ast->lineno;

$nodeflags = "";
/**
* function kind_uses_flags(int $kind): bool {}
* @param int $kind AST_* constant value defining the kind of an AST node
* @return bool Returns true if AST kind uses flags
* Tells whether a given AST_* kind carries flags; AST_NAME, for example, does (NAME_NOT_FQ, NAME_FQ)
*/
if( ast\kind_uses_flags( $ast->kind)) {
$nodeflags = $this->format_flags( $ast->kind, $ast->flags);
}

// for decl nodes:
if( isset( $ast->endLineno)) {
$nodeendline = $ast->endLineno;
}
if( isset( $ast->name)) {
$nodename = $ast->name;
}
if( isset( $ast->docComment)) {
$nodedoccomment = $this->quote_and_escape( $ast->docComment);
}

// store node, export all children and store the relationships
$rootnode = $this->store_node( self::LABEL_AST, $nodetype, $nodeflags, $nodeline, null, $childnum, $funcid, $classname, $this->quote_and_escape( $namespace), $nodeendline, $nodename, $nodedoccomment);

// If this node is a function/method/closure declaration, set $funcid.
// Note that in particular, the decl node *itself* does not have $funcid set to its own id;
// this is intentional. The *declaration* of a function/method/closure itself is part of the
// control flow of the outer scope: e.g., a closure declaration is part of the control flow
// of the function it is declared in, or a function/method declaration is part of the control flow
// of the pseudo-function representing the top-level code it is declared in.
// Note: we do not need to do this for TOPLEVEL types (and it wouldn't be straightforward since we
// do not generate ast\Node objects for them). Rather, for toplevel nodes under files, the funcid is
// set by the Parser class, which also stores the File node; and for toplevel nodes under classes,
// we do it below, while iterating over the children.
// Also, we create artificial entry and exit nodes for the CFG of the function (like file and dir nodes,
// they are not actually part of the AST).
// For the entry and exit nodes, we only set
// (1) the funcid (to the id of the function node), and
// (2) the name (to that of the function)
if( $ast->kind === ast\AST_FUNC_DECL || $ast->kind === ast\AST_METHOD || $ast->kind === ast\AST_CLOSURE) {
$funcid = $rootnode;
$entrynode = $this->store_node( self::LABEL_ART, self::FUNC_ENTRY, null, null, null, null, $rootnode, $classname, $this->quote_and_escape( $namespace), null, $this->quote_and_escape( $nodename), null);
$exitnode = $this->store_node( self::LABEL_ART, self::FUNC_EXIT, null, null, null, null, $rootnode, $classname, $this->quote_and_escape( $namespace), null, $this->quote_and_escape( $nodename), null);
$this->store_rel( $rootnode, $entrynode, "ENTRY");
$this->store_rel( $rootnode, $exitnode, "EXIT");
}

// If this node is a class declaration, set $classname
// If the current node is a class declaration, record $classname
if( $ast->kind === ast\AST_CLASS) {
$classname = $nodename;
}

// iterate over the children and count them
// Start recursing into the child nodes
$i = 0;
foreach( $ast->children as $childrel => $child) {

// If we encounter a child node that is a namespace node, set the namespace for subtrees and upcoming sister nodes
// Note that we do not care whether the non-bracketed syntax (second child of AST_NAMESPACE is null)
// or the bracketed syntax (second child of AST_NAMESPACE is a statement) was used:
// (1) if non-bracketed, the namespace must be set for all upcoming sister nodes until we encounter
// the next AST_NAMESPACE
// (2) if bracketed, the namespace in principle only holds for the subtree rooted in the second child
// of AST_NAMESPACE (and should be set only for that subtree, but not for upcoming sister nodes);
// however, in this case the next sister node (if it exists) *must* be another
// AST_NAMESPACE node, according to PHP syntax (otherwise, a 'No code may exist outside of namespace {}'
// fatal error would be thrown at runtime.) Hence, if the next sister node is an AST_NAMESPACE anyway,
// the namespace will be set to something new once we finished off the subtree rooted in the
// second child of the AST_NAMESPACE we encountered.
if( $child->kind === ast\AST_NAMESPACE) {
$namespace = $child->children["name"] ?? "";
$uses = []; // any namespace statement cancels all uses currently in effect
}

// If we encounter a child node that is a use node, add the translation rules specified by it
// to the translation rules currently in effect
if( $child->kind === ast\AST_USE) {
$uses = array_merge( $uses, $this->getTranslationRulesForUse( $child));
}

// for the "stmts" child of an AST_CLASS, which is an AST_STMT_LIST,
// we insert an artificial toplevel function node
if( $ast->kind === ast\AST_CLASS && $childrel === "stmts") {
$tnode = $this->store_toplevelnode( Exporter::TOPLEVEL_CLASS, $nodename, $nodeline, $nodeendline, $i, $funcid, $namespace);
// when exporting the AST_STMT_LIST below the AST_CLASS, the
// funcid is set to the toplevel node's id, childname is set to "stmts" (doesn't really matter, we can invent a name here), and childnum is set to 0
// recurse into the child node
$childnode = $this->export( $child, $tnode, $nodeline, "stmts", 0, $namespace, $uses, $classname);
$this->store_rel( $tnode, $childnode, "PARENT_OF"); // AST_TOPLEVEL -> AST_STMT_LIST
$this->store_rel( $rootnode, $tnode, "PARENT_OF"); // AST_CLASS -> AST_TOPLEVEL
}
// for the child of an AST_NAME node which is *not* fully qualified, we apply the translation rules currently in effect
elseif( $ast->kind === ast\AST_NAME && $childrel === "name" && $ast->flags !== ast\flags\NAME_FQ) {
$child = $this->applyTranslationRulesForName( $child, $uses);
// recurse into the child node
$childnode = $this->export( $child, $funcid, $nodeline, $childrel, $i, $namespace, $uses, $classname);
$this->store_rel( $rootnode, $childnode, "PARENT_OF");
}
// in all other cases, we simply recurse straightforwardly
else {
// recurse into the child node
$childnode = $this->export( $child, $funcid, $nodeline, $childrel, $i, $namespace, $uses, $classname);
$this->store_rel( $rootnode, $childnode, "PARENT_OF");
}

// next child...
$i++;
}
}
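The way the sibling loop carries $namespace and $uses from one child to the next can be sketched in isolation. This is a minimal sketch, not phpjoern's code: plain arrays stand in for ast\Node objects, and the function name track_scope() and the 'kind', 'name' and 'alias' keys are hypothetical names I picked for illustration.

```php
<?php
// Hypothetical stand-in for the sibling loop in export(): each statement is a
// plain array instead of an ast\Node, and only $namespace and $uses are tracked.
function track_scope(array $statements): array {
    $namespace = "";
    $uses = [];
    $seen = [];
    foreach ($statements as $stmt) {
        if ($stmt['kind'] === 'namespace') {
            $namespace = $stmt['name'] ?? "";
            $uses = []; // a namespace statement cancels all uses currently in effect
        } elseif ($stmt['kind'] === 'use') {
            $uses[$stmt['alias']] = $stmt['name'];
        }
        // record the state that would be handed down to this statement's subtree
        $seen[] = ['kind' => $stmt['kind'], 'namespace' => $namespace, 'uses' => $uses];
    }
    return $seen;
}

$states = track_scope([
    ['kind' => 'namespace', 'name' => 'A'],
    ['kind' => 'use', 'name' => 'X\\Y', 'alias' => 'Y'],
    ['kind' => 'class'],
    ['kind' => 'namespace', 'name' => 'B'],
    ['kind' => 'class'],
]);
```

In this sketch the first class sees namespace A together with the alias Y, while the second class sees namespace B and an empty rule set, because the second namespace statement reset $uses.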

At its outermost level, export() has four branches, one for each kind of value it can be handed:

if( $ast instanceof ast\Node) {
    // Case 1, the most common one: the value is an ast\Node.
    // Only this branch can have child ast\Node nodes, so it is also where the children are traversed recursively.
}
// The remaining branches handle values that are not AST nodes: a string, NULL, or a number such as an integer or a double.
else if( is_string( $ast)) {
    // handle a value that is a string
} else if( $ast === null) {
    // handle a value of type NULL
} else {
    // The value is neither a string nor NULL, so it is cast to a string.
    // The developers originally assumed this branch could see booleans, integers, floats/doubles, arrays, objects or resources,
    // but testing showed that only integers and floats/doubles actually reach it.
}
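To make the dispatch order concrete, here is a small sketch that only reports which branch a value would take. The helper name classify_ast_value() is hypothetical and not part of phpjoern. It runs without the php-ast extension, because instanceof against a class that does not exist simply evaluates to false; in that setup the first branch can never be taken, so the example only exercises the other three.

```php
<?php
// Hypothetical helper (not part of phpjoern): report which of the four
// outer branches of export() a given value would fall into.
function classify_ast_value($value): string {
    if ($value instanceof ast\Node) {
        return "node";    // branch 1: a real AST node, which may have children
    } elseif (is_string($value)) {
        return "string";  // branch 2: a plain string leaf
    } elseif ($value === null) {
        return "null";    // branch 3: an absent child
    }
    return "other";       // branch 4: e.g. an integer or float, cast to string
}

$labels = array_map('classify_ast_value', ["Common\\Object", null, 42, 1.5]);
// $labels is ["string", "null", "other", "other"]
```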

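Inside export(), an AST_USE node is additionally processed by getTranslationRulesForUse(), and the name child of an AST_NAME node that is not fully qualified gets special treatment through applyTranslationRulesForName().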

getTranslationRulesForUse()

To see what getTranslationRulesForUse() and applyTranslationRulesForName() do, consider the following test case:

// use.php
<?php

namespace com\rsumilang\util;
use com\rsumlang\common as Common;

class String extends Common\Object
{
// ... code ...
}

Then parse use.php with php-ast:

<?php
require '../util.php';
$code = <<<'EOC'
<?php
namespace com\rsumilang\util;
use com\rsumlang\common as Common;

class String extends Common\Object
{
// ... code ...
}
EOC;

var_dump(ast\parse_code($code, $version=30));

// OUTPUT:
class ast\Node#1 (4) {
public $kind =>
int(133)
public $flags =>
int(0)
public $lineno =>
int(1)
public $children =>
array(3) {
[0] =>
class ast\Node#2 (4) {
public $kind =>
int(541)
public $flags =>
int(0)
public $lineno =>
int(2)
public $children =>
array(2) {
'name' =>
string(18) "com\rsumilang\util"
'stmts' =>
NULL
}
}
[1] =>
class ast\Node#3 (4) {
public $kind =>
int(144)
public $flags =>
int(361)
public $lineno =>
int(3)
public $children =>
array(1) {
[0] =>
class ast\Node#4 (4) {
public $kind =>
int(542)
public $flags =>
int(0)
public $lineno =>
int(3)
public $children =>
array(2) {
'name' =>
string(19) "com\rsumlang\common"
'alias' =>
string(6) "Common"
}
}
}
}
[2] =>
class ast\Node\Decl#5 (7) {
public $kind =>
int(69)
public $flags =>
int(0)
public $lineno =>
int(5)
public $children =>
array(3) {
'extends' =>
class ast\Node#6 (4) {
public $kind =>
int(2048)
public $flags =>
int(1)
public $lineno =>
int(5)
public $children =>
array(1) {
'name' =>
string(13) "Common\Object"
}
}
'implements' =>
NULL
'stmts' =>
class ast\Node#7 (4) {
public $kind =>
int(133)
public $flags =>
int(0)
public $lineno =>
int(6)
public $children =>
array(0) {
}
}
}
public $endLineno =>
int(8)
public $name =>
string(6) "String"
public $docComment =>
NULL
}
}
}

The test case uses the use keyword to import the namespace com\rsumlang\common and gives it the alias Common. The name Common\Object therefore actually refers to com\rsumlang\common\Object.

private function getTranslationRulesForUse( $astuse) : array {

    if( !($astuse instanceof ast\Node) || ($astuse->kind !== ast\AST_USE))
        throw new Exception("Illegal argument to getTranslationRulesForUse(): " . var_export($astuse, true));

    $uses = [];

    foreach( $astuse->children as $astuseelem) {
        $actual = $astuseelem->children["name"];
        // if no alias is given, the default one is the last part of the actual namespace
        /**
         * strrpos ( string $haystack , string $needle , int $offset = 0 ) : int|false
         * Finds the position of the last occurrence of $needle in $haystack
         */
        // If the use statement has no alias, the part of the actual namespace after its last `\` becomes the alias
        // $uses[] is consumed later by applyTranslationRulesForName()
        $alias = $astuseelem->children["alias"] ?? substr( $actual, strrpos( $actual, "\\") + 1);
        $uses[$alias] = $actual;
    }

    return $uses;
}

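getTranslationRulesForUse() thus stores the namespace aliases in the $uses array. For the test case above, the array contains a single entry that maps the alias Common to com\rsumlang\common.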
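The alias rule is easy to try out without php-ast. The following is a sketch under my own assumptions: translation_rules() is a hypothetical name, and plain arrays with 'name' and 'alias' keys stand in for the AST_USE_ELEM nodes. Unlike the original one-liner, the sketch handles a name that contains no backslash explicitly, because strrpos() returns false in that case.

```php
<?php
// Sketch of the alias rule from getTranslationRulesForUse(), using plain
// arrays in place of AST_USE_ELEM nodes (the array shape is an assumption).
function translation_rules(array $useElems): array {
    $uses = [];
    foreach ($useElems as $elem) {
        $actual = $elem['name'];
        $pos = strrpos($actual, "\\");
        // default alias: the part after the last backslash, or the whole name
        $default = $pos === false ? $actual : substr($actual, $pos + 1);
        $alias = $elem['alias'] ?? $default;
        $uses[$alias] = $actual;
    }
    return $uses;
}

$rules = translation_rules([
    ['name' => 'com\\rsumlang\\common', 'alias' => 'Common'],
    ['name' => 'Foo\\Bar\\Baz', 'alias' => null],
]);
// $rules is ['Common' => 'com\rsumlang\common', 'Baz' => 'Foo\Bar\Baz']
```

The first entry reproduces the rule from the test case; the second shows the default alias that is derived when the use statement has no as clause.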

applyTranslationRulesForName()

Later, when other nodes are visited, applyTranslationRulesForName() uses these rules to restore the original namespace.

private function applyTranslationRulesForName( $haystack, $uses) : string {

    if( !is_string( $haystack))
        throw new Exception("Illegal argument to applyTranslationRulesForName(): " . var_export($haystack, true));

    foreach( $uses as $needle => $replacement) {
        $needle .= "\\";
        $replacement .= "\\";
        // crude imitation of startsWith( $haystack, $needle)
        if( substr( $haystack, 0, strlen( $needle)) === $needle)
            return $replacement . substr( $haystack, strlen( $needle));
    }

    return $haystack;
}

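This yields the correct, complete namespace: in the test case, Common\Object is expanded to com\rsumlang\common\Object.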
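The prefix rewrite can also be reproduced as a standalone function. This is a sketch rather than the project's code: expand_name() is a hypothetical name, and it uses strncmp() for the starts-with test instead of the substr() comparison in the original.

```php
<?php
// Standalone sketch of the prefix rewrite in applyTranslationRulesForName().
// strncmp() is used for the starts-with test so the sketch also runs on PHP 7.
function expand_name(string $name, array $uses): string {
    foreach ($uses as $alias => $actual) {
        $prefix = $alias . "\\";
        if (strncmp($name, $prefix, strlen($prefix)) === 0) {
            // swap the alias prefix for the actual namespace
            return $actual . "\\" . substr($name, strlen($prefix));
        }
    }
    return $name; // no rule matched: leave the name unchanged
}

$uses = ['Common' => 'com\\rsumlang\\common'];
$expanded = expand_name('Common\\Object', $uses);
// $expanded is 'com\rsumlang\common\Object'
$untouched = expand_name('Other\\Thing', $uses);
// $untouched is 'Other\Thing'
```

One consequence of appending the backslash before the comparison is that a name consisting of the alias alone, such as Common with no further segment, does not match and is returned unchanged.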

Related functions in php-ast

ast\get_kind_name()

https://github.com/nikic/php-ast/blob/b8fa288b4922fe923236a198e0fb17e3441a888b/ast.stub.php#L31

/**
* @param int $kind AST_* constant value defining the kind of an AST node
* @return string String representation of AST kind value
*/
function get_kind_name(int $kind): string {}

The function returns a string for the given $kind value. That string names a specific node type, i.e. each $kind corresponds to one AST_* type.

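https://github.com/nikic/php-ast/blob/b09570df0098baba601a89e65348d0b04c351d69/ast_stub.php#L8 defines the mapping between the AST_* constants and their $kind values. The numeric values are not stable across versions, which is why some of the kind numbers in the dump above (for example 541 for the namespace node) differ from this list: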

// AST KIND CONSTANTS
namespace ast;
const AST_ARG_LIST = 128;
const AST_LIST = 255;
const AST_ARRAY = 129;
const AST_ENCAPS_LIST = 130;
const AST_EXPR_LIST = 131;
const AST_STMT_LIST = 132;
const AST_IF = 133;
const AST_SWITCH_LIST = 134;
const AST_CATCH_LIST = 135;
const AST_PARAM_LIST = 136;
const AST_CLOSURE_USES = 137;
const AST_PROP_DECL = 138;
const AST_CONST_DECL = 139;
const AST_CLASS_CONST_DECL = 140;
const AST_NAME_LIST = 141;
const AST_TRAIT_ADAPTATIONS = 142;
const AST_USE = 143;
const AST_TYPE_UNION = 144;
const AST_ATTRIBUTE_LIST = 145;
const AST_ATTRIBUTE_GROUP = 146;
const AST_MATCH_ARM_LIST = 147;
const AST_NAME = 2048;
const AST_CLOSURE_VAR = 2049;
const AST_NULLABLE_TYPE = 2050;
const AST_FUNC_DECL = 67;
const AST_CLOSURE = 68;
const AST_METHOD = 69;
const AST_ARROW_FUNC = 71;
const AST_CLASS = 70;
const AST_MAGIC_CONST = 0;
const AST_TYPE = 1;
const AST_VAR = 256;
const AST_CONST = 257;
const AST_UNPACK = 258;
const AST_CAST = 261;
const AST_EMPTY = 262;
const AST_ISSET = 263;
const AST_SHELL_EXEC = 265;
const AST_CLONE = 266;
const AST_EXIT = 267;
const AST_PRINT = 268;
const AST_INCLUDE_OR_EVAL = 269;
const AST_UNARY_OP = 270;
const AST_PRE_INC = 271;
const AST_PRE_DEC = 272;
const AST_POST_INC = 273;
const AST_POST_DEC = 274;
const AST_YIELD_FROM = 275;
const AST_GLOBAL = 277;
const AST_UNSET = 278;
const AST_RETURN = 279;
const AST_LABEL = 280;
const AST_REF = 281;
const AST_HALT_COMPILER = 282;
const AST_ECHO = 283;
const AST_THROW = 284;
const AST_GOTO = 285;
const AST_BREAK = 286;
const AST_CONTINUE = 287;
const AST_CLASS_NAME = 276;
const AST_CLASS_CONST_GROUP = 546;
const AST_DIM = 512;
const AST_PROP = 513;
const AST_NULLSAFE_PROP = 514;
const AST_STATIC_PROP = 515;
const AST_CALL = 516;
const AST_CLASS_CONST = 517;
const AST_ASSIGN = 518;
const AST_ASSIGN_REF = 519;
const AST_ASSIGN_OP = 520;
const AST_BINARY_OP = 521;
const AST_ARRAY_ELEM = 526;
const AST_NEW = 527;
const AST_INSTANCEOF = 528;
const AST_YIELD = 529;
const AST_STATIC = 532;
const AST_WHILE = 533;
const AST_DO_WHILE = 534;
const AST_IF_ELEM = 535;
const AST_SWITCH = 536;
const AST_SWITCH_CASE = 537;
const AST_DECLARE = 538;
const AST_PROP_ELEM = 775;
const AST_PROP_GROUP = 774;
const AST_CONST_ELEM = 776;
const AST_USE_TRAIT = 539;
const AST_TRAIT_PRECEDENCE = 540;
const AST_METHOD_REFERENCE = 541;
const AST_NAMESPACE = 542;
const AST_USE_ELEM = 543;
const AST_TRAIT_ALIAS = 544;
const AST_GROUP_USE = 545;
const AST_ATTRIBUTE = 547;
const AST_MATCH = 548;
const AST_MATCH_ARM = 549;
const AST_NAMED_ARG = 550;
const AST_ENUM_CASE = 777;
const AST_METHOD_CALL = 768;
const AST_NULLSAFE_METHOD_CALL = 769;
const AST_STATIC_CALL = 770;
const AST_CONDITIONAL = 771;
const AST_TRY = 772;
const AST_CATCH = 773;
const AST_FOR = 1024;
const AST_FOREACH = 1025;
const AST_PARAM = 1280;
// END AST KIND CONSTANTS
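To illustrate the kind-to-name relationship without requiring the extension, here is a sketch of a reverse lookup. Both KIND_NAMES_SKETCH and kind_name_sketch() are hypothetical names; the table holds only a handful of entries copied from the constant list above, and the fallback string for unknown values is my own choice, not the behaviour of ast\get_kind_name().

```php
<?php
// Hypothetical lookup table: a few entries copied from the constant list
// above. The real ast\get_kind_name() covers every kind and reflects the
// installed php-ast version.
const KIND_NAMES_SKETCH = [
    70   => 'AST_CLASS',
    132  => 'AST_STMT_LIST',
    143  => 'AST_USE',
    542  => 'AST_NAMESPACE',
    543  => 'AST_USE_ELEM',
    2048 => 'AST_NAME',
];

function kind_name_sketch(int $kind): string {
    // fall back to a placeholder instead of failing on unknown values
    return KIND_NAMES_SKETCH[$kind] ?? "UNKNOWN_KIND($kind)";
}

$known = kind_name_sketch(2048);
$unknown = kind_name_sketch(9999);
```

With the extension loaded, the same lookup is a single call such as ast\get_kind_name($node->kind), which is the more reliable way to decode the kind numbers in a dump.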

ast\kind_uses_flags()

https://github.com/nikic/php-ast/blob/b8fa288b4922fe923236a198e0fb17e3441a888b/ast.stub.php#L37

/**
* @param int $kind AST_* constant value defining the kind of an AST node
* @return bool Returns true if AST kind uses flags
*/
function kind_uses_flags(int $kind): bool {}

It tells whether a given AST_* kind uses flags. AST_NAME, for example, does: its flags include NAME_NOT_FQ and NAME_FQ.

src/util.php

src/util.php was copied over from the php-ast project; its main job is to handle the flags.

function format_flags(int $kind, int $flags) : string {}

function get_flag_info() : array {}
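Only the two signatures are shown here, so as a rough illustration of what formatting flags can involve, the sketch below separates two cases: flags where the whole integer selects one name, and flags where each set bit contributes a name. This is not the code from util.php. The table names, the function names, the FLAG_A/FLAG_B/FLAG_C entries and all numeric values are assumptions made for this example.

```php
<?php
// Illustrative flag tables; the names and numeric values are assumptions for
// this sketch and are not read from php-ast.
const EXCLUSIVE_SKETCH  = [0 => 'NAME_FQ', 1 => 'NAME_NOT_FQ', 2 => 'NAME_RELATIVE'];
const COMBINABLE_SKETCH = [1 => 'FLAG_A', 2 => 'FLAG_B', 4 => 'FLAG_C'];

// Exclusive flags: the whole integer selects exactly one name.
function format_exclusive(int $flags): string {
    return isset(EXCLUSIVE_SKETCH[$flags])
        ? EXCLUSIVE_SKETCH[$flags] . " ($flags)"
        : (string) $flags;
}

// Combinable flags: every set bit contributes one name.
function format_combinable(int $flags): string {
    $names = [];
    foreach (COMBINABLE_SKETCH as $bit => $name) {
        if ($flags & $bit) {
            $names[] = $name;
        }
    }
    return $names ? implode(" | ", $names) . " ($flags)" : (string) $flags;
}

$a = format_exclusive(1);   // "NAME_NOT_FQ (1)"
$b = format_combinable(5);  // "FLAG_A | FLAG_C (5)"
```

Keeping the raw integer in parentheses next to the names means nothing is lost when a value has no entry in the table.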

Afterword

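The phpjoern part of the source is fairly simple: set up a debugging environment, step through it, and the main logic of the program becomes clear. I only traced a small number of files, so my coverage is probably incomplete; more test cases can be found at https://github.com/nikic/php-ast/tree/master/tests.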

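So the next step is a small change to phpjoern: following the example of funcid, record for every node the id of the file it belongs to, called fileid. I will write that up in the next post.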