Reading the PHPJoern Source Code

Preface

Joern builds a code property graph (CPG) for PHP source code in the following steps:

The purpose of this article is to analyze and record how the phpjoern component parses the AST.

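Setting up the debugging environment

To step through PHPJoern's parsing flow quickly, I debug it with PhpStorm. Make sure Xdebug is set up before debugging; on Ubuntu it can be installed directly with apt-get:

sudo apt-get install php-xdebug

After installation, edit php.ini (on my test machine, /etc/php/7.0/cli/php.ini) and append the following lines (these are Xdebug 2 setting names):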

[xdebug]
xdebug.remote_port=9000
xdebug.remote_enable = on
xdebug.remote_host=127.0.0.1
xdebug.idekey = PHPSTORM
xdebug.remote_autostart=1
xdebug.remote_mode=req

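Next, open the project directory in PhpStorm. In the figure below, the red box marks the main files that parse PHP, and the white arrow points to the file that launches the parsing:

Looking at these files shows that src/Parser.php is the main program, so we can run Parser.php directly instead of the php2ast executable.

Add a command-line run configuration for src/Parser.php: choose Edit Configurations in the upper-right corner, then fill in Arguments with the path of the file you want to parse, relative to src/Parser.php: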

Source code analysis

Directory structure

Use the tree command to view the directory structure of the phpjoern project:

$ tree
.
├── AUTHORS
├── conf
│   └── batch.properties
├── LICENSE
├── php2ast
├── README.md
└── src
    ├── CSVExporter.php
    ├── Exporter.php
    ├── GraphMLExporter.php
    ├── Parser.php
    └── util.php

2 directories, 10 files

There are two directories, src/ and conf/. The conf/batch.properties file presumably holds the configuration used when importing into neo4j:

# conf/batch.properties
cache_type = none
use_memory_mapped_buffers = true
neostore.nodestore.db.mapped_memory = 1G
neostore.relationshipstore.db.mapped_memory = 3G
neostore.propertystore.db.mapped_memory = 1G
neostore.propertystore.db.strings.mapped_memory = 500M
neostore.propertystore.db.index.keys.mapped_memory = 5M
neostore.propertystore.db.index.mapped_memory = 5M

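php2ast is an executable Bash script (not a compiled binary) that parses PHP files into abstract syntax trees:

$ file php2ast
php2ast: Bourne-Again shell script, ASCII text executable

The src/ directory contains five PHP files, which hold all the logic for parsing and processing PHP.

src/Parser.php: the main program; the entry point is in this file

src/Exporter.php: the abstract parent class of the two exporters

src/CSVExporter.php: exports nodes and relationships as CSV (jexp or neo4j format)

src/GraphMLExporter.php: exports the graph as GraphML

src/util.php: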

src/Parser.php

Let us start with the main program, src/Parser.php, which includes some annotations I added myself:

<?php declare( strict_types = 1);

// report on errors, except notices
error_reporting( E_ALL & ~E_NOTICE);

/**
 * This program looks for PHP files in a given directory and dumps ASTs.
 *
 * @author Malte Skoruppa <skoruppa@cs.uni-saarland.de>
 */

require_once 'Exporter.php';
require_once 'CSVExporter.php';
require_once 'GraphMLExporter.php';

// the file or directory to be parsed
$path = null; // file/folder to be parsed
$format = Exporter::JEXP_FORMAT; // format to use for export (default: jexp)
$nodefile = CSVExporter::NODE_FILE; // name of node file when using CSV format (default: nodes.csv)
$relfile = CSVExporter::REL_FILE; // name of relationship file when using CSV format (default: rels.csv)
$outfile = GraphMLExporter::GRAPHML_FILE; // name of output file when using GraphML format (default: graph.xml)
$scriptname = null; // this script's name
$startcount = 0; // the start count for numbering nodes

/**
 * Parses the cli arguments.
 * Its main job is to check that the command-line arguments are valid, e.g. that there are enough of them.
 *
 * @return Boolean that indicates whether the given arguments are
 *         fine.
 */
function parse_arguments() {

  global $argv;
  
  if( !isset( $argv)) {
    if( false === (boolean) ini_get( 'register_argc_argv')) {
      error_log( '[ERROR] Please enable register_argc_argv in your php.ini.');
    }
    else {
      error_log( '[ERROR] No $argv array available.');
    }
    echo PHP_EOL;   // PHP_EOL is the platform's end-of-line sequence, e.g. "\n"
    return false;
  }

  // Remove the script name (first argument)
  global $scriptname;
  // [My Annotation]: array_shift() removes the first element of the array and returns it.
  // Here that element is the script name, so it is dropped from $argv and kept in $scriptname.
  $scriptname = array_shift( $argv);

  // If $argv is empty after array_shift(), the file argument is missing
  if( count( $argv) === 0) {
    error_log( '[ERROR] Missing argument.');
    return false;
  }

  // Set the path and remove from command line (last argument)
  global $path;
  // array_pop() is the counterpart of array_shift(): it removes the last element of the array and returns it,
  // so array_pop() yields the name of the file (or folder) to parse
  $path = (string) array_pop( $argv);

  // Parse options
  $longopts  = ["help", "version", "format:", "nodes:", "relationships:", "out:", "count:"];
  $options = getopt( "hvf:n:r:o:c:", $longopts);
  if( $options === FALSE) {
    error_log( '[ERROR] Could not parse command line arguments.');
    return false;
  }

  // Help?
  if( isset( $options['help']) || isset( $options['h'])) {
    print_version();
    echo PHP_EOL;
    print_usage();
    echo PHP_EOL;
    print_help();
    exit( 0);
  }

  // Version?
  if( isset( $options['version']) || isset( $options['v'])) {
    print_version();
    exit( 0);
  }

  // Format?
  if( isset( $options['format']) || isset( $options['f'])) {
    global $format;
    switch( $options['format'] ?? $options['f']) {
    case "jexp":
      $format = Exporter::JEXP_FORMAT;
      break;
    case "neo4j":
      $format = Exporter::NEO4J_FORMAT;
      break;
    case "graphml":
      $format = Exporter::GRAPHML_FORMAT;
      break;
    default:
      error_log( "[WARNING] Unknown format '{$options['f']}', using jexp format.");
      $format = Exporter::JEXP_FORMAT;
      break;
    }
  }

  // Nodes file? (for CSV output)
  if( isset( $options['nodes']) || isset( $options['n'])) {
    global $nodefile;
    $nodefile = $options['nodes'] ?? $options['n'];
  }

  // Relationships file? (for CSV output)
  if( isset( $options['relationships']) || isset( $options['r'])) {
    global $relfile;
    $relfile = $options['relationships'] ?? $options['r'];
  }

  // Output file? (for XML output)
  if( isset( $options['out']) || isset( $options['o'])) {
    global $outfile;
    $outfile = $options['out'] ?? $options['o'];
  }

  // Start count?
  if( isset( $options['count']) || isset( $options['c'])) {
    global $startcount;
    $startcount = (int)($options['count'] ?? $options['c']);
  }

  return true;
}

/**
 * Prints a version message.
 */
function print_version() {

  $version = 'UNKNOWN';

  // Note: Only works on Unix :-p
  if( file_exists( ".git/HEAD"))
    if( preg_match( '/^ref: (.+)$/', file_get_contents( ".git/HEAD"), $matches))
      if( file_exists( ".git/{$matches[1]}"))
        $version = substr( file_get_contents( ".git/{$matches[1]}"), 0, 7);

  echo "PHPJoern parser utility, commit {$version}", PHP_EOL;
}

/**
 * Prints a usage message.
 */
function print_usage() {

  global $scriptname;
  echo 'Usage: php '.$scriptname.' [options] <file|folder>', PHP_EOL;
}

/**
 * Prints a help message.
 */
function print_help() {

  echo 'Options:', PHP_EOL;
  echo '  -h, --help                 Display help message', PHP_EOL;
  echo '  -v, --version              Display version information', PHP_EOL;
  echo '  -f, --format <format>      Format to use for the output files: "jexp" (default), "neo4j", or "graphml"', PHP_EOL;
  echo '  -n, --nodes <file>         Output file for nodes (for CSV output, i.e., neo4j or jexp modes)', PHP_EOL;
  echo '  -r, --relationships <file> Output file for relationships (for CSV output, i.e., jexp or neo4j modes)', PHP_EOL;
  echo '  -o, --out <file>           Output file for entire graph (for XML output, i.e., graphml mode)', PHP_EOL;
  echo '  -c, --count <number>       Initial value of node counter (defaults to 0)', PHP_EOL;
}

/**
 * Parses and generates an AST for a single file.
 * Annotation: generates the AST of a single PHP file.
 *
 * @param $path     Path to the file
 * @param $exporter An Exporter instance to use for exporting
 *                  the AST of the parsed file.
 *
 * @return The node index of the exported file node, or -1 if there
 *         was an error.
 */
function parse_file( $path, $exporter) : int {

  $finfo = new SplFileInfo( $path);
  echo "Parsing file ", $finfo->getPathname(), PHP_EOL;

  try {
    // ast\parse_file() comes from the php-ast extension and returns the AST of the file
    $ast = ast\parse_file( $path, $version = 30);

    // The above may throw a ParseError. We only export to the output
    // file(s) if that didn't happen.
    /**
     * store_filenode() stores the Filesystem(type=File) node and returns its id,
     * so $fnode is the id of that node.
     */
    $fnode = $exporter->store_filenode( $finfo->getFilename());
    // store the AST(type=AST_TOPLEVEL) node
    $tnode = $exporter->store_toplevelnode( Exporter::TOPLEVEL_FILE, $path, 1, count(file($path)));
    // export the remaining nodes of the AST
    $astroot = $exporter->export( $ast, $tnode);
    // store the relationship edges
    $exporter->store_rel( $tnode, $astroot, "PARENT_OF");
    $exporter->store_rel( $fnode, $tnode, "FILE_OF");
    //echo ast_dump( $ast), PHP_EOL;
  }
  catch( ParseError $e) {
    $fnode = -1;
    error_log( "[ERROR] In $path: ".$e->getMessage());
  }

  // $fnode is an integer: the id of the Filesystem(type=File) node (or -1 on a parse error)
  return $fnode;
}

/**
 * Parses and generates ASTs for all PHP files buried within a
 * directory.
 * Annotation: parse_dir() still relies on parse_file() to parse each individual file.
 *
 * @param $path     Path to the directory
 * @param $exporter An Exporter instance to use for exporting
 *                  the ASTs of all parsed files.
 * @param $exporter Annotation: determines the format the AST is exported in
 * @param $top      Boolean indicating whether this call
 *                  corresponds to the top-level call of the
 *                  function. We wouldn't need this if I didn't
 *                  insist on the root directory of a project
 *                  getting node index 0. But, I do insist.
 * @param $top      Annotation: tells whether the current directory is the outermost one
 *
 * @return If the directory corresponding to the function call finds
 *         itself interesting, it stores a directory node for itself
 *         and this function returns the index of that
 *         node. Otherwise, returns -1. A directory finds itself
 *         interesting if it contains PHP files, or if one of its
 *         child directories finds itself interesting. -- As a special
 *         case, the root directory of a project (corresponding to the
 *         top-level call) always finds itself interesting and always
 *         stores a directory node for itself.
 * @return Annotation: the id of the directory node created for the current directory, or -1 if none was created
 */
function parse_dir( $path, $exporter, $top = true) : int {

  // save any interesting directory/file indices in the current folder
  $found = [];
  // if the current folder finds itself interesting, we will create a
  // directory node for it and return its index
  // The top-level directory gets a directory node right away: a Filesystem node with type=Directory.
  // Its id is returned and is normally 0.
  $dirnode = $top ? $exporter->store_dirnode( basename( $path)) : -1;

  // opendir() opens a directory handle
  $dhandle = opendir( $path);

  // iterate over everything in the current folder
  /**
   * readdir() returns the name of the next entry in the directory, in the order
   * in which the filesystem stores the entries.
   * @return string|false the filename on success or false on failure.
   */
  // loop over the contents of the directory
  while( false !== ($filename = readdir( $dhandle))) {
      /**
       * SplFileInfo:
       * The SplFileInfo class offers a high-level object oriented interface to information for an individual file.
       * It can be used to retrieve detailed information about a file.
       */
    $finfo = new SplFileInfo( build_path( $path, $filename));

      /**
       * SplFileInfo::isFile ( void ) : bool    tells whether the object references a regular file
       * SplFileInfo::isReadable ( void ) : bool    tells whether the file is readable
       * SplFileInfo::getExtension ( void ) : string    gets the file extension
       * SplFileInfo::getPathname ( void ) : string    gets the path to the file
       */
    if( $finfo->isFile() && $finfo->isReadable() && in_array( strtolower( $finfo->getExtension()), ['php','inc','phar']))
      // call parse_file() on the single file and record the returned file node id in $found
      $found[] = parse_file( $finfo->getPathname(), $exporter);
    else if( $finfo->isDir() && $finfo->isReadable() && $filename !== '.' && $filename !== '..')
      // nested directories are handled by calling parse_dir() recursively
      if( -1 !== ($childdir = parse_dir( $finfo->getPathname(), $exporter, false)))
        $found[] = $childdir;
  }

  // if the current folder finds itself interesting...
  if( !empty( $found)) {
    if( !$top)
      // a directory that is not the top-level one gets its node id only now
      $dirnode = $exporter->store_dirnode( basename( $path));
    foreach( $found as $i => $nodeindex)
      $exporter->store_rel( $dirnode, $nodeindex, "DIRECTORY_OF");
  }

  closedir( $dhandle);

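  /**
   * The return value belongs to the directory currently being traversed: for a/b/, while the traversal
   * is inside b/, the returned $dirnode is the id of the Filesystem(type=Directory) node for b/.
   * Once the recursion unwinds, the outermost call returns the id of the top-level directory,
   * which is the start count (0 by default) because that node is stored first.
   */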
  return $dirnode;
}

/**
 * Builds a file path with the appropriate directory separator.
 *
 * @param ...$segments Unlimited number of path segments.
 *
 * @return The file path built from the path segments.
 */
function build_path( ...$segments) {

  return join( DIRECTORY_SEPARATOR, $segments);
}

/*
 * Main script
 * Annotation: the program's entry point
 */
// parse_arguments() first checks that the arguments passed in are valid
if( parse_arguments() === false) {
  print_usage();
  echo PHP_EOL;
  print_help();
  exit( 1);
}

// Check that source exists and is readable
// Annotation: exit if the path does not exist or is not readable
if( !file_exists( $path) || !is_readable( $path)) {
  error_log( '[ERROR] The given path does not exist or cannot be read.');
  exit( 1);
}

$exporter = null;
// Determine whether source is a file or a directory
// Annotation: is the path to parse a single file or a directory?
if( is_file( $path)) {
  try {
    if( $format === Exporter::GRAPHML_FORMAT)
      // pick the exporter for the requested format; GraphML is supported by many tools
      $exporter = new GraphMLExporter( $outfile, $startcount);
    else // either NEO4J_FORMAT or JEXP_FORMAT
      $exporter = new CSVExporter( $format, $nodefile, $relfile, $startcount);
    /**
     * NEO4J_FORMAT output is meant for the neo4j-import tool
     * JEXP_FORMAT output is meant for the batch-import tool
     */
  }
  catch( IOError $e) {
    error_log( "[ERROR] ".$e->getMessage());
    exit( 1);
  }
  parse_file( $path, $exporter);
}
elseif( is_dir( $path)) {
  try {
    if( $format === Exporter::GRAPHML_FORMAT)
      $exporter = new GraphMLExporter( $outfile, $startcount);
    else // either NEO4J_FORMAT or JEXP_FORMAT
      $exporter = new CSVExporter( $format, $nodefile, $relfile, $startcount);
  }
  catch( IOError $e) {
    error_log( "[ERROR] ".$e->getMessage());
    exit( 1);
  }
  // parse the directory
  parse_dir( $path, $exporter);
}
else {
  error_log( '[ERROR] The given path is neither a regular file nor a directory.');
  exit( 1);
}

echo "Done.", PHP_EOL;

Breakdown: the main script

Let us first sort out the program's main logic.

We start with the entry point:

/*
 * Main script
 * Annotation: the program's entry point
 */
// parse_arguments() first checks that the arguments passed in are valid
if( parse_arguments() === false) {
  print_usage();
  echo PHP_EOL;
  print_help();
  exit( 1);
}

// Check that source exists and is readable
// Annotation: exit if the path does not exist or is not readable
if( !file_exists( $path) || !is_readable( $path)) {
  error_log( '[ERROR] The given path does not exist or cannot be read.');
  exit( 1);
}

$exporter = null;
// Determine whether source is a file or a directory
// Annotation: is the path to parse a directory or a single file?
if( is_file( $path)) {
  try {
    if( $format === Exporter::GRAPHML_FORMAT)
      $exporter = new GraphMLExporter( $outfile, $startcount);
    else // either NEO4J_FORMAT or JEXP_FORMAT
      $exporter = new CSVExporter( $format, $nodefile, $relfile, $startcount);
  }
  catch( IOError $e) {
    error_log( "[ERROR] ".$e->getMessage());
    exit( 1);
  }
  parse_file( $path, $exporter);
}
elseif( is_dir( $path)) {
  try {
    if( $format === Exporter::GRAPHML_FORMAT)
      $exporter = new GraphMLExporter( $outfile, $startcount);
    else // either NEO4J_FORMAT or JEXP_FORMAT
      $exporter = new CSVExporter( $format, $nodefile, $relfile, $startcount);
  }
  catch( IOError $e) {
    error_log( "[ERROR] ".$e->getMessage());
    exit( 1);
  }
  // parse the directory
  parse_dir( $path, $exporter);
}
else {
  error_log( '[ERROR] The given path is neither a regular file nor a directory.');
  exit( 1);
}

echo "Done.", PHP_EOL;

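As a flowchart:

[Figure: flowchart of the main script in Parser.php]

First, parse_arguments() is called to check that the arguments passed in are valid.
If they are, the script checks that the target to parse exists and is readable.

The following if-else chain then handles a single file or a directory: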

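if( is_file( $path)) {
    // create $exporter (GraphMLExporter or CSVExporter, depending on $format)
    parse_file( $path, $exporter);
} elseif( is_dir( $path)) {
    // create $exporter (GraphMLExporter or CSVExporter, depending on $format)
    parse_dir( $path, $exporter);
} else {
    error_log( '[ERROR] The given path is neither a regular file nor a directory.');
    exit( 1);
}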

Once parse_file() or parse_dir() has finished without problems, the script prints Done. and exits.
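Before the format is chosen, parse_arguments() separates the script name and the target path from the options. A minimal sketch of that step, assuming a hypothetical helper split_argv() that works on a copy of the argument list instead of the global $argv (the helper and the sample arguments are illustrations, not PHPJoern code):

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHPJoern): splits an argv-style array
 * the way parse_arguments() does, without touching the global $argv.
 */
function split_argv(array $args): array {
    // array_shift() removes and returns the FIRST element: the script name.
    $script = (string) array_shift($args);

    // Nothing left means the <file|folder> argument is missing.
    if (count($args) === 0) {
        return ['script' => $script, 'path' => null, 'rest' => []];
    }

    // array_pop() removes and returns the LAST element: the path to parse.
    $path = (string) array_pop($args);

    // Whatever remains are the options (-f, -n, -r, ...).
    return ['script' => $script, 'path' => $path, 'rest' => $args];
}

$result = split_argv(['src/Parser.php', '-f', 'neo4j', 'tests/demo.php']);
echo $result['script'], PHP_EOL;             // src/Parser.php
echo $result['path'], PHP_EOL;               // tests/demo.php
echo implode(' ', $result['rest']), PHP_EOL; // -f neo4j
```

The point of the sketch is the order of the two calls: the first element is the script, the last one is the path, and only what lies between them can be an option.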

The SplFileInfo file-handling class

Both parse_dir() and parse_file() use this class, so let us look at what it does first. According to the official manual, https://www.php.net/manual/en/class.splfileinfo.php , the main methods of SplFileInfo are:

SplFileInfo {
    /* Methods */
    public __construct ( string $file_name )
    public getATime ( void ) : int    // gets the last access time of the file
    public getBasename ([ string $suffix ] ) : string    // gets the base name of the file
    public getCTime ( void ) : int    // gets the inode change time
    public getExtension ( void ) : string    // gets the file extension
    public getFileInfo ([ string $class_name ] ) : SplFileInfo    // gets an SplFileInfo object for the file
    public getFilename ( void ) : string    // gets the filename
    public getGroup ( void ) : int    // gets the file group
    public getInode ( void ) : int    // gets the inode of the file
    public getLinkTarget ( void ) : string    // gets the target of a link
    public getMTime ( void ) : int    // gets the last modified time
    public getOwner ( void ) : int    // gets the owner of the file
    public getPath ( void ) : string    // gets the path without the filename
    public getPathInfo ([ string $class_name ] ) : SplFileInfo    // gets an SplFileInfo object for the path
    public getPathname ( void ) : string    // gets the path to the file
    public getPerms ( void ) : int    // gets the file permissions
    public getRealPath ( void ) : string    // gets the absolute path to the file
    public getSize ( void ) : int    // gets the file size
    public getType ( void ) : string    // gets the file type
    public isDir ( void ) : bool    // tells whether the file is a directory
    public isExecutable ( void ) : bool    // tells whether the file is executable
    public isFile ( void ) : bool    // tells whether the object references a regular file
    public isLink ( void ) : bool    // tells whether the file is a link
    public isReadable ( void ) : bool    // tells whether the file is readable
    public isWritable ( void ) : bool    // tells whether the entry is writable
    public openFile ([ string $open_mode = "r" [, bool $use_include_path = FALSE [, resource $context = NULL ]]] ) : SplFileObject    // gets an SplFileObject object for the file
    public setFileClass ([ string $class_name = "SplFileObject" ] ) : void    // sets the class used with SplFileInfo::openFile
    public setInfoClass ([ string $class_name = "SplFileInfo" ] ) : void    // sets the class used with SplFileInfo::getFileInfo and SplFileInfo::getPathInfo
    public __toString ( void ) : string    // returns the path to the file as a string
}
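Only a handful of these methods matter for PHPJoern. The sketch below exercises the ones that parse_dir() and parse_file() call, on a throwaway file in the system temp directory; the file name demo.PHP and the directory prefix are made up for illustration:

```php
<?php
declare(strict_types=1);

// Create a throwaway directory and file; both names are only illustrations.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'splfileinfo_demo_' . uniqid();
mkdir($dir);
$file = $dir . DIRECTORY_SEPARATOR . 'demo.PHP';
file_put_contents($file, "<?php echo 1;\n");

$finfo = new SplFileInfo($file);

// The same checks parse_dir() applies before handing a file to parse_file().
// strtolower() is what makes the upper-case extension acceptable.
$isCandidate = $finfo->isFile()
    && $finfo->isReadable()
    && in_array(strtolower($finfo->getExtension()), ['php', 'inc', 'phar'], true);

// Directories are recognised with isDir().
$isDir = (new SplFileInfo($dir))->isDir();

echo $finfo->getFilename(), PHP_EOL;  // demo.PHP
echo $finfo->getExtension(), PHP_EOL; // PHP
var_dump($isCandidate);               // bool(true)
var_dump($isDir);                     // bool(true)

// Clean up the temporary file and directory.
unlink($file);
rmdir($dir);
```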

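Breakdown: parse_file()

parse_file() generates the AST for a single PHP file. It takes two parameters:

$path: the path of the file to parse

$exporter: the Exporter instance, which determines the format the AST is exported in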

function parse_file( $path, $exporter) : int {

  $finfo = new SplFileInfo( $path);
  echo "Parsing file ", $finfo->getPathname(), PHP_EOL;

  try {
    // ast\parse_file() comes from the php-ast extension and returns the AST of the file
    $ast = ast\parse_file( $path, $version = 30);

    // The above may throw a ParseError. We only export to the output
    // file(s) if that didn't happen.
    /**
     * store_filenode() stores the Filesystem(type=File) node and returns its id,
     * so $fnode is the id of that node.
     */
    $fnode = $exporter->store_filenode( $finfo->getFilename());
    // store the AST(type=AST_TOPLEVEL) node
    $tnode = $exporter->store_toplevelnode( Exporter::TOPLEVEL_FILE, $path, 1, count(file($path)));
    // export the remaining nodes of the AST
    $astroot = $exporter->export( $ast, $tnode);
    // store the relationship edges
    $exporter->store_rel( $tnode, $astroot, "PARENT_OF");
    $exporter->store_rel( $fnode, $tnode, "FILE_OF");
    //echo ast_dump( $ast), PHP_EOL;
  }
  catch( ParseError $e) {
    $fnode = -1;
    error_log( "[ERROR] In $path: ".$e->getMessage());
  }

  // $fnode is an integer: the id of the Filesystem(type=File) node (or -1 on a parse error)
  return $fnode;
}

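Its main steps are:

First, ast\parse_file() from the php-ast extension parses the PHP file and produces the AST.
Next, the file node Filesystem(type=File) and the toplevel node AST(type=AST_TOPLEVEL) are stored separately, because these two nodes are special: they store a different set of properties in the database than the other nodes.
Then the remaining nodes are processed recursively.
Finally, $exporter->store_rel() stores the relationship edges.
The function returns $fnode, the id of the file node Filesystem(type=File).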
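The order of these calls fixes the node ids. The sketch below replays it against a hypothetical in-memory stand-in, ArrayExporter, which is not PHPJoern's Exporter: it records only node types and edges, and its exportAst() stores a single placeholder root node where the real export() would store the whole tree.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for an Exporter (not PHPJoern's class):
 * it only records node types and edges so the id allocation is visible.
 */
final class ArrayExporter {
    public $nodes = [];
    public $rels = [];

    private function storeNode(string $type): int {
        $this->nodes[] = $type;
        return count($this->nodes) - 1; // ids are handed out in storage order
    }

    public function storeRel(int $start, int $end, string $type) {
        $this->rels[] = sprintf('%d -[%s]-> %d', $start, $type, $end);
    }

    public function storeFileNode(): int {
        return $this->storeNode('File');
    }

    public function storeToplevelNode(): int {
        // The toplevel node is followed by its artificial entry and exit nodes.
        $tnode = $this->storeNode('AST_TOPLEVEL');
        $entry = $this->storeNode('CFG_FUNC_ENTRY');
        $exit = $this->storeNode('CFG_FUNC_EXIT');
        $this->storeRel($tnode, $entry, 'ENTRY');
        $this->storeRel($tnode, $exit, 'EXIT');
        return $tnode;
    }

    public function exportAst(): int {
        // Placeholder for the recursive export(); a real AST adds many nodes.
        return $this->storeNode('AST_STMT_LIST');
    }
}

// Same call order as parse_file().
$exporter = new ArrayExporter();
$fnode = $exporter->storeFileNode();
$tnode = $exporter->storeToplevelNode();
$astroot = $exporter->exportAst();
$exporter->storeRel($tnode, $astroot, 'PARENT_OF');
$exporter->storeRel($fnode, $tnode, 'FILE_OF');

print_r($exporter->nodes);
print_r($exporter->rels);
```

With a counter starting at 0, the file node gets id 0, the toplevel node id 1, the entry and exit nodes ids 2 and 3, and the AST root id 4.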

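Breakdown: parse_dir()

parse_dir() takes three parameters:

$path: the path of the directory to parse

$exporter: the Exporter instance, which determines the format the AST is exported in

$top: marks whether this is the outermost directory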

function parse_dir( $path, $exporter, $top = true) : int {

  // save any interesting directory/file indices in the current folder
  $found = [];
  // if the current folder finds itself interesting, we will create a
  // directory node for it and return its index
  // The top-level directory gets a directory node right away: a Filesystem node with type=Directory.
  // Its id is returned and is normally 0.
  $dirnode = $top ? $exporter->store_dirnode( basename( $path)) : -1;

  // opendir() opens a directory handle
  $dhandle = opendir( $path);

  // iterate over everything in the current folder
  /**
   * readdir() returns the name of the next entry in the directory, in the order
   * in which the filesystem stores the entries.
   * @return string|false the filename on success or false on failure.
   */
  // loop over the contents of the directory
  while( false !== ($filename = readdir( $dhandle))) {
      /**
       * SplFileInfo:
       * The SplFileInfo class offers a high-level object oriented interface to information for an individual file.
       * It can be used to retrieve detailed information about a file.
       */
    $finfo = new SplFileInfo( build_path( $path, $filename));

      /**
       * SplFileInfo::isFile ( void ) : bool    tells whether the object references a regular file
       * SplFileInfo::isReadable ( void ) : bool    tells whether the file is readable
       * SplFileInfo::getExtension ( void ) : string    gets the file extension
       * SplFileInfo::getPathname ( void ) : string    gets the path to the file
       */
    if( $finfo->isFile() && $finfo->isReadable() && in_array( strtolower( $finfo->getExtension()), ['php','inc','phar']))
      // call parse_file() on the single file and record the returned file node id in $found
      $found[] = parse_file( $finfo->getPathname(), $exporter);
    else if( $finfo->isDir() && $finfo->isReadable() && $filename !== '.' && $filename !== '..')
      // nested directories are handled by calling parse_dir() recursively
      if( -1 !== ($childdir = parse_dir( $finfo->getPathname(), $exporter, false)))
        $found[] = $childdir;
  }

  // if the current folder finds itself interesting...
  if( !empty( $found)) {
    if( !$top)
      // a directory that is not the top-level one gets its node id only now
      $dirnode = $exporter->store_dirnode( basename( $path));
    foreach( $found as $i => $nodeindex)
      $exporter->store_rel( $dirnode, $nodeindex, "DIRECTORY_OF");
  }

  closedir( $dhandle);

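  /**
   * The return value belongs to the directory currently being traversed: for a/b/, while the traversal
   * is inside b/, the returned $dirnode is the id of the Filesystem(type=Directory) node for b/.
   * Once the recursion unwinds, the outermost call returns the id of the top-level directory,
   * which is the start count (0 by default) because that node is stored first.
   */
  return $dirnode;
}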

The processing logic of parse_dir() as a flowchart (drawn casually and not at all to standard notation):

[Figure: flowchart of parse_dir()]
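The rule that a nested directory only gets a node when something interesting lies below it is easier to see on a small model. The sketch below is not PHPJoern code: walk_dir(), the nested-array tree, and the by-reference counter $next are hypothetical stand-ins for the filesystem and the exporter, and each PHP file consumes a single id where the real parse_file() stores several nodes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of parse_dir() on an in-memory tree (not PHPJoern code).
 * A directory is an array of name => child; a file is name => null.
 * Node ids come from the by-reference counter $next.
 */
function walk_dir(string $name, array $tree, int &$next, array &$dirs, bool $top = true): int {
    $found = [];
    // The top-level directory always gets a node, and gets it first.
    $dirnode = $top ? $next++ : -1;

    foreach ($tree as $child => $content) {
        if (is_array($content)) {
            // Recurse into the subdirectory; -1 means it was not interesting.
            $childdir = walk_dir((string) $child, $content, $next, $dirs, false);
            if ($childdir !== -1) {
                $found[] = $childdir;
            }
        } elseif (in_array(strtolower(pathinfo((string) $child, PATHINFO_EXTENSION)), ['php', 'inc', 'phar'], true)) {
            $found[] = $next++; // stands in for parse_file()
        }
    }

    // A nested directory only gets a node after something was found below it.
    if (!empty($found) && !$top) {
        $dirnode = $next++;
    }
    if ($dirnode !== -1) {
        $dirs[$name] = $dirnode;
    }
    return $dirnode;
}

$tree = [
    'index.php' => null,
    'docs'      => ['readme.txt' => null], // no PHP files: not interesting
    'lib'       => ['util.inc' => null],   // interesting
];
$next = 0;
$dirs = [];
walk_dir('project', $tree, $next, $dirs);
print_r($dirs);
```

In this model the root directory receives id 0 before anything else, docs receives no node at all, and lib receives its id (3) only after its file has been numbered.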

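The following line creates a directory node for the top-level call; the returned int uniquely identifies that node (nested calls start with -1 instead):

$dirnode = $top ? $exporter->store_dirnode( basename( $path)) : -1;

In the figure below, the central node is the topmost node of the whole CPG. Its type is Filesystem, and it has three child nodes, one for each file in its directory: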

src/Exporter.php

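Exporter is the parent class of CSVExporter and GraphMLExporter. Their inheritance hierarchy as shown in PhpStorm:

Class constants and properties

src/Exporter.php defines a number of constants, mostly related to the export format or to node properties: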

/** Constant for Neo4J format (to be used with neo4j-import) */
const NEO4J_FORMAT = 0;
/** Constant for jexp format (to be used with batch-import) */
const JEXP_FORMAT = 1;
/** Constant for GraphML format (supported by many tools) */
const GRAPHML_FORMAT = 2;

/** Labels */
const LABEL_FS = "Filesystem";
const LABEL_AST = "AST";
const LABEL_ART = "Artificial";

/** Type of directory nodes */
const DIR = "Directory";
/** Type of file nodes */
const FILE = "File";
/** Type of toplevel nodes */
const TOPLEVEL = "AST_TOPLEVEL";
/** Flags for toplevel nodes */
const TOPLEVEL_FILE = "TOPLEVEL_FILE";
const TOPLEVEL_CLASS = "TOPLEVEL_CLASS";

/** Type of entry and exit nodes (for CFG construction) */
const FUNC_ENTRY = "CFG_FUNC_ENTRY";
const FUNC_EXIT = "CFG_FUNC_EXIT";

/** Delimiter for arrays, used by format_flags() */
protected $array_delim = ";";

There is also a property, $nodecount, that serves as the id counter:

/** Node counter */
protected $nodecount = 0;

store_filenode()

store_filenode() takes a filename, $filename, and stores a file node. It calls store_node() of the same class, which returns an int id that uniquely identifies the file node:

public function store_filenode( $filename) : int {

    return $this->store_node( self::LABEL_FS, self::FILE, null, null, null, null, null, null, null, null, $this->quote_and_escape( $filename), null);
}

store_toplevelnode()

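store_toplevelnode() first stores the toplevel node and then stores two special nodes as well: the entry and exit nodes of the CFG. These two nodes do not appear in the AST produced by php-ast; store_toplevelnode() adds them. It then calls store_rel() to store the relationship edges: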

public function store_toplevelnode( $flag, $name, $lineno, $endlineno, $childnum = null, $funcid = null, $namespace = null) : int {

    // store the toplevel node first
    $tnode = $this->store_node( self::LABEL_AST, self::TOPLEVEL, $flag, $lineno, null, $childnum, $funcid, null, $this->quote_and_escape( $namespace), $endlineno, $this->quote_and_escape( $name), null);

    // For toplevel nodes, we create artificial entry and exit nodes (like file and dir nodes,
    // they are not actually part of the AST).
    // For the entry and exit nodes, we only set
    // (1) the funcid (to the id of the toplevel node), and
    // (2) the name (to that of the file or class)
    /**
     * After the toplevel node (an AST node), two artificial nodes are stored: the entry node and the exit node.
     * An Artificial node stores information such as: name: case1_optimize/1.php      id: 4     type: CFG_FUNC_EXIT     funcid: 2
     * type is self::FUNC_ENTRY or self::FUNC_EXIT
     * name is the name of the file (or class)
     * funcid is special: it is the id of the owning toplevel node
     */
    $entrynode = $this->store_node( self::LABEL_ART, self::FUNC_ENTRY, null, null, null, null, $tnode, null, null, null, $this->quote_and_escape( $name), null);
    $exitnode = $this->store_node( self::LABEL_ART, self::FUNC_EXIT, null, null, null, null, $tnode, null, null, null, $this->quote_and_escape( $name), null);
    // connect the toplevel node to the entry node with an ENTRY edge
    $this->store_rel( $tnode, $entrynode, "ENTRY");
    // connect the toplevel node to the exit node with an EXIT edge
    $this->store_rel( $tnode, $exitnode, "EXIT");

    return $tnode;
}

store_node()

In the Exporter class, store_node() is an abstract method.

abstract protected function store_node( $label, $type, $flags, $lineno, $code = null, $childnum = null, $funcid = null, $classname = null, $namespace = null, $endlineno = null, $name = null, $doccomment = null, $fileid = null) : int;

In src/CSVExporter.php, store_node() is implemented as:

protected function store_node( $label, $type, $flags, $lineno, $code = null, $childnum = null, $funcid = null, $classname = null, $namespace = null, $endlineno = null, $name = null, $doccomment = null) : int {

    fwrite( $this->nhandle, "{$this->nodecount}{$this->csv_delim}{$label}{$this->csv_delim}{$type}{$this->csv_delim}{$flags}{$this->csv_delim}{$lineno}{$this->csv_delim}{$code}{$this->csv_delim}{$childnum}{$this->csv_delim}{$funcid}{$this->csv_delim}{$classname}{$this->csv_delim}{$namespace}{$this->csv_delim}{$endlineno}{$this->csv_delim}{$name}{$this->csv_delim}{$doccomment}\n");

    // return the current node index, *then* increment it
    // $this->nodecount holds the current id; an id uniquely identifies a node
    return $this->nodecount++;
}

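It implements the abstract method store_node() of the parent class Exporter and writes the node's fields to the CSV file. Because of the post-increment, it returns the id of the node just written and then increases the id counter nodecount by one.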
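The return-then-increment behaviour can be checked in isolation. The sketch below uses a hypothetical miniature writer, TinyNodeWriter, not PHPJoern's CSVExporter: the tab delimiter, the three-column layout, and the assumption that the counter starts at the value passed to the constructor are illustrations only.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical miniature of a CSV node writer (not PHPJoern's CSVExporter).
 * The tab delimiter and the three columns are assumptions for illustration.
 */
final class TinyNodeWriter {
    private $handle;
    private $nodecount;
    private $delim = "\t";

    public function __construct($handle, int $startcount = 0) {
        $this->handle = $handle;
        $this->nodecount = $startcount;
    }

    public function storeNode(string $label, string $type): int {
        fwrite($this->handle, "{$this->nodecount}{$this->delim}{$label}{$this->delim}{$type}\n");
        // Post-increment: return the id just written, then advance the counter.
        return $this->nodecount++;
    }
}

// Write two nodes into an in-memory stream, starting the counter at 10.
$handle = fopen('php://memory', 'w+');
$writer = new TinyNodeWriter($handle, 10);
$first = $writer->storeNode('Filesystem', 'File');
$second = $writer->storeNode('AST', 'AST_TOPLEVEL');

rewind($handle);
$csv = stream_get_contents($handle);
fclose($handle);

echo $csv;
var_dump($first, $second); // int(10) int(11)
```

The id in the first column of each row is the same value the call returned, which is what lets the caller pass it straight to store_rel().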

store_rel()

Likewise, Exporter::store_rel() is an abstract method, with different implementations in GraphMLExporter and CSVExporter.

abstract public function store_rel( $start, $end, $type);

CSVExporter::store_rel() is implemented as:

public function store_rel( $start, $end, $type) {

    fwrite( $this->rhandle, "{$start}{$this->csv_delim}{$end}{$this->csv_delim}{$type}\n");
}

It stores a relationship between two nodes; the $type parameter is the relationship type, e.g. PARENT_OF, ENTRY, or EXIT.

export()

phpjoern handles the file node and the toplevel node in store_filenode() and store_toplevelnode() respectively; the rest of the AST, which is the bulk of it, is handled in export():

public function export( $ast, $funcid, $nodeline = 0, $childname = "", $childnum = 0, $namespace = "", $uses = [], $classname = "") : int {

    // (1) if $ast is an AST node, print info and recurse
    // An instance of ast\Node declares:
    // $kind (integer, name can be retrieved using ast\get_kind_name())
    // $flags (integer, corresponding to a set of flags for the current node)
    // $lineno (integer, starting line number)
    // $children (array of child nodes)
    // Additionally, an instance of the subclass ast\Node\Decl declares:
    // $endLineno (integer, end line number of the declaration)
    // $name (string, the name of the declared function/class)
    // $docComment (string, the preceding doc comment)
    // the most common case first: $ast is an ast\Node instance
    if( $ast instanceof ast\Node) {

        /**
         * function get_kind_name(int $kind): string {}
         * @param int $kind AST_* constant value defining the kind of an AST node
         * @return string String representation of AST kind value
         * Annotation: returns a string for the given $kind; that string is the node's type, i.e. each $kind maps to one type
         */
        $nodetype = ast\get_kind_name( $ast->kind);
        $nodeline = $ast->lineno;

        $nodeflags = "";
        /**
         * function kind_uses_flags(int $kind): bool {}
         * @param int $kind AST_* constant value defining the kind of an AST node
         * @return bool Returns true if AST kind uses flags
         * Annotation: tells whether an AST_* kind uses flags; e.g. AST_NAME does, with flags such as NAME_NOT_FQ and NAME_FQ
         */
        if( ast\kind_uses_flags( $ast->kind)) {
            $nodeflags = $this->format_flags( $ast->kind, $ast->flags);
        }

        // for decl nodes:
        if( isset( $ast->endLineno)) {
            $nodeendline = $ast->endLineno;
        }
        if( isset( $ast->name)) {
            $nodename = $ast->name;
        }
        if( isset( $ast->docComment)) {
            $nodedoccomment = $this->quote_and_escape( $ast->docComment);
        }

        // store node, export all children and store the relationships
        $rootnode = $this->store_node( self::LABEL_AST, $nodetype, $nodeflags, $nodeline, null, $childnum, $funcid, $classname, $this->quote_and_escape( $namespace), $nodeendline, $nodename, $nodedoccomment);

        // If this node is a function/method/closure declaration, set $funcid.
        // Note that in particular, the decl node *itself* does not have $funcid set to its own id;
        // this is intentional. The *declaration* of a function/method/closure itself is part of the
        // control flow of the outer scope: e.g., a closure declaration is part of the control flow
        // of the function it is declared in, or a function/method declaration is part of the control flow
        // of the pseudo-function representing the top-level code it is declared in.
        // Note: we do not need to do this for TOPLEVEL types (and it wouldn't be straightforward since we
        // do not generate ast\Node objects for them). Rather, for toplevel nodes under files, the funcid is
        // set by the Parser class, which also stores the File node; and for toplevel nodes under classes,
        // we do it below, while iterating over the children.
        // Also, we create artificial entry and exit nodes for the CFG of the function (like file and dir nodes,
        // they are not actually part of the AST).
        // For the entry and exit nodes, we only set
        // (1) the funcid (to the id of the function node), and
        // (2) the name (to that of the function)
        if( $ast->kind === ast\AST_FUNC_DECL || $ast->kind === ast\AST_METHOD || $ast->kind === ast\AST_CLOSURE) {
            $funcid = $rootnode;
            $entrynode = $this->store_node( self::LABEL_ART, self::FUNC_ENTRY, null, null, null, null, $rootnode, $classname, $this->quote_and_escape( $namespace), null, $this->quote_and_escape( $nodename), null);
            $exitnode = $this->store_node( self::LABEL_ART, self::FUNC_EXIT, null, null, null, null, $rootnode, $classname, $this->quote_and_escape( $namespace), null, $this->quote_and_escape( $nodename), null);
            $this->store_rel( $rootnode, $entrynode, "ENTRY");
            $this->store_rel( $rootnode, $exitnode, "EXIT");
        }

        // If this node is a class declaration, set $classname
        // i.e. when the current node is a class declaration, record its name in $classname
        if( $ast->kind === ast\AST_CLASS) {
            $classname = $nodename;
        }

        // iterate over the children and count them
        // start recursing into the child nodes
        $i = 0;
        foreach( $ast->children as $childrel => $child) {

            // If we encounter a child node that is a namespace node, set the namespace for subtrees and upcoming sister nodes
            // Note that we do not care whether the non-bracketed syntax (second child of AST_NAMESPACE is null)
            // or the bracketed syntax (second child of AST_NAMESPACE is a statement) was used:
            // (1) if non-bracketed, the namespace must be set for all upcoming sister nodes until we encounter
            //     the next AST_NAMESPACE
            // (2) if bracketed, the namespace in principle only holds for the subtree rooted in the second child
            //     of AST_NAMESPACE (and should be set only for that subtree, but not for upcoming sister nodes);
            //     however, in this case the next sister node (if it exists) *must* be another
            //     AST_NAMESPACE node, according to PHP syntax (otherwise, a 'No code may exist outside of namespace {}'
            //     fatal error would be thrown at runtime.) Hence, if the next sister node is an AST_NAMESPACE anyway,
            //     the namespace will be set to something new once we finished off the subtree rooted in the
            //     second child of the AST_NAMESPACE we encountered.
            if( $child->kind === ast\AST_NAMESPACE) {
                $namespace = $child->children["name"] ?? "";
                $uses = []; // any namespace statement cancels all uses currently in effect
            }

            // If we encounter a child node that is a use node, add the translation rules specified by it
            // to the translation rules currently in effect
            if( $child->kind === ast\AST_USE) {
                $uses = array_merge( $uses, $this->getTranslationRulesForUse( $child));
            }

            // for the "stmts" child of an AST_CLASS, which is an AST_STMT_LIST,
            // we insert an artificial toplevel function node
            if( $ast->kind === ast\AST_CLASS && $childrel === "stmts") {
                $tnode = $this->store_toplevelnode( Exporter::TOPLEVEL_CLASS, $nodename, $nodeline, $nodeendline, $i, $funcid, $namespace);
                // when exporting the AST_STMT_LIST below the AST_CLASS, the
                // funcid is set to the toplevel node's id, childname is set to "stmts" (doesn't really matter, we can invent a name here), and childnum is set to 0
                // recurse into the child node
                $childnode = $this->export( $child, $tnode, $nodeline, "stmts", 0, $namespace, $uses, $classname);
                $this->store_rel( $tnode, $childnode, "PARENT_OF"); // AST_TOPLEVEL -> AST_STMT_LIST
                $this->store_rel( $rootnode, $tnode, "PARENT_OF"); // AST_CLASS -> AST_TOPLEVEL
            }
            // for the child of an AST_NAME node which is *not* fully qualified, we apply the translation rules currently in effect
            elseif( $ast->kind === ast\AST_NAME && $childrel === "name" && $ast->flags !== ast\flags\NAME_FQ) {
                $child = $this->applyTranslationRulesForName( $child, $uses);
                // recurse into the child node
                $childnode = $this->export( $child, $funcid, $nodeline, $childrel, $i, $namespace, $uses, $classname);
                $this->store_rel( $rootnode, $childnode, "PARENT_OF");
            }
            // in all other cases, we simply recurse straightforwardly
            else {
                // recurse into the child node
                $childnode = $this->export( $child, $funcid, $nodeline, $childrel, $i, $namespace, $uses, $classname);
                $this->store_rel( $rootnode, $childnode, "PARENT_OF");
            }

            // next child...
            $i++;
        }
    }
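
The recursion pattern of export() can be reduced to a small sketch. The code below is an illustration only, not phpjoern's code: it walks nested PHP arrays instead of ast\Node objects, and the names MiniExporter, storeNode() and the array layout are assumptions made for the example. It keeps the two ideas that matter here: every node gets a fresh id, and a parent is linked to each child with a PARENT_OF edge once that child's subtree has been exported.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for phpjoern's exporter: nodes are plain arrays
// of the form ['kind' => string, 'children' => array].
final class MiniExporter
{
    /** @var array<int, string> node id => label */
    public $nodes = [];
    /** @var array<int, array> edges stored as [from, to, type] */
    public $edges = [];

    private function storeNode(string $label): int
    {
        $id = count($this->nodes);
        $this->nodes[$id] = $label;
        return $id;
    }

    /** Exports $ast and returns the id of the node created for it. */
    public function export($ast): int
    {
        if (!is_array($ast)) {
            // leaf value (string, null, number): store it and stop recursing
            return $this->storeNode(var_export($ast, true));
        }
        $root = $this->storeNode($ast['kind']);
        foreach ($ast['children'] as $child) {
            $childId = $this->export($child);
            $this->edges[] = [$root, $childId, 'PARENT_OF'];
        }
        return $root;
    }
}

$tree = ['kind' => 'AST_STMT_LIST', 'children' => [
    ['kind' => 'AST_ECHO', 'children' => ['hello']],
]];
$exporter = new MiniExporter();
$rootId = $exporter->export($tree);
printf("%d nodes, %d edges\n", count($exporter->nodes), count($exporter->edges));
```

Because the edge is stored only after the recursive call returns, edges of deeper nodes appear in the list before those of their ancestors.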

At its outermost level, export() has four branches that handle the different kinds of AST node:

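if( $ast instanceof ast\Node) {
    // Case 1, the most common one: the node is an ast\Node.
    // Only this branch can have ast\Node children, so this is also where the children are traversed recursively.
}
// The remaining branches handle values that are not AST nodes: plain strings, NULL, or numeric values such as integer and double.
else if( is_string( $ast)) {
    // handle a node whose value is a string
} else if( $ast === null) {
    // handle a node whose type is NULL
} else {
    // If the value is neither a string nor NULL, it is converted to a string.
    // The developer originally assumed this branch could see booleans, integers, floats/doubles, arrays, objects or resources,
    // but testing showed that only integers and floats/doubles actually reach it.
}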
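
The four-way dispatch can be tried out in isolation. The following is a minimal sketch under stated assumptions: FakeNode is a made-up class that replaces ast\Node so the example runs without the php-ast extension, and classifyAstValue() is a hypothetical helper that only reports which branch a value would take.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in so the sketch runs without the php-ast extension;
// in phpjoern the check is `$ast instanceof ast\Node`.
final class FakeNode
{
    public $children = [];
}

/** Returns which of the four export() branches a value would take. */
function classifyAstValue($ast): string
{
    if ($ast instanceof FakeNode) {
        return 'node';      // may have children, so recursion happens here
    } elseif (is_string($ast)) {
        return 'string';
    } elseif ($ast === null) {
        return 'null';
    } else {
        return 'other:' . (string) $ast;  // e.g. integers and floats, cast to string
    }
}

foreach ([new FakeNode(), 'Common\Object', null, 42, 1.5] as $value) {
    echo classifyAstValue($value), "\n";
}
```

The order of the checks matters: the string and NULL tests come before the catch-all branch, so only the remaining scalar values are cast to string.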

Inside export(), AST_USE nodes get extra handling through getTranslationRulesForUse(), and the name child of AST_NAME nodes is treated specially through applyTranslationRulesForName().

getTranslationRulesForUse()

To see what getTranslationRulesForUse() and applyTranslationRulesForName() do, consider the following test case:

// use.php
<?php

namespace com\rsumilang\util;
use com\rsumlang\common as Common;

class String extends Common\Object
{
   // ... code ...
}

Then parse use.php with php-ast:

<?php
require '../util.php';
$code = <<<'EOC'
<?php
namespace com\rsumilang\util;
use com\rsumlang\common as Common;

class String extends Common\Object
{
   // ... code ...
}
EOC;

var_dump(ast\parse_code($code, $version=30));

// OUTPUT:
class ast\Node#1 (4) {
  public $kind =>
  int(133)
  public $flags =>
  int(0)
  public $lineno =>
  int(1)
  public $children =>
  array(3) {
    [0] =>
    class ast\Node#2 (4) {
      public $kind =>
      int(541)
      public $flags =>
      int(0)
      public $lineno =>
      int(2)
      public $children =>
      array(2) {
        'name' =>
        string(18) "com\rsumilang\util"
        'stmts' =>
        NULL
      }
    }
    [1] =>
    class ast\Node#3 (4) {
      public $kind =>
      int(144)
      public $flags =>
      int(361)
      public $lineno =>
      int(3)
      public $children =>
      array(1) {
        [0] =>
        class ast\Node#4 (4) {
          public $kind =>
          int(542)
          public $flags =>
          int(0)
          public $lineno =>
          int(3)
          public $children =>
          array(2) {
            'name' =>
            string(19) "com\rsumlang\common"
            'alias' =>
            string(6) "Common"
          }
        }
      }
    }
    [2] =>
    class ast\Node\Decl#5 (7) {
      public $kind =>
      int(69)
      public $flags =>
      int(0)
      public $lineno =>
      int(5)
      public $children =>
      array(3) {
        'extends' =>
        class ast\Node#6 (4) {
          public $kind =>
          int(2048)
          public $flags =>
          int(1)
          public $lineno =>
          int(5)
          public $children =>
          array(1) {
            'name' =>
            string(13) "Common\Object"
          }
        }
        'implements' =>
        NULL
        'stmts' =>
        class ast\Node#7 (4) {
          public $kind =>
          int(133)
          public $flags =>
          int(0)
          public $lineno =>
          int(6)
          public $children =>
          array(0) {
          }
        }
      }
      public $endLineno =>
      int(8)
      public $name =>
      string(6) "String"
      public $docComment =>
      NULL
    }
  }
}

The test case uses the use keyword to import the namespace com\rsumlang\common and gives it the alias Common. The name Common\Object therefore actually refers to com\rsumlang\common\Object.

private function getTranslationRulesForUse( $astuse) : array {

    if( !($astuse instanceof ast\Node) || ($astuse->kind !== ast\AST_USE))
        throw new Exception("Illegal argument to getTranslationRulesForUse(): " . var_export($astuse, true));

    $uses = [];

    foreach( $astuse->children as $astuseelem) {
        $actual = $astuseelem->children["name"];
        // if no alias is given, the default one is the last part of the actual namespace
        /**
         * strrpos ( string $haystack , string $needle , int $offset = 0 ) : int
         * Returns the position of the last occurrence of $needle in $haystack
         */
        // If the use statement gives no alias, the part of the actual namespace after the last `\` becomes the alias
        // $uses[] is consumed later by applyTranslationRulesForName()
        $alias = $astuseelem->children["alias"] ?? substr( $actual, strrpos( $actual, "\\") + 1);
        $uses[$alias] = $actual;
    }

    return $uses;
}

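getTranslationRulesForUse() thus stores the namespace aliases in the $uses array. For the test case above, the result is $uses = ["Common" => "com\rsumlang\common"].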
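
The alias derivation can be reproduced without php-ast. This is a sketch, not the phpjoern function: translationRulesFor() is a hypothetical name, each use element is represented by a plain array instead of an AST_USE_ELEM node, and the guard for names without a backslash is an addition of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the alias derivation. Each element of $useElems is a plain array
 * ['name' => string, 'alias' => ?string] standing in for the children of an
 * AST_USE_ELEM node (an assumption made to keep the example runnable).
 */
function translationRulesFor(array $useElems): array
{
    $uses = [];
    foreach ($useElems as $elem) {
        $actual = $elem['name'];
        $pos = strrpos($actual, '\\');
        // default alias: everything after the last backslash
        // (guard added in this sketch: a name without a backslash is its own alias)
        $default = $pos === false ? $actual : substr($actual, $pos + 1);
        $alias = $elem['alias'] ?? $default;
        $uses[$alias] = $actual;
    }
    return $uses;
}

$rules = translationRulesFor([
    ['name' => 'com\rsumlang\common', 'alias' => 'Common'], // use ... as Common;
    ['name' => 'com\rsumlang\common', 'alias' => null],     // use ...; (no alias)
]);
print_r($rules);
```

The second element shows the default case: without an explicit alias, the last segment, common, is used as the key.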

applyTranslationRulesForName()

Later, when the traversal reaches other nodes, applyTranslationRulesForName() uses these rules to restore the original namespace.

private function applyTranslationRulesForName( $haystack, $uses) : string {

    if( !is_string( $haystack))
        throw new Exception("Illegal argument to applyTranslationRulesForName(): " . var_export($haystack, true));

    foreach( $uses as $needle => $replacement) {
        $needle .= "\\";
        $replacement .= "\\";
        // crude imitation of startsWith( $haystack, $needle)
        if( substr( $haystack, 0, strlen( $needle)) === $needle)
            return $replacement . substr( $haystack, strlen( $needle));
    }

    return $haystack;
}

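This yields the correct, complete namespace: in the test case, Common\Object is expanded to com\rsumlang\common\Object.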
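
The prefix rewriting can be checked with a standalone version of the same idea. The sketch below is an assumption-laden rewrite rather than the original method: expandName() is a hypothetical name, and strncmp() replaces the substr() comparison used above.

```php
<?php
declare(strict_types=1);

/** Standalone version of the prefix-rewriting idea, outside of the Exporter class. */
function expandName(string $name, array $uses): string
{
    foreach ($uses as $alias => $actual) {
        $prefix = $alias . '\\';
        // only rewrite when the name starts with "<alias>\"
        if (strncmp($name, $prefix, strlen($prefix)) === 0) {
            return $actual . '\\' . substr($name, strlen($prefix));
        }
    }
    return $name; // no rule matched, leave the name untouched
}

$uses = ['Common' => 'com\rsumlang\common'];
echo expandName('Common\Object', $uses), "\n";   // alias prefix matches, gets expanded
echo expandName('CommonStuff\Foo', $uses), "\n"; // unchanged: "Common" is not a full segment
echo expandName('Common', $uses), "\n";          // unchanged: no "Common\" prefix
```

Appending the backslash to the alias before comparing is what keeps CommonStuff\Foo from being rewritten by the Common rule.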

Related functions in php-ast

ast\get_kind_name()

From https://github.com/nikic/php-ast/blob/b8fa288b4922fe923236a198e0fb17e3441a888b/ast.stub.php#L31:

/**
 * @param int $kind AST_* constant value defining the kind of an AST node
 * @return string String representation of AST kind value
 */
function get_kind_name(int $kind): string {}

Given a $kind value, this function returns the name of the corresponding node type as a string; each $kind value maps to one AST_* constant.

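The stub file at https://github.com/nikic/php-ast/blob/b09570df0098baba601a89e65348d0b04c351d69/ast_stub.php#L8 lists the AST_* constants together with their $kind values. Note that the numbers depend on the php-ast/PHP version in use: the dump above reports kind 133 for the root statement list and 541 for the namespace node, whereas this newer stub assigns 132 to AST_STMT_LIST and 542 to AST_NAMESPACE. Kinds are therefore best resolved through the ast\AST_* constants or ast\get_kind_name() rather than hard-coded numbers: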

// AST KIND CONSTANTS
namespace ast;
const AST_ARG_LIST = 128;
const AST_LIST = 255;
const AST_ARRAY = 129;
const AST_ENCAPS_LIST = 130;
const AST_EXPR_LIST = 131;
const AST_STMT_LIST = 132;
const AST_IF = 133;
const AST_SWITCH_LIST = 134;
const AST_CATCH_LIST = 135;
const AST_PARAM_LIST = 136;
const AST_CLOSURE_USES = 137;
const AST_PROP_DECL = 138;
const AST_CONST_DECL = 139;
const AST_CLASS_CONST_DECL = 140;
const AST_NAME_LIST = 141;
const AST_TRAIT_ADAPTATIONS = 142;
const AST_USE = 143;
const AST_TYPE_UNION = 144;
const AST_ATTRIBUTE_LIST = 145;
const AST_ATTRIBUTE_GROUP = 146;
const AST_MATCH_ARM_LIST = 147;
const AST_NAME = 2048;
const AST_CLOSURE_VAR = 2049;
const AST_NULLABLE_TYPE = 2050;
const AST_FUNC_DECL = 67;
const AST_CLOSURE = 68;
const AST_METHOD = 69;
const AST_ARROW_FUNC = 71;
const AST_CLASS = 70;
const AST_MAGIC_CONST = 0;
const AST_TYPE = 1;
const AST_VAR = 256;
const AST_CONST = 257;
const AST_UNPACK = 258;
const AST_CAST = 261;
const AST_EMPTY = 262;
const AST_ISSET = 263;
const AST_SHELL_EXEC = 265;
const AST_CLONE = 266;
const AST_EXIT = 267;
const AST_PRINT = 268;
const AST_INCLUDE_OR_EVAL = 269;
const AST_UNARY_OP = 270;
const AST_PRE_INC = 271;
const AST_PRE_DEC = 272;
const AST_POST_INC = 273;
const AST_POST_DEC = 274;
const AST_YIELD_FROM = 275;
const AST_GLOBAL = 277;
const AST_UNSET = 278;
const AST_RETURN = 279;
const AST_LABEL = 280;
const AST_REF = 281;
const AST_HALT_COMPILER = 282;
const AST_ECHO = 283;
const AST_THROW = 284;
const AST_GOTO = 285;
const AST_BREAK = 286;
const AST_CONTINUE = 287;
const AST_CLASS_NAME = 276;
const AST_CLASS_CONST_GROUP = 546;
const AST_DIM = 512;
const AST_PROP = 513;
const AST_NULLSAFE_PROP = 514;
const AST_STATIC_PROP = 515;
const AST_CALL = 516;
const AST_CLASS_CONST = 517;
const AST_ASSIGN = 518;
const AST_ASSIGN_REF = 519;
const AST_ASSIGN_OP = 520;
const AST_BINARY_OP = 521;
const AST_ARRAY_ELEM = 526;
const AST_NEW = 527;
const AST_INSTANCEOF = 528;
const AST_YIELD = 529;
const AST_STATIC = 532;
const AST_WHILE = 533;
const AST_DO_WHILE = 534;
const AST_IF_ELEM = 535;
const AST_SWITCH = 536;
const AST_SWITCH_CASE = 537;
const AST_DECLARE = 538;
const AST_PROP_ELEM = 775;
const AST_PROP_GROUP = 774;
const AST_CONST_ELEM = 776;
const AST_USE_TRAIT = 539;
const AST_TRAIT_PRECEDENCE = 540;
const AST_METHOD_REFERENCE = 541;
const AST_NAMESPACE = 542;
const AST_USE_ELEM = 543;
const AST_TRAIT_ALIAS = 544;
const AST_GROUP_USE = 545;
const AST_ATTRIBUTE = 547;
const AST_MATCH = 548;
const AST_MATCH_ARM = 549;
const AST_NAMED_ARG = 550;
const AST_ENUM_CASE = 777;
const AST_METHOD_CALL = 768;
const AST_NULLSAFE_METHOD_CALL = 769;
const AST_STATIC_CALL = 770;
const AST_CONDITIONAL = 771;
const AST_TRY = 772;
const AST_CATCH = 773;
const AST_FOR = 1024;
const AST_FOREACH = 1025;
const AST_PARAM = 1280;
// END AST KIND CONSTANTS
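
A short sketch of how a kind value can be turned into a readable name. It assumes the php-ast extension may or may not be installed: kindLabel() is a helper made up for this example, and the fallback text it prints when the extension is missing is likewise invented.

```php
<?php
declare(strict_types=1);

/**
 * Returns a readable label for a kind value. kindLabel() is a helper made up
 * for this example; it falls back to a placeholder when php-ast is not loaded.
 */
function kindLabel(int $kind): string
{
    if (extension_loaded('ast')) {
        return ast\get_kind_name($kind);
    }
    return "kind#$kind (php-ast not loaded)";
}

if (extension_loaded('ast')) {
    // resolve through the constant instead of a hard-coded number
    echo kindLabel(ast\AST_NAME), "\n";
} else {
    echo kindLabel(2048), "\n";
}
```

Going through the ast\AST_* constants keeps the lookup valid even when the numeric values differ between php-ast versions.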

ast\kind_uses_flags()

From https://github.com/nikic/php-ast/blob/b8fa288b4922fe923236a198e0fb17e3441a888b/ast.stub.php#L37:

/**
 * @param int $kind AST_* constant value defining the kind of an AST node
 * @return bool Returns true if AST kind uses flags
 */
function kind_uses_flags(int $kind): bool {}

This function tells whether nodes of a given AST_* kind carry flags. AST_NAME, for example, does: its flags include NAME_NOT_FQ and NAME_FQ.

src/util.php

src/util.php is a file copied over from the php-ast project; its main job is to handle the flags field.


function format_flags(int $kind, int $flags) : string {}

function get_flag_info() : array {}
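
The two signatures are listed without bodies. As a rough illustration of what flag formatting involves, the sketch below separates exclusive flags (the value is one of several alternatives) from combinable flags (a bit mask). It is not the code from util.php: the function name describeFlags(), the kind names used as keys, and the flag values are all invented for the example and may not match any php-ast version.

```php
<?php
declare(strict_types=1);

// Invented tables for the example; real kinds and flag values come from php-ast.
const EXCLUSIVE_FLAGS = [
    'NAME' => [0 => 'NAME_FQ', 1 => 'NAME_NOT_FQ'],
];
const COMBINABLE_FLAGS = [
    'METHOD' => [1 => 'MODIFIER_PUBLIC', 16 => 'MODIFIER_STATIC'],
];

function describeFlags(string $kind, int $flags): string
{
    $exclusive = EXCLUSIVE_FLAGS[$kind] ?? null;
    $combinable = COMBINABLE_FLAGS[$kind] ?? null;

    if ($exclusive !== null && isset($exclusive[$flags])) {
        // exactly one alternative applies
        return $exclusive[$flags] . " ($flags)";
    }
    if ($combinable !== null) {
        // bit mask: collect the name of every bit that is set
        $names = [];
        foreach ($combinable as $bit => $name) {
            if ($flags & $bit) {
                $names[] = $name;
            }
        }
        if ($names !== []) {
            return implode(' | ', $names) . " ($flags)";
        }
    }
    return (string) $flags; // unknown kind or no flag set
}

echo describeFlags('NAME', 1), "\n";
echo describeFlags('METHOD', 17), "\n";
echo describeFlags('ECHO', 0), "\n";
```

Keeping the raw number in parentheses next to the names makes the formatted output easy to compare against a var_dump() of the node.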

Afterword

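The phpjoern source is fairly simple: once a debugging environment is set up, stepping through it is enough to work out the main logic. I only traced a small number of files, so my coverage is probably incomplete; more test cases can be found at https://github.com/nikic/php-ast/tree/master/tests.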

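Next, I need to make a small change to phpjoern: following the example of funcid, record for every node the id of the file it belongs to, as fileid. I will cover that in the next post.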