程序依赖图PDG

PDG定义

维基百科对PDG的定义

In computer science, a Program Dependence Graph (PDG) is a representation of a program's control and data dependencies. It's a directed graph where nodes represent program statements, and edges represent dependencies between these statements. PDGs are useful in various program analysis tasks, including optimizations, debugging, and understanding program behavior.These dependencies are used during dependence analysis in optimizing compilers to make transformations so that multiple cores are used, and parallelism is improved. Nodes and edges in a PDG may have attributes associated with them, representing variables read from or written to, or the type of dependency they represent. PDGs are used in data flow analysis, slicing, optimization, debugging, and parallelization, providing insights into how program components interact and aiding in understanding and analyzing program behavior.

解释：

在计算机科学中，程序依赖图（PDG）是程序的 控制依赖 和 数据依赖 关系的表示。它是一个有向图，其中节点表示程序语句，边表示语句之间的依赖关系。PDG 在各种程序分析任务中非常有用，包括优化、调试和理解程序行为。在优化编译器的依赖分析期间使用这些依赖关系进行转换，以便使用多个内核，并提高并行执行效率。PDG 中的节点和边可能具有与其关联的属性，表示变量读取或写入、依赖关系类型等。PDG 用于数据流分析、程序切片、程序优化、调试及并行优化，为理解程序语义提供一种直观的表示形式，有助于理解和分析程序行为。

官方的解释如上，需要注意的是，程序依赖图PDG包含两部分，控制依赖图（CDG：Control Dependency Graph ）和数据依赖图（DDG：Data Dependency Graph），分别表示程序语句间的控制依赖关系（如：对于if语句来说，then语句控制依赖于条件判断语句）和数据依赖关系（数据流图中变量间的边指向关系）。

举两个栗子

代码块1

`if (A) then  // S1       B = C * D  // S2   endif   `

S2语句执行与否取决于S1语句的判断结果，这时，我们称S2控制依赖于S1

代码块2

`A = B * C  // S1   D = A * E + 1  // S2   `

在S2语句中，对S1语句中的变量A进行了读取，这时，我们称S2数据依赖于S1

Ok，看完两个简单的例子，对控制依赖和数据依赖大概有一个具象化的了解，然后就可以看定义了，在讲数据依赖（DDG）之前，需要先了解一下定义可达性分析算法。

预备知识

定义可达性分析

定义可达性分析也叫Reaching Definitions Analysis

定义

A definition d at program point p reaches a point q if there is a path from p to q such that d is not “killed” along that path

解释：

在程序点 p 处有一个变量的定义 d，如果有一条控制流路径可以从程序点 p 到达程序点 q，并且在这条路径上定义 d 没有被 “kill” 掉（也就是定义 d 没有被重新覆盖掉），那么就称 p 处的定义 d 可以到达 q（定义 d 可达q）。

分析每个程序点处可以到达的定义的过程称为 定义可达行分析。

DDG

数据依赖图DDG（Data Dependence Graph）针对的对象是Statement（程序语句）

（参考论文 Christian Hammer 写的 Static Program Analysis Program Dependence Graph，感兴趣的同学可以直接看原文）

Data Dependence定义

Known from optimizing compilers

For slicing only “flow dependence” is relevant

Called data dependence in the sequel

x → dd y means that a node x computes a value that may be used at node y in some feasible execution

A node y is data dependent on node x (x →dd y) if

there exists a variable v with v ∈ Def(x) and v ∈ Use(y),

and ∃ path P in the CFG from x to y where the definition of v in x is not definitively killed (i.e. x is a reaching definition of y.)

通过定义我们可以知道，程序中的节点x数据依赖于节点y（针对变量v来说），需要满足以下条件：

1.在节点x处对变量v进行了定义，在节点y处对变量v进行了使用

2.从节点x到节点y，存在一条CFG路径，并且在这条CFG路径上没有对变量v重新赋值（not kill）

上文已经讲了定义可达性分析（Reaching Definitions Analysis）的原理，数据依赖图（DDG）的构建过程其实就可以理解为 对每个程序节点的定义做可达性分析的过程。

CDG

控制依赖图CDG（Control Dependence Graph）针对的对象是Statement（程序语句**）**

Control Dependence定义：

程序天然存在的控制依赖关系

One statement directly controls the execution of another

In structured programs equivalent to “indentation level”
`while(i<n=) {       sum = sum + i;       prod = prod * i;       i++;   }   write(sum);   `
Statements in the while body are control dependent on the while predicate. The write is no longer dependent on the predicate.

解释：

前后语句之间的控制依赖（这也是一种天然的控制流）

类似while缩进格式的语句块的控制依赖：while循环体控制依赖于while条件

Post-Dominators控制依赖

Control Dependence Formally

Standard definition in terms of post-dominance

A node x in the CFG is post-dominated by node y if all paths from x to exit node pass through y

A node y is control dependent on node x (x →cd y) if

∃ path p from x to y in the CFG, such that y postdominates every node in p (except for x), and

x is not post-dominated by y

解释：

通过前向支配树（FDT）确定控制依赖关系：

CFG中的节点x被节点y前向支配：从节点x到程序出口的所有路径都经过y

节点x被节点y控制依赖 (x →cd y)：

在CFG上存在一条从x到y的路径，在这条路径上，y前向支配 除x外 的所有其他节点

x不被y前向支配

小剧场（可忽略）：

这块尽力理解了，本想着自己用”人话“解释一下这个定义，但实在是太难了555，希望有懂这方面的大佬在技术交流群里解答一下，或者通过公众号与作者私聊，感谢！

测试代码

`package com.test.pdg;      import java.io.IOException;      public class TestPDG2 {       public void test(Integer n) {           read(n);           int i = 1;           int sum = 0;           int prod = 1;           while (i <= n) {               sum = sum + i;               prod = prod * i;               i++;           }           write(sum);           write(prod);       }          public Integer read(Integer n)  {           try {               n = System.in.read();           } catch (IOException e) {               throw new RuntimeException(e);           }           return n;       }          public void write(Integer n) {           System.out.println(n);       }   }   `

tips：此处暂时就只把论文中提到的结果展示出来，供大家参考。笔者尝试本地做测试，想通过直观的图形化展示生成的结果，但由于cpg框架自身的实现不完善以及本人对neo4j的Cyber命令不是很熟（555，努力进修中），所以最终呈现结果就会显得很乱，基于此，以防混淆视听，就不再做展示啦，感谢大家的理解。

（文末会展示暂时的生成效果，大家有什么更好的解决办法可以多多提哈，感谢~）

DDG结果

CDG结果

Post-dominator Tree

CDG

杂乱的cpg

Tips:测试代码与上文一致

红色边：PDG

蓝色边：DFG

绿色边：CDG

长亭百川云 - 文章详情

长亭百川云