Points-to Analysis is widely used, but it’s tricky to understand. There’s a wide variety of notations used, leading to confusion. And if that wasn’t bad enough, the most popular points-to analysis tool SVF, operates on LLVM IR, while most descriptions of points-to analysis algorithms use examples from C code. This makes it difficult for readers to map the familiar C source code based examples to the functioning of SVF.

This, and the following blog articles will describe things that I have learnt during my experiences with SVF and pointer analysis.

**Andersen’s Algorithm**

It is an iterative, subset-based points-to analysis algorithm. I’ll describe the different ways to think about the constraints and derivation rules, that I’ve found to be helpful.

- p = &q:

This is the Address-of Constraint.

Here the address of q is stored to p. So q ∈ p. Where p is the pts-to set for p. - p = q:

This is the Copy Constraint.

Here both p and q are pointers (or pointers to pointers, or pointers to pointers to pointers … and so on). This is fairly intuitive. It means, the points to set of q is a subset of the points-to set of p.

In other words, p might* point to whatever q might point to, in addition to whatever else, p might already be pointing to.

In other words, because q is a subset, we can denote p ⊇ q, denoting p is a superset of q. Some texts use this notation.

In other words, p ⊇ q, l_{x }∈ q, then, l_{x }∈ p.

In other words, the points-to set of q, is copied into the points-to set of p.

Note*: I use might because points-to analysis is imprecise and overapproximated.

Note: the alphabet is used to denote both the pointer / object as well as its points-to set. The meaning is clear depending on the context. - *p = q:

This is the Assign Constraint.

Here q is a pointer (or a pointer to a pointer or so on ..) and p is a pointer-to-a-pointer (or a pointer to a pointer to a pointer and so on …). Intuitively, it means that the points-to set of q is a subset of each of the points-to sets of every element that is in points-to set for p.

In other words, every element that p might point to, might also point to whatever q might point to, in addition to whatever they might already be pointing to.

In other words, for every element e such that, p → e, e q.

In other words, *p ⊇ q, l_{r }p, l_{x }∈ q_{ }, then, l_{x }∈ r.

In other words, the points-to set of q, is copied into every element in the points-to set of p. - p = *q:

This the Deref Constraint.

Here q is pointer-to-a-pointer (or a pointer-to-a-pointer-to-a-pointer and so on), and p is a pointer (or a pointer-to-a-pointer and so on). Intuitively it means that the points-to set of every element that q could point to, is a subset of the points-to set of p.

In other words, p might point to whatever, every element that is in the points-to set for every element that might be pointed to by q.

In other words, for every element e such that, q → e, p ⊇ e.

In other words, p ⊇ *q, l_{r }∈ q, l_{x }∈ r_{ }, then, l_{x }∈ p.

In other words, the points-to set of every element that q might point to, is copied into

The Andersen’s Algorithm, and practically all static analysis algorithms are language-independent. For example, all languages that supports pointers and allows for pointers to be assigned and dereferenced, will work with Andersen’s Analysis.

Also, note that C programs can contain more complex statements like **p = ***q, or whatever. These have to be broken down into simpler statements via temporary variables in order to do the analysis.