| Computational Logic |
| Abstract Interpretation of Logic Programs |
[Material partly from Cousot, Nielson, Gallagher,
Sondergaard, Bruynooghe, and others]
- Many CS problems related to program analysis / synthesis
- Prove that some property holds for program $P$
(program analysis)
- Alternatively: derive properties which do hold for program $P$
(program analysis)
- Given a program $P$, generate a program $P'$ which is
- in some way equivalent to $P$
- behaves better than $P$ w.r.t. some criteria
(program analysis / synthesis)
- Standard Approach:
- identify that some invariant holds, and
- specialize the program for the particular case
- Frequent in compilers although seldom treated in a formal way:
- ``code optimization'',
- ``dead code elimination'',
- ``code motion'',
- ...
[Aho, Ullman 77]
- Often referred to as ``dataflow analysis''
- Abstract interpretation provides a formal framework
for developing program analysis tools
- Analysis phase + synthesis phase
Abstract Interpretation + Program Transformation
- Consider detecting that one branch will not be taken in:
int x, y, z; y := read(file); x := y * y;
if x >= 0 then z := 1 else z := 0
- Exhaustive analysis in the standard domain: non-termination
- Human reasoning about programs - uses abstractions or
approximations:
signs, order of magnitude, odd/even, ...
- Basic Idea: use approximate (generally finite)
representations of computational objects to make the problem of
program dataflow analysis tractable
- Abstract interpretation is a formalization of this idea:
- define a non-standard semantics which can approximate the meaning
or behaviour of the program in a finite way
- expressions are computed over an approximate (abstract) domain
rather than the concrete domain (i.e., meaning of operators has to
be reconsidered w.r.t. this new domain)
- Very general:
can be applied to any language with well defined
(procedural or declarative) semantics
- Automatic - (vs. proof methods)
- Static - not all possible runs actually tried (vs. model checking)
- Sound - no possible run omitted (vs. debugging)
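- Consider the domain $D = \mathbb{Z}$ (integers)
- and the multiplication operator: $* : \mathbb{Z}^2 \rightarrow \mathbb{Z}$
- We define an ``abstract domain'': $D_\alpha = \{[-], [+]\}$
- Abstract multiplication: $*_\alpha : D_\alpha^2 \rightarrow D_\alpha$ defined by
$$\begin{array}{c|cc} *_\alpha & [-] & [+] \\ \hline [-] & [+] & [-] \\ [+] & [-] & [+] \end{array}$$
- This allows us to reason, for example, that $y = x^2 = x * x$ is
never negative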
- Some observations:
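- Again, $D = \mathbb{Z}$ (integers)
- and: $* : \mathbb{Z}^2 \rightarrow \mathbb{Z}$
- Let's define a more refined ``abstract domain'': $D'_\alpha = \{[-], [0], [+]\}$
- Abstract multiplication: $*_\alpha : {D'_\alpha}^2 \rightarrow D'_\alpha$ defined by
$$\begin{array}{c|ccc} *_\alpha & [-] & [0] & [+] \\ \hline [-] & [+] & [0] & [-] \\ [0] & [0] & [0] & [0] \\ [+] & [-] & [0] & [+] \end{array}$$
- This now allows us to reason that $z = y * (0 * x)$ is zero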
- Some observations:
- There is a degree of freedom in defining different abstract
operators and domains
- The minimal requirement is that they be ``safe'' or ``correct''
- Different ``safe'' definitions result in different kinds of analyses
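The rule-of-signs multiplication above can be sketched in Python. This is only an illustration under stated assumptions: the names `NEG`, `ZERO`, `POS`, `alpha`, and `mul_abs` are hypothetical, and the sketch assumes that [-], [0], and [+] describe the negative integers, zero, and the positive integers respectively. The loop checks safety on a small finite sample of integers, which is evidence and not a proof.

```python
# Sketch of the refined sign domain {[-], [0], [+]} (names are illustrative).
# Assumption: [-] = negative integers, [0] = {0}, [+] = positive integers.
NEG, ZERO, POS = "[-]", "[0]", "[+]"

def alpha(n):
    """Abstract a single concrete integer to its sign."""
    if n < 0:
        return NEG
    if n == 0:
        return ZERO
    return POS

def mul_abs(a, b):
    """Abstract multiplication: sign of a product from the signs of its factors."""
    if ZERO in (a, b):
        return ZERO
    return POS if a == b else NEG

# Safety check on a finite sample: the sign of the concrete product
# must equal the abstract product of the signs.
for x in range(-3, 4):
    for y in range(-3, 4):
        assert alpha(x * y) == mul_abs(alpha(x), alpha(y))

# A product with a zero factor is zero, whatever the other signs are.
assert mul_abs(POS, mul_abs(ZERO, NEG)) == ZERO
```

The abstract operator never inspects concrete values, so the same reasoning covers infinitely many integers at once.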
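- Again $D = \mathbb{Z}$ (integers)
- and the addition operator: $+ : \mathbb{Z}^2 \rightarrow \mathbb{Z}$
- We cannot use $D'_\alpha = \{[-], [0], [+]\}$ because we wouldn't
know how to represent the result of $[+] +_\alpha [-]$
(i.e. our abstract addition would not be closed)
- New element ``$\top$'' (supremum): approximation of any integer
- New ``abstract domain'': $D''_\alpha = \{[-], [0], [+], \top\}$
- Abstract addition: $+_\alpha : {D''_\alpha}^2 \rightarrow D''_\alpha$
defined by:
$$\begin{array}{c|cccc} +_\alpha & [-] & [0] & [+] & \top \\ \hline [-] & [-] & [-] & \top & \top \\ [0] & [-] & [0] & [+] & \top \\ [+] & \top & [+] & [+] & \top \\ \top & \top & \top & \top & \top \end{array}$$
- We can now reason that $z = x^2 + y^2$
is never negative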
- In addition to the imprecision due to the
coarseness of $D_\alpha$, the abstract versions of the
operations (dependent on the concretization function $\gamma$) may
introduce further imprecision
- Thus, the choice of abstract domain and the definition of
the abstract operators are crucial
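As a hedged illustration of both points (closing the domain with a top element, and the extra imprecision introduced by the operators), the following Python sketch implements abstract addition and multiplication over a four-element sign domain. The names `TOP`, `add_abs`, and `mul_abs` are hypothetical, and the sketch assumes [-], [0], [+] stand for negative, zero, and positive integers.

```python
# Sketch: sign domain extended with a top element (illustrative names).
NEG, ZERO, POS, TOP = "[-]", "[0]", "[+]", "T"

def add_abs(a, b):
    """Abstract addition; mixed signs can only be described by TOP."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else TOP

def mul_abs(a, b):
    """Abstract multiplication extended to TOP."""
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG

# For every known sign of x and y, x*x + y*y is described by [0] or [+].
for sx in (NEG, ZERO, POS):
    for sy in (NEG, ZERO, POS):
        assert add_abs(mul_abs(sx, sx), mul_abs(sy, sy)) in (ZERO, POS)

# Imprecision from the operator itself: if x is only known as TOP,
# the abstract square is TOP, although a concrete square is never negative.
assert mul_abs(TOP, TOP) == TOP
```

The last assertion shows why the choice of operators matters: a dedicated abstract "square" operation could return a more precise result than generic multiplication applied to TOP.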
- Required:
- Correctness - safe approximations: because most ``interesting''
properties are undecidable the analysis necessarily has to be
approximate. We want to ensure that the analysis is ``conservative''
and errs on the ``safe side''
- Termination - compilation should definitely terminate
(note: not always the case in everyday program analysis tools!)
- Desirable - ``practicality'':
- Efficiency - in practice finite analysis time is not enough:
finite and small
- Accuracy - of the collected information: depends on the
appropriateness of the abstract domain and the level of detail to
which the interpretation procedure mimics the semantics of the
language
- ``Usefulness'' - determines which information is worth
collecting
- The first two received the most attention initially
(understandably)
- Last three recently studied empirically
(e.g., for logic programs)
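- Basic idea in approximation:
for some property $p$
we want to show that $\forall x,\ x \in S \Rightarrow p(x)$
Alternative: construct a set $S_a \supseteq S$,
and prove $\forall x,\ x \in S_a \Rightarrow p(x)$;
then, $S_a$ is a safe approximation of $S$
- Approximation on functions:
for some property $p$
we want to show that $\forall x,\ x \in S \Rightarrow p(F(x))$
- A function $G : S \rightarrow S$
is a safe approximation of $F$
if $\forall x,\ x \in S,\ p(G(x)) \Rightarrow p(F(x))$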
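- Let the meaning of a program $P$
be a mapping $F_P$
from input to output, input and output values
$\in$ ``standard'' domain $D$:
$F_P : D \rightarrow D$
- Let's `lift' this meaning to map sets of inputs to sets of
outputs: $F_P^* : \wp(D) \rightarrow \wp(D)$
where $\wp(S)$
denotes the powerset of $S$, and $F_P^*(S) = \{F_P(x) \mid x \in S\}$
- A function $G : \wp(D) \rightarrow \wp(D)$
is a safe approximation of $F_P^*$
if $\forall S,\ S \in \wp(D),\ G(S) \supseteq F_P^*(S)$
- Properties can be proved using $G$
instead of $F_P^*$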
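The lifting of a meaning function to sets, and the superset condition for a safe approximation, can be sketched as follows. All names (`lift`, `is_safe_on`, `square`, `coarse`, `too_small`) are hypothetical, and the check only covers the finite sample sets supplied, so it illustrates the definition without proving safety in general.

```python
def lift(f):
    """Lift f : D -> D to a function on sets, applied elementwise."""
    def f_star(s):
        return frozenset(f(x) for x in s)
    return f_star

def is_safe_on(g, f_star, samples):
    """Check the superset condition g(S) >= f_star(S) on the given sample sets."""
    return all(g(s) >= f_star(s) for s in samples)

def square(x):
    return x * x

square_star = lift(square)

def coarse(s):
    """A deliberately coarse approximation: always answer {0, ..., 100}."""
    return frozenset(range(0, 101))

def too_small(s):
    """An unsafe candidate: it omits possible outputs."""
    return frozenset({0})

samples = [frozenset(), frozenset({-3, 2}), frozenset(range(-10, 11))]

assert is_safe_on(coarse, square_star, samples)
assert not is_safe_on(too_small, square_star, samples)

# The property "output is non-negative" holds for every element of coarse(S),
# so it also holds for every element of square_star(S) on these samples.
assert all(v >= 0 for s in samples for v in coarse(s))
```

The approximation loses precision (it admits outputs such as 3 that no square produces) but it never omits a real output, which is what makes it usable for proofs.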
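- For some property $p$
we want to show that
for some inputs $S$, $p(F_P^*(S))$
(with $F_P^*$ the meaning of $P$ lifted to sets)
- We show that, for a safe approximation $G$ of $F_P^*$ and
for some inputs $S_a$, $p(G(S_a))$
- Since $G(S_a) \supseteq F_P^*(S_a)$,
for some inputs $S_a$, $p(F_P^*(S_a))$
(Note: abuse of notation - $F_P^*$
does not work on abstract values $S_a$)
- As long as $F_P^*$
is monotonic: $S_a \supseteq S \Rightarrow F_P^*(S_a) \supseteq F_P^*(S)$
- And since $S_a \supseteq S$, then:
for some inputs $S$, $p(F_P^*(S))$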
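- We can now define an abstract meaning function as
$F_\alpha : D_\alpha \rightarrow D_\alpha$
which is then safe if
$\forall \lambda,\ \lambda \in D_\alpha,\ \gamma(F_\alpha(\lambda)) \supseteq F_P^*(\gamma(\lambda))$
(with $\gamma : D_\alpha \rightarrow \wp(D)$ the concretization function)
- We can then prove a property of the output of a given class of
inputs represented by $\lambda$
by proving that all elements of $\gamma(F_\alpha(\lambda))$
have such property
- E.g. in our example, a property such as ``if this program takes
a positive number it will produce a negative number as output''
can be proved
- Generating $F_\alpha$: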
- ``If this program takes a positive number it will produce a
negative number as output''
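A minimal sketch of this kind of proof, under loud assumptions: the program `f_p` below (which doubles and negates its input) is a hypothetical stand-in, not the example from the original slides, and `in_gamma`, `f_abs`, and `safe_on_window` are illustrative names. The safety condition is only tested on a finite window of integers.

```python
NEG, ZERO, POS, TOP = "[-]", "[0]", "[+]", "T"

def in_gamma(lam, n):
    """True if integer n belongs to the concretization of lam."""
    return {NEG: n < 0, ZERO: n == 0, POS: n > 0, TOP: True}[lam]

def f_p(x):
    """Hypothetical concrete program: negate and double the input."""
    return -2 * x

def f_abs(lam):
    """Hand-written abstract meaning of f_p over the sign domain."""
    return {NEG: POS, ZERO: ZERO, POS: NEG, TOP: TOP}[lam]

def safe_on_window(window):
    """Check that f_p(n) lies in gamma(f_abs(lam)) whenever n lies in gamma(lam)."""
    return all(
        in_gamma(f_abs(lam), f_p(n))
        for lam in (NEG, ZERO, POS, TOP)
        for n in window
        if in_gamma(lam, n)
    )

assert safe_on_window(range(-50, 51))

# "If this program takes a positive number it produces a negative number":
# established by a single abstract step instead of testing every input.
assert f_abs(POS) == NEG
```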
- ``Input-output'' semantics often too coarse for useful analysis:
information about ``state'' at program points generally
required $\Rightarrow$ ``extended semantics''
- Program points can be reached many times, from different points,
and in different ``states'' $\Rightarrow$ ``collecting'' (``sticky'') semantics
- Analysis often computes a collection of abstract states for a
program point
- Often more efficient to ``summarize'' states into one which
gives the best overall description $\Rightarrow$ lattice structure
in abstract domain
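- The ordering on $\wp(D)$, $\subseteq$, induces an
ordering on $D_\alpha$, $\leq_\alpha$
(``approximates better'')
E.g., to approximate a set of positive integers we can choose either
$[+]$ or $\top$,
but $\gamma([+]) = \{x \in \mathbb{Z} \mid x > 0\}$
and $\gamma(\top) = \mathbb{Z}$, and
since $\gamma([+]) \subseteq \gamma(\top)$
we have $[+] \leq_\alpha \top$,
i.e., $[+]$ approximates better than $\top$,
it is more precise
- It is generally required that $(D_\alpha, \leq_\alpha)$ be a
complete lattice
- Therefore, for all $S \subseteq D_\alpha$
there exists a unique
least upper bound $\sqcup S \in D_\alpha$,
i.e., such that
$\forall \lambda_S \in S,\ \lambda_S \leq_\alpha \sqcup S$, and
$(\forall \lambda_S \in S,\ \lambda_S \leq_\alpha \lambda) \Rightarrow \sqcup S \leq_\alpha \lambda$
- Intuition: given a set of approximations of the ``current
state'' at a given point in a program, to ensure that it is the best
``overall'' description for the point:
$\sqcup S$ approximates everything the elements of $S$ approximate
$\sqcup S$ is the best approximation in $D_\alpha$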
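- We consider $D_\alpha = \{[-], [0], [+], \top\}$
- We add $\bot$ (infimum) so that $\alpha(\emptyset)$
exists and to have a complete lattice:
$D_\alpha = \{\bot, [-], [0], [+], \top\}$
- (Intuition:
it represents a program point that is never reached)
- The concretization function has to be extended with $\gamma(\bot) = \emptyset$
- The lattice is then given by:
$\bot \leq_\alpha [-], [0], [+] \leq_\alpha \top$
- To make $\sqcup$ more meaningful we
consider $D_\alpha = \{\bot, [-], [0^-], [0], [0^+], [+], \top\}$
- The lattice is then given by:
$\bot \leq_\alpha [-], [0], [+]$; $[-], [0] \leq_\alpha [0^-]$; $[0], [+] \leq_\alpha [0^+]$; $[0^-], [0^+] \leq_\alpha \top$
- $[0^-]$
accurately represents a program point where a variable can be
negative or zero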
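The summarizing role of the least upper bound can be sketched by modelling each abstract element as the set of basic signs it covers. The element names (`"bot"`, `"[0-]"`, `"[0+]"`, `"top"`) and the helpers `leq` and `join` are assumptions made for this illustration; `"[0-]"` stands for "negative or zero" and `"[0+]"` for "zero or positive".

```python
# Each abstract element is modelled by the basic signs it may describe.
ELEMS = {
    "bot": frozenset(),
    "[-]": frozenset("-"),
    "[0]": frozenset("0"),
    "[+]": frozenset("+"),
    "[0-]": frozenset("-0"),
    "[0+]": frozenset("0+"),
    "top": frozenset("-0+"),
}

def leq(a, b):
    """a approximates better than (or equally to) b."""
    return ELEMS[a] <= ELEMS[b]

def join(a, b):
    """Least upper bound: the smallest element covering both a and b."""
    need = ELEMS[a] | ELEMS[b]
    uppers = [e for e in ELEMS if need <= ELEMS[e]]
    return min(uppers, key=lambda e: len(ELEMS[e]))

assert join("[-]", "[0]") == "[0-]"   # summarized without losing information
assert join("[-]", "[+]") == "top"    # no element describes "non-zero"
assert join("bot", "[+]") == "[+]"    # an unreachable point adds nothing
assert leq("[0]", "[0+]") and not leq("[0+]", "[0]")
```

Without the extra elements, the join of [-] and [0] would already be top, which is why the richer lattice makes the summary more meaningful.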
- Showing monotonicity of $F_\alpha$
may be more difficult than
showing that $D_\alpha$
meets the finiteness conditions
- There may be an $F_\alpha$
which terminates even if the
conditions are not met
- Conditions can also be relaxed by restricting the class of programs
(e.g. non-recursive programs pose few difficulties, although they
are hardly interesting)
- In some cases an approximation from above can
also be interesting
- There are other alternatives to finiteness: dynamic bounded
depth, etc.
(See: Widening and Narrowing)
- The idea itself (i.e. rule of signs) predates computation...
- The idea of computing by approximations was used as early as
1963 by Naur
(``pseudo evaluation'', in the Gier Algol compiler),
``a process which combines the operators and operands of the
source text in the manner in which an actual evaluation would have to
do it, but which operates on descriptions of the operands, not on
their values''
- 1972, Sintzoff (proving well-formedness and termination properties)
- 1975, Wegbreit appears to be the first to develop a lattice-theoretic
model
- Mid 70's: Kam, Kildall, Tarjan, Ullman, ...
- 1976,77, Patrick and Radhia Cousot proposed a formal model for the
analysis of imperative (``flowchart'') languages: unifying framework
- Define a ``static'' semantics: associate a set of possible
storage states with each program point
- Dataflow analysis constructed then as a finitely computable
approximation to the static semantics
- Which semantics?
- Declarative semantics: concerned with what is a consequence of
the program
- Model-theoretic semantics
- Fixpoint ($T_P$ operator-based) semantics:
can be what the program actually does (cf. database-style bottom-up
evaluation)
- Operational semantics: close to the behavior of the program
- SLD-resolution based (success sets)
- Denotational
- Can cover possibilities other than SLD: reactive, parallel, ...
- Analyses based on declarative semantics are often called
``bottom up'' analyses
- Analyses based on the (top-down) operational semantics are often
called ``top-down'' analyses
- Also, intermediate cases (generally achieved through program
transformation)
- Example:
}
all subsets of
Such ``bottom-up'' analyses have been proposed for example by
Marriott and Sondergaard, and, more recently, by Codish, Dams, and
Yardeni, Debray and Ramakrishnan, Barbuti, Giacobazzi, and Levi, and
others.
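The fixpoint computation underlying such bottom-up analyses can be sketched for the concrete case. The code below iterates the immediate-consequence operator over a small ground program until nothing new is derived; a bottom-up analysis would run the same loop with an abstract version of the operator. The representation of clauses as `(head, body)` pairs of strings, the example program, and the names `tp` and `lfp` are assumptions of this sketch.

```python
def tp(program, interp):
    """One application of the immediate-consequence operator:
    heads of ground clauses whose body atoms all hold in interp."""
    return frozenset(
        head for head, body in program if all(atom in interp for atom in body)
    )

def lfp(program):
    """Iterate tp from the empty interpretation until a fixpoint is reached."""
    interp = frozenset()
    while True:
        nxt = tp(program, interp)
        if nxt == interp:
            return interp
        interp = nxt

# Ground clauses as (head, body) pairs; facts have an empty body.
program = [
    ("edge(a,b)", []),
    ("edge(b,c)", []),
    ("path(a,b)", ["edge(a,b)"]),
    ("path(b,c)", ["edge(b,c)"]),
    ("path(a,c)", ["edge(a,b)", "path(b,c)"]),
    ("path(c,a)", ["edge(c,a)"]),
]

model = lfp(program)
assert "path(a,c)" in model
assert "path(c,a)" not in model   # its body atom edge(c,a) is never derived
```

The result does not depend on any query, which is both the generality and the lack of query direction noted for this style of analysis.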
- Advantages:
- Simple and elegant. Based on the declarative, fixpoint
semantics
- General: results independent of the query form
- Disadvantages:
- Information only about ``procedure exit.'' Normally information
needed at various program points in compilation, e.g., ``call
patterns'' (closures)
- The ``logical variable'' not observed (uses ground data).
Information on instantiation state, substitutions, etc. often
needed in compilation
- Not query-directed: analyzes whole program, not the part (and
modes) that correspond to ``normal'' use (expressed through a query
form)
- Solutions:
- Call patterns obtainable via ``magic sets'' transformation
[Marriott and Sondergaard]
Used also for query-directed analysis by [Barbuti et al.], [Codish et al.],
[Gallagher et al.], [Ramakrishnan et al.], and others
- Enhanced fixpoint semantics
(e.g., S-semantics [Falaschi et al.], [Gaifman and Shapiro])
- Define an extended (collecting) concrete semantics, derived from
SLD resolution,
making relevant information observable.
- Abstract domain: generally ``abstract substitutions''.
- Abstract operations: unification, composition,
projection, extension, ...
- Abstract semantic function: takes a query form
(abstraction of initial goal or set of initial goals) and the
program and returns abstract descriptions of the substitutions at
relevant program points.
- Variables complicate things:
- correctness (due to aliasing),
- termination (merging information related to different
renamings of a variable)
- Logic variables are in fact (well behaved) pointers:
X = tree(N,L,R), L = nil, Y = N, Y = 3, ...
this makes analysis of logic programs very interesting
(and quite relevant to other paradigms).
- Simple domains [Mellish, Debray], e.g.:
{ closed (ground), don't know, empty, free, non-var }
(abbreviated, e.g., as g for ground, ? for don't know, f for free,
and nv for non-var)
- May need to be very imprecise to be correct:
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
q(X,Y),
X = a.
q(Z,Z).
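To see why an analysis that ignores aliasing is incorrect on this program, the concrete bindings can be simulated with a toy binding store. The `Store` class (a small union-find) and the `naive_modes` dictionary are hypothetical helpers written for this sketch; the store only supports aliasing unbound variables and binding a variable to a constant.

```python
class Store:
    """Toy binding store: union-find for aliases plus constant bindings."""

    def __init__(self):
        self.parent = {}  # variable -> parent variable in its alias class
        self.value = {}   # class representative -> bound constant

    def find(self, var):
        self.parent.setdefault(var, var)
        while self.parent[var] != var:
            var = self.parent[var]
        return var

    def alias(self, a, b):
        """Unify two unbound variables (all this sketch supports)."""
        self.parent[self.find(a)] = self.find(b)

    def bind(self, var, const):
        self.value[self.find(var)] = const

    def is_free(self, var):
        return self.find(var) not in self.value

store = Store()
store.alias("X", "Y")   # effect of q(X,Y) with the clause q(Z,Z)
store.bind("X", "a")    # X = a

# An alias-unaware mode analysis only updates the variable it sees bound.
naive_modes = {"X": "f", "Y": "f"}
naive_modes["X"] = "g"

assert not store.is_free("Y")    # concretely, Y is now bound to a
assert naive_modes["Y"] == "f"   # the naive analysis still claims "free"
```

To stay correct without sharing information, a simple mode analysis has to give up on Y after the call to q/2, which is the imprecision the slide refers to.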
- Correct/more accurate treatment of aliasing [Debray]:
associate with each program variable a pair consisting of
an abstraction of the set of
terms the variable may be bound to, and
the set of program variables
it may ``share'' with.
- More accurate sharing - pair sharing [Sondergaard] [Codish]:
pairs of variables denoting possible sharing.
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
q(X,Y), % { X=f, Y=f } and { (X,Y) }
X = a. % { X=g, Y=g } and { (X,Y) }
q(Z,Z).
- Note: we have used a ``combined'' domain: simple modes plus pair
sharing
- Pair sharing can encode linearity:
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
q(X,Y), % { X=f, Y=f } and { (X,Y) }
W = f(X,Y). % { W=nv, X=f, Y=f } and { (W,W), (X,Y) }
q(Z,Z).
- Even more accurate sharing - set sharing [Jacobs et al.]
[Muthukumar et al.]:
sets of sets of variables.
- A bit tricky to understand. Try:
- Encodes grounding and independence:
a variable that has no occurrence in any set is ground
two variables that never occur together in any set are independent
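A minimal sketch of how a set-sharing abstraction answers groundness and independence questions. The representation (a Python set of `frozenset`s of variable names) and the helper names `ground`, `is_ground`, and `independent` are assumptions of this illustration; a full implementation of abstract unification for set sharing is considerably more involved.

```python
def ground(sharing, var):
    """Abstract effect of grounding var: drop every sharing set that mentions it."""
    return {s for s in sharing if var not in s}

def is_ground(sharing, var):
    """A variable that occurs in no sharing set is definitely ground."""
    return all(var not in s for s in sharing)

def independent(sharing, a, b):
    """Two variables that never occur in the same set definitely do not share."""
    return not any(a in s and b in s for s in sharing)

# After q(X,Y) with the clause q(Z,Z): X and Y share one common variable.
after_q = {frozenset({"X", "Y"})}
assert not independent(after_q, "X", "Y")

# After X = a: every set containing X disappears, so Y is ground as well.
after_unif = ground(after_q, "X")
assert is_ground(after_unif, "X") and is_ground(after_unif, "Y")

# Two unrelated free variables: independent, but neither is ground.
unrelated = {frozenset({"X"}), frozenset({"Y"})}
assert independent(unrelated, "X", "Y")
assert not is_ground(unrelated, "X")
```

In the first case the abstraction records that X and Y have no variables of their own besides the shared one, which is what lets grounding X propagate to Y.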
- Sharing+Freeness [Muthukumar et al.] (and + depth-K)
- Type graphs [Janssens et al.]
- Depth-K [Sato and Tamaki]
- Pattern structure [Van Hentenryck et al.]
- Variable dereferencing [VanRoy] [Taylor]
- ...
- Much work by [Codish et al.] [File et al.] [Giacobazzi et al.]
... on combining and comparing these domains
- Debray: predicate level mode inference (call and success
patterns for predicates). Unification reformulated as entry + exit
unification. Termination by tabling.
- Jones, Marriott, and Sondergaard: using denotational semantics.
- Bruynooghe:
- Concrete semantics constructs ``generalized'' AND trees: nodes
contain instance of goal before and after execution: call
substitution and success substitution.
- Analysis constructs ``abstract AND-OR trees''. Each represents a
(possibly infinite) set of (possibly infinite) concrete trees.
Widening to regular trees for termination.
- Framework is generic: parametric on some basic domain related
functions + conditions for correctness and termination.
- Muthukumar and Hermenegildo: ``PLAI'' framework.
Improvement over previous frameworks:
Efficient fixpoint algorithms (dependency tracking) and
memory savings (no explicit representation of trees).
- Fixpoint required on recursive predicates only:
[Figure: fixpoint computation for recursive predicates, panels (a) and (b)]
- Simply recursive (a)
- Mutually recursive (b)
``Use current success substitution and iterate until a fixpoint is
reached''
- Abstract tree contains several occurrences of the same atom in a
clause (for precision): useful for program specialization
( Multivariance )
However,
too many versions if not controlled
(solutions proposed [Gianotti
et al.], [Jacobs et al.], [Puebla et al.])
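The recipe "use the current success substitution and iterate until a fixpoint is reached" can be sketched on a single recursive predicate. The factorial-like predicate in the docstring, the flat sign lattice, and the names `step` and `fixpoint` are all assumptions made for this illustration; it is not the PLAI algorithm, which additionally tracks dependencies between call patterns.

```python
# Flat sign lattice with bottom and top (illustrative names).
BOT, NEG, ZERO, POS, TOP = "bot", "[-]", "[0]", "[+]", "top"

def join(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

def mul_abs(a, b):
    if BOT in (a, b):
        return BOT          # no result if an operand is unreachable
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG

def step(success):
    """One pass over both clauses of a hypothetical predicate called with N >= 0:
         fact(0, 1).
         fact(N, F) :- N > 0, N1 is N-1, fact(N1, F1), F is N*F1.
       `success` is the current abstract description of the result F."""
    base = POS                     # first clause: F = 1
    rec = mul_abs(POS, success)    # second clause: N > 0 times the current F1
    return join(base, rec)

def fixpoint(f, start=BOT):
    """Iterate f from start until the description stops changing."""
    cur, passes = start, 0
    while True:
        nxt = f(cur)
        passes += 1
        if nxt == cur:
            return cur, passes
        cur = nxt

result, passes = fixpoint(step)
assert result == POS    # the result is described as positive
assert passes == 2      # one pass to leave bottom, one to confirm stability
```

Starting from bottom, the recursive clause contributes nothing on the first pass because the recursive call has no known success description yet; the second pass confirms that the description is stable.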
- Much recent work in domains, improvement of fixpoints,
application, etc. [Taylor],[VanRoy], GAIA [LeCharlier et al.]
- Abstract compilation:
Compute over an ``abstract version'' of the program
- Reexecution [Bruynooghe, LeCharlier et al.]
(alternative to keeping track of accurate sharing)
- Caching of operations [LeCharlier et al.]
- CLP: (relation-based) programs over symbolic and non-symbolic
domains: constraint satisfaction instead of unification (e.g. CLP(R),
PrologIII, CHIP, etc.)
- Jorgensen, Marriott, and Michaylov [ISLP'91] and later Marriott
and Stuckey [POPL'93] identified numerous opportunities for
improvement via static analysis
- A number of proposals for analysis frameworks:
- Marriott and Sondergaard [NACLP90]:
denotational approach
- Codognet and Filé [ICPL92]:
uses constraint solving for the
analysis itself and ``abstract compilation''
- G. de la Banda and Hermenegildo [WICLP'91,ILPS'93]:
adaptation
of LP frameworks (PLAI).
- A few milestones (on the road to CLP analysis):
- 1981, Mycroft: strictness analysis of applicative languages
- 1981, Mellish: proposes application to logic programs
- 1986, Debray: framework with safe treatment of logic variables,
discussion of efficiency
- 1987, Bruynooghe: framework for LP based on and-or trees
- 1987, Jones and Sondergaard: framework based on a denotational
definition of SLD
- 1988, Warren, Debray and Hermenegildo:
practicality of Abs. Int. for Logic Programs shown (for program
parallelization)
- 1989, Muthukumar and Hermenegildo: PLAI generic system
- 1990, Van Roy / Taylor: application to sequential optimization
of Prolog
- 1991, Marriott et al.: first extension to CLP
- 1992, Garcia de la Banda and Hermenegildo: generalization of
Bruynooghe's algorithm to CLP, extension of PLAI
- Abstract Interpretation is a very elegant program analysis
technique
- It has in addition been proved useful and efficient. E.g., for
LP and CLP:
- Static parallelization of logic (and CLP) programs
[Hermenegildo et al.]
- (Sequential) program optimization [Taylor, VanRoy, ...]
- Optimization of CLP programs [Marriott et al., ...]
- Abstract debugging, etc.
- Interesting issues studied for handling large real programs:
- Modularity
- Handling extra-logical features, higher order
- Handling dynamic code
- Support of test-debug cycle
Solutions include [See, e.g., papers in ESOP'96, SAS'96]:
- Module interface definition: modular analysis
- Analysis of ``Full Prolog''
- Incremental analysis
- Demo!
Last modification: Wed Nov 22 23:57:35 CET 2006 <webmaster@clip.dia.fi.upm.es>