Upon Matrix and Graph Representations of Factor Graphs
Motivation
- Better understand the connections between matrices and graphs in previous works; recap the details.
- Factor Graphs for Robot Perception
- Have a glance at advances in the intersection of graph theory and SLAM
- Reliable Graphs for SLAM, IJRR 2019
- Cramér–Rao Bounds and Optimal Design Metrics for Pose-Graph SLAM, TRO 2021
Factor Graph and its Information Matrix
We start by revisiting a factor graph and its corresponding Jacobian
- Each factor (a measurement) corresponds to a block of rows.
- Each variable (a state to be estimated) corresponds to a block of columns.
So we have a factor graph, and its corresponding linear system \(Ax = b\).
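To make the row/column correspondence concrete, here is a minimal numerical sketch (a hypothetical 1-D toy problem with unit noise, not the figure's example): each factor contributes a row of A, each variable a column, and solving \(Ax = b\) in the least-squares sense recovers the states.

```python
# Minimal sketch of a factor graph as a linear system (hypothetical 1-D toy
# problem, unit noise). Rows of A correspond to factors, columns to variables.
import numpy as np

# Variable ordering: [x1, x2, l1]
A = np.array([
    [1.0,  0.0, 0.0],   # prior factor on x1
    [-1.0, 1.0, 0.0],   # odometry factor between x1 and x2
    [-1.0, 0.0, 1.0],   # observation factor between x1 and l1
    [0.0, -1.0, 1.0],   # observation factor between x2 and l1
])
b = np.array([0.0, 1.0, 2.0, 1.0])   # stacked measurements

# Least-squares solution of the linearized system A x = b
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(x_hat, 6))            # approximately [0, 1, 2]
```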
Variable elimination and matrix factorization
In large sparse SLAM setups, we usually factorize A with QR. Partial QR is applied iteratively to submatrices, together with a column permutation chosen for sparsity.
- Factorization is variable elimination in a graph
- The spirit is like ‘an undirected graph to a DAG’
- Factorization is partial QR in a matrix
- Use Householder reflections or Givens rotations
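As a reference point for the matrix side, below is a minimal dense QR via Givens rotations (plain numpy; it is not the pivoted, sparse partial QR used in SLAM solvers, just the core operation that zeroes one sub-diagonal entry at a time).

```python
# A minimal sketch of QR via Givens rotations (dense, unpivoted).
import numpy as np

def givens_qr(A):
    """Zero the sub-diagonal entries of A one at a time with Givens rotations."""
    R = A.astype(float).copy()
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):        # eliminate R[i, j] against R[j, j]
            a, b = R[j, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.eye(m)
            G[[j, i], [j, i]] = c
            G[j, i], G[i, j] = s, -s
            R = G @ R                         # apply the rotation on the left
            Q = Q @ G.T                       # accumulate Q so that A = Q R
    return Q, R

A = np.random.default_rng(0).normal(size=(5, 3))
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))   # True True
```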
Variable elimination ordering
The ordering in elimination matters. A bad order results in fill-in and leads to a dense matrix to solve.
- Topologically densely connected
- We prefer to eliminate variables with smaller degrees first
- Degrees in the example: l1 = 1, l2 = 1; x1 = 2; x2 = 3, x3 = 2
- Computationally dense
- We prefer rows with fewer entries during QR
- Direct order: 2 fill-ins, then 1 fill-in
- Inverse order: 2 fill-ins, then 3 fill-ins
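The effect of the elimination order on fill-in can be checked numerically. A minimal sketch (an "arrow" pattern with one hub variable, not the figure's example graph): eliminating the densely connected hub first fills in the whole factor, while eliminating it last causes no fill-in at all.

```python
# Fill-in depends on the elimination (pivot) order: hub-first vs. hub-last.
import numpy as np

def cholesky_fill_in(A_pattern, order):
    """Count new nonzeros created by a Cholesky factorization in the given order."""
    n = A_pattern.shape[0]
    P = np.eye(n)[order]                 # permutation matrix
    M = P @ A_pattern @ P.T              # reordered sparsity pattern (0/1)
    M = M * 0.1 + np.eye(n) * n          # make it SPD while keeping the pattern
    R = np.linalg.cholesky(M)
    return np.count_nonzero(np.abs(R) > 1e-12) - np.count_nonzero(np.tril(M))

n = 6
pattern = np.eye(n)
pattern[0, :] = pattern[:, 0] = 1        # vertex 0 is the dense "hub"

print(cholesky_fill_in(pattern, order=[0, 1, 2, 3, 4, 5]))  # hub first: much fill-in
print(cholesky_fill_in(pattern, order=[5, 4, 3, 2, 1, 0]))  # hub last: zero fill-in
```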
Heuristics for reordering
- Reorder according to degrees: COLAMD
[TODO] https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.1109&rep=rep1&type=pdf
- Graph partitioning: METIS
- A graph can be partitioned into roughly balanced subgraphs with few edges across the cut; eliminating within each part before the separator limits fill-in
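COLAMD and METIS are typically used through dedicated bindings; as a stand-in, the sketch below uses SciPy's reverse Cuthill-McKee ordering to show the same idea of connectivity-driven reordering, with matrix bandwidth as a crude proxy for expected fill-in.

```python
# Reordering a sparse symmetric pattern so connected vertices sit close together.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

n = 200
A = sp.random(n, n, density=0.02, random_state=1, format="csr")
A = (A + A.T).tocsr()
A.data[:] = 1.0                            # keep only the 0/1 sparsity pattern

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_perm = A[perm][:, perm]

def bandwidth(M):
    """Max |row - col| over the nonzeros, a crude proxy for expected fill-in."""
    coo = M.tocoo()
    return int(np.max(np.abs(coo.row - coo.col))) if coo.nnz else 0

print("bandwidth before:", bandwidth(A))
print("bandwidth after :", bandwidth(A_perm))
```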
Incremental update and the clique tree
Fact 1
- Variable elimination results in not only a DAG but also a chordal graph (PGM Theorem 9.8)
- A chordal graph is a triangulated graph
Fact 2
- We can form a clique graph from a graph by connecting its maximal cliques
Fact 3
- For a chordal graph, the clique graph can be arranged as a clique tree
- Trees are good for elimination or factorization, especially when the cliques are small
- A clique tree depicts the global structure
- Each clique depicts local distributions
- Within a clique, elimination leaves a dense upper-triangular block once the separator columns are reduced
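These facts can be poked at with networkx (assumed available): a 4-cycle is not chordal, completing it to a chordal graph adds the fill-in chord that elimination would create, and the maximal cliques of the result are exactly the nodes of a clique tree.

```python
# A small sketch: chordality, triangulation, and maximal cliques.
import networkx as nx

G = nx.cycle_graph(4)                      # 0-1-2-3-0, no chord
print(nx.is_chordal(G))                    # False

H, _ = nx.complete_to_chordal_graph(G)     # triangulate by adding fill-in
print(nx.is_chordal(H))                    # True
print(sorted(map(sorted, nx.chordal_graph_cliques(H))))
# two triangles sharing one separator edge, e.g. [[0, 1, 3], [1, 2, 3]]
```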
Incremental update
- Identify the path from the cliques containing the affected variables up to the root
- Redo the variable elimination locally to update the clique structure
- Apply Givens rotations locally
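A minimal sketch of the matrix side of this update (dense, no reordering or new variables, so only the local rotations are shown): append a new measurement row to an existing factor R and restore triangularity with Givens rotations.

```python
# Incrementally absorb one new measurement row into an upper-triangular factor.
import numpy as np

def add_row(R, new_row):
    """Return the updated upper-triangular factor after appending one row."""
    A = np.vstack([R, new_row]).astype(float)
    m = A.shape[0]
    last = m - 1
    for j in range(min(A.shape[1], last)):
        a, b = A[j, j], A[last, j]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        rows = A[[j, last], :].copy()
        A[j, :] = c * rows[0] + s * rows[1]      # rotate rows j and `last`
        A[last, :] = -s * rows[0] + c * rows[1]  # entry (last, j) becomes 0
    return A[:-1, :]                              # the last row is now all zeros

R = np.triu(np.arange(1.0, 10.0).reshape(3, 3))  # an existing 3x3 factor
row = np.array([0.5, 0.0, 2.0])                  # a new measurement row
R_new = add_row(R, row)
print(np.allclose(np.tril(R_new, -1), 0))                           # still triangular
print(np.allclose(R_new.T @ R_new, R.T @ R + np.outer(row, row)))   # same information
```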
Summary
Up to now, we know the connection between
- Matrix factorization (QR decomposition, variable reordering, locally dense triangular blocks)
- and graph topology manipulation (undirected graph to DAG, factor graph to Bayes net, vertex degree, cliques)
Heuristically, we know local topology (vertex degree) matters. There could be more fundamental insights.
Depicting a graph as a matrix
- So far we have depicted graphs with drawings. They are intuitive, but not easy to analyze.
- Although they are related to Jacobian matrices (in factorization), they are not equivalent.
- We now use incidence matrices and the graph Laplacian to depict the graph topology directly as matrices.
Incidence matrix
- A V x E matrix; each column encodes a (weighted) edge connecting two vertices
- Sounds familiar? A Jacobian is E x V; each row encodes a measurement connecting two variables
There are even more similarities.
Laplacian matrix
- Laplacian is given by \(L = D - A\), where D and A are degree and adjacency matrices. It is a V x V matrix.
- The graph Laplacian depicts the graph's intrinsic connectivity from the vertices' perspective.
- Another derivation is \(L = MM^\top\), where \(M\) is the incidence matrix.
- Sounds familiar again? A Fisher Information Matrix is given by \(\Lambda = J^\top J\), where \(J\) is the Jacobian matrix.
- We will revisit some special setups where \(\Lambda\) and \(L\) can be directly connected with equations.
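Both constructions are easy to check numerically on a small graph; a minimal numpy sketch (a triangle graph, with an arbitrary orientation chosen for the incidence matrix):

```python
# Check that L = D - A equals M M^T for a 3-vertex triangle graph.
import numpy as np

edges = [(0, 1), (1, 2), (0, 2)]
n = 3

A = np.zeros((n, n))                 # adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))           # degree matrix

M = np.zeros((n, len(edges)))        # incidence matrix: one +1/-1 pair per edge
for k, (i, j) in enumerate(edges):
    M[i, k], M[j, k] = 1.0, -1.0

print(np.allclose(D - A, M @ M.T))   # True
```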
Connectivity from Laplacian
- Key message: we can obtain global topology (graph connectivity) via the Laplacian matrix.
- Measurement 1: Algebraic connectivity / Fiedler value
- \(\lambda_2(L)\), where \(0 = \lambda_1 \le \cdots \le \lambda_n\)
- This connectivity is 0 iff the graph is disconnected; it is maximal (equal to \(n\)) for the complete graph.
- For the example above the eigen values are:
- [3.33066907e-16, 7.21586391e-01, 1.68256939e+00, 3.00000000e+00, 3.70462437e+00, 4.89121985e+00]
- Eigenvector associated with \(\lambda_2\) (the Fiedler vector):
- [0.41486979, 0.30944167, 0.0692328 , -0.22093352, 0.22093352, -0.79354426]
- Measurement 2: Tree connectivity / Kirchhoff's matrix-tree theorem
- Count the number of spanning trees
- \[t(G) = \frac{1}{n} \lambda_2 \lambda_3 \cdots\lambda_n\]
- For the example above: 11
- Proof: http://math.fau.edu/locke/Graphmat.htm
- Related: Cayley’s formula \(t(K_n) = n^{n-2}\) for the complete graph.
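A quick numerical check of the matrix-tree theorem on a graph where the answer is known in closed form (the complete graph \(K_4\), where Cayley's formula gives \(4^{2} = 16\)):

```python
# Matrix-tree theorem: (1/n) * lambda_2 * ... * lambda_n spanning trees.
import numpy as np

n = 4
A = np.ones((n, n)) - np.eye(n)          # adjacency of K_4
L = np.diag(A.sum(axis=1)) - A           # Laplacian

eigvals = np.sort(np.linalg.eigvalsh(L))
t = np.prod(eigvals[1:]) / n             # skip lambda_1 = 0
print(round(t), n ** (n - 2))            # 16 16
```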
Reduced Laplacian
- Remove one selected vertex, i.e., delete the corresponding row and column of \(L\)
- In the Jacobian view, this corresponds to removing a vertex, i.e., anchoring a variable
- Algebraic connectivity: \(\lambda_1(L^r)\)
- SVD / Eigenvalue decomposition
- Tree connectivity: \(\det L^r\)
- Relatively easy to compute, especially for sparse matrices:
- Use a Cholesky decomposition \(L^r = \mathbf{C}\mathbf{C}^\top\); the determinant is the squared product of the diagonal entries of \(\mathbf{C}\)
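Continuing the \(K_4\) example, a small sketch reading both quantities off the reduced Laplacian, with the determinant computed from the diagonal of the Cholesky factor:

```python
# Reduced Laplacian: connectivity via eigenvalues, tree count via Cholesky.
import numpy as np

n = 4
A = np.ones((n, n)) - np.eye(n)
L = np.diag(A.sum(axis=1)) - A
Lr = L[1:, 1:]                           # reduced Laplacian (vertex 0 removed)

print(np.min(np.linalg.eigvalsh(Lr)))    # smallest eigenvalue of L^r
C = np.linalg.cholesky(Lr)
print(np.prod(np.diag(C)) ** 2)          # = det(L^r) = t(G) = 16
```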
Associate Laplacian with Information
- The simplest \(\mathbb{R}^d\)-sync problem
- We have variables \(x_j \in \mathbb{R}^d\) and measurements \(z_{ij} = x_i - x_j + \epsilon_{ij}\).
- The covariance per edge is simplified to \(\sigma_{ij}^2 \mathbf{I}_d\)
- The Jacobian is the (transposed) incidence matrix if d = 1, the variance per edge is 1, and we discard priors
- If we take in one prior, the Jacobian becomes the (transposed) reduced incidence matrix
\[z = \mathbf{M}^{r, \top} x + \epsilon\]
- If we extend to arbitrary dimension d, it becomes
\(z = (\mathbf{M}^r \otimes \mathbf{I}_d)^\top x + \epsilon\) (the Kronecker product expands each 1x1 entry to a dxd block)
- Then, taking the inverse variances (precisions) as a diagonal weight matrix \(\mathbf{W}\), the information matrix is given by
\[\Lambda = (\mathbf{M}^r \otimes \mathbf{I}_d) (\mathbf{W} \otimes \mathbf{I}_d) (\mathbf{M}^r \otimes \mathbf{I}_d)^\top = (\mathbf{M}^r \mathbf{W} \mathbf{M}^{r, \top}) \otimes \mathbf{I}_d = \mathbf{L}^r_w \otimes \mathbf{I}_d\]
Here \(\mathbf{L}^r_w\) is the weighted reduced Laplacian, where the weight per edge is the inverse measurement variance.
- This is a very close association between the Information and Laplacian matrices
\[\Lambda = \mathbf{L}^r_w \otimes \mathbf{I}_d\]
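The identity is easy to verify numerically. A minimal sketch (a hypothetical 3-node, 3-edge problem with d = 2 and vertex 0 anchored; the edge precisions are made up):

```python
# Check that J^T (W kron I_d) J equals (M^r W M^r^T) kron I_d.
import numpy as np

d = 2
edges = [(0, 1), (1, 2), (0, 2)]
weights = np.array([1.0, 2.0, 0.5])      # per-edge precisions (1 / sigma^2)
n = 3

M = np.zeros((n, len(edges)))            # incidence matrix
for k, (i, j) in enumerate(edges):
    M[i, k], M[j, k] = 1.0, -1.0

Mr = M[1:, :]                            # anchor vertex 0 (reduced incidence)
W = np.diag(weights)

J = np.kron(Mr, np.eye(d)).T             # Jacobian of z = (M^r kron I_d)^T x
Lambda = J.T @ np.kron(W, np.eye(d)) @ J # information matrix
Lw_r = Mr @ W @ Mr.T                     # weighted reduced Laplacian

print(np.allclose(Lambda, np.kron(Lw_r, np.eye(d))))   # True
```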
E-Optimality and Algebraic connectivity
- Covariance is the inverse of information: \(\mathbf{C}^{-1} \approx \Lambda = \mathbf{L}^r_w \otimes \mathbf{I}_d\)
- If we apply eigenvalue decomposition to a covariance matrix C, then the max eigenvalue gives the worst case error variance (E-optimal design) among all unit directions.
- \[\lambda_{max}(C) \approx 1 / \lambda_1(\Lambda) = 1 / \lambda_1(L_w^r)\]
- So the algebraic connectivity defines the worst-case variance bound
D-criterion and Tree connectivity
- If we take the determinant of a covariance matrix C, the value provides a scalar measure of the uncertainty (D-criterion).
- \[\log \det C \approx \log \det \Lambda^{-1} = - \log \det (\mathbf{L}^r_w \otimes \mathbf{I}_d) = -d \log t(G)\]
- So the tree connectivity gives the uncertainty estimate
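Continuing the same toy problem, a sketch checking that both criteria can be read off the weighted reduced Laplacian without ever forming the covariance:

```python
# E- and D-criteria from the weighted reduced Laplacian alone.
import numpy as np

d = 2
edges = [(0, 1), (1, 2), (0, 2)]
weights = np.array([1.0, 2.0, 0.5])
n = 3

M = np.zeros((n, len(edges)))
for k, (i, j) in enumerate(edges):
    M[i, k], M[j, k] = 1.0, -1.0
Mr = M[1:, :]
Lw_r = Mr @ np.diag(weights) @ Mr.T

C = np.linalg.inv(np.kron(Lw_r, np.eye(d)))             # covariance
print(np.isclose(np.max(np.linalg.eigvalsh(C)),
                 1.0 / np.min(np.linalg.eigvalsh(Lw_r))))     # E-criterion
print(np.isclose(np.linalg.slogdet(C)[1],
                 -d * np.log(np.linalg.det(Lw_r))))           # D-criterion
```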
There are extensions to less trivial 2D/3D SLAM, with corresponding corollaries. In a nutshell, graph topology and estimation uncertainty are connected via the Laplacian and the Fisher Information matrix.
Topology selection for a better D-criterion
With this connection established, we now drop the covariance and consider only the topology, i.e., the Laplacian.
We are interested in edge selection (e.g., loop closure selection in a pose graph)
- k-ESP: Given an initial graph and several candidate edges (say, potential loop closures), select at most k edges to maximize connectivity and reduce uncertainty
- \(\Delta\)-ESP: Select as few edges as possible to reach the desired information gain \(\Delta\)
We know this can be computed in a batch:
- Select several edges, run Cholesky of the updated Laplacian, and measure t(G) (tree connectivity)
- The exact solution by exhaustive evaluation is intractable: there are C(m, k) combinations, and the selection problem is NP-hard.
Can we do it incrementally by adding one edge \(m_e\) to the incidence matrix?
- Yes. By the matrix determinant lemma, \(\det(L + w_e\, m_e m_e^\top) = \det(L)\,(1 + w_e\, m_e^\top L^{-1} m_e)\), so the gain is governed by \(\Delta_e = m_e^\top L^{-1} m_e\), where \(m_e\) is the additional column of the incidence matrix
- So we can use a greedy algorithm: at each step, add the edge with the maximum gain, then update the Laplacian (a sketch follows at the end of this section)
- Other solutions include convex relaxation
\[L = L_{init} + \sum_j \pi_j L_{e_j} = \mathbf{M} \mathbf{W}^\pi \mathbf{M}^\top\]
- Non-convex version: \(\pi_j \in \{0, 1\}\)
- Convex relaxation: \(\pi_j \in [0, 1]\), which can then be solved, e.g., with a Lagrangian / dual formulation
- In short: we have a criterion to properly select edges (loops) from the (weighted) topology.
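Below is a greedy k-ESP sketch under the simplified \(\mathbb{R}^d\)-sync model from earlier (the candidate edges and precisions are made up, not from any dataset, and this is not any paper's reference implementation). The log-det gain of a candidate edge follows from the matrix determinant lemma above.

```python
# Greedy edge selection: pick the candidate with the largest log-det gain,
# using det(L + w m m^T) = det(L) * (1 + w * m^T L^{-1} m).
import numpy as np

def greedy_esp(Lr, candidates, k):
    """Pick at most k candidate edges maximizing the tree-connectivity gain.

    Lr         -- reduced weighted Laplacian of the initial graph
    candidates -- list of (m_e, w_e): reduced incidence column and precision
    k          -- number of edges to select
    """
    Lr = Lr.copy()
    remaining = list(range(len(candidates)))
    selected = []
    for _ in range(k):
        if not remaining:
            break
        Linv = np.linalg.inv(Lr)               # fine for a small sketch
        gains = [np.log1p(w * (m @ Linv @ m))
                 for m, w in (candidates[i] for i in remaining)]
        best = remaining[int(np.argmax(gains))]
        m_e, w_e = candidates[best]
        Lr += w_e * np.outer(m_e, m_e)         # rank-one Laplacian update
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: a 4-node path as the initial graph, two candidate loop closures.
n = 4
path_edges = [(0, 1), (1, 2), (2, 3)]
M = np.zeros((n, len(path_edges)))
for c, (i, j) in enumerate(path_edges):
    M[i, c], M[j, c] = 1.0, -1.0
Lr = (M @ M.T)[1:, 1:]                         # anchor vertex 0

def reduced_column(i, j):
    m = np.zeros(n)
    m[i], m[j] = 1.0, -1.0
    return m[1:]

candidates = [(reduced_column(0, 3), 1.0), (reduced_column(2, 3), 1.0)]
print(greedy_esp(Lr, candidates, k=1))         # picks the long loop 0-3, i.e. [0]
```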
Problems
- Can we generalize from the block diagonal covariance to general covariance? In other words, can we put more information to edges rather than a scalar weight?
- Can we put noisy-or factors on edges to support multi-hypothesis inference?
- Tree connectivity is a global measure. Can we restrict it to a subgraph and consider properties of clique trees?
- For a clique, we can use Cayley’s formula to obtain the tree connectivity directly, without any decomposition. Can this help in connectivity measurement?