Jekyll2021-11-03T18:31:15+00:00https://dongwei.info/feed.xmlWei’s HomepageWei's HomepageWei DongUpon Matrix and Graph Representations of Factor Graphs2021-10-17T00:00:00+00:002021-10-17T00:00:00+00:00https://dongwei.info/blog/graph-matrix<h2 id="motivation">Motivation</h2> <ul> <li>Better understand the connections between <strong>matrices</strong> and <strong>graphs</strong> in previous works, and recap the details. <ul> <li>Factor Graphs for Robot Perception</li> </ul> </li> <li>Have a glance at advances at the intersection of <strong>graph theory</strong> and <strong>SLAM</strong> <ul> <li>Reliable Graphs for SLAM, IJRR 2019</li> <li>Cramér–Rao Bounds and Optimal Design Metrics for Pose-Graph SLAM, TRO 2021</li> </ul> </li> </ul> <h2 id="factor-graph-and-its-information-matrix">Factor Graph and its Information Matrix</h2> <p><img src="/assets/images/matrix_graph/factor-graph.png" width="40%" /> <img src="/assets/images/matrix_graph/factor-graph-matrix.png" width="40%" /></p> <p>We start by revisiting a factor graph and its corresponding Jacobian:</p> <ul> <li>Each <strong>factor</strong> corresponds to several rows. It is a <strong>measurement</strong>.</li> <li>Each <strong>variable</strong> corresponds to several columns. It is a <strong>state</strong> to be estimated.</li> </ul> <p>So we have a factor graph and its corresponding linear system $$Ax = b$$.</p> <h3 id="variable-elimination-and-matrix-factorization">Variable elimination and matrix factorization</h3> <p>In large sparse SLAM setups, we usually factorize A with QR.
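</p>

<p>Concretely, solving $$Ax = b$$ in the least-squares sense via QR reduces to one triangular solve, since $$\|Ax - b\|^2 = \|Rx - Q^\top b\|^2$$ plus a constant. A minimal numpy sketch on a made-up 3-variable, 4-measurement toy Jacobian (illustrative numbers, not the example in the figures):</p>

```python
import numpy as np

# Toy Jacobian for a 3-variable, 4-measurement linear factor graph
# (illustrative numbers, not the example from the figures).
A = np.array([[1.0, 0.0, 0.0],    # prior on x0
              [-1.0, 1.0, 0.0],   # odometry-style factor x0 -> x1
              [0.0, -1.0, 1.0],   # odometry-style factor x1 -> x2
              [-1.0, 0.0, 1.0]])  # loop-closure-style factor x0 -> x2
b = np.array([0.0, 1.0, 1.1, 2.0])

# Thin QR: A = QR with Q orthonormal columns, R upper triangular.
Q, R = np.linalg.qr(A)

# The least-squares solution comes from a single triangular solve.
x = np.linalg.solve(R, Q.T @ b)

# Sanity check against the normal equations (A^T A) x = A^T b.
assert np.allclose(x, np.linalg.solve(A.T @ A, A.T @ b))
```

<p>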
Partial QR is applied iteratively to submatrices, interleaved with column permutations (variable reordering).</p> <p><img src="/assets/images/matrix_graph/factor-graph-matrix-partial.png" width="40%" /> <img src="/assets/images/matrix_graph/factor-graph-qr-partial.png" width="32%" /></p> <ul> <li>Factorization is variable elimination in a graph</li> <li>In spirit, it turns an undirected graph into a DAG</li> </ul> <p><img src="/assets/images/matrix_graph/elimination-0.png" width="60%" /> <img src="/assets/images/matrix_graph/elimination-1.png" width="60%" /> <img src="/assets/images/matrix_graph/elimination-2.png" width="60%" /></p> <ul> <li>Factorization is partial QR in a matrix</li> <li>Use Householder reflections or Givens rotations</li> </ul> <p><img src="/assets/images/matrix_graph/factorization-0.png" width="60%" /> <img src="/assets/images/matrix_graph/factorization-1.png" width="60%" /> <img src="/assets/images/matrix_graph/factorization-2.png" width="60%" /></p> <h3 id="variable-elimination-ordering">Variable elimination ordering</h3> <p>The ordering in elimination matters.
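</p>

<p>Why the order matters can be sketched symbolically: eliminating a vertex connects all of its remaining neighbors, and the edges added this way are exactly the fill-in. A small illustrative script (plain Python; the hub-and-leaves graph is a made-up extreme case, not the example from the figures):</p>

```python
import itertools

def fill_in(edges, order):
    """Symbolic elimination: removing a vertex connects all of its
    remaining neighbors; return the number of edges added (fill-in)."""
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    added = 0
    for v in order:
        nbrs = adj.pop(v)
        for n in nbrs:
            adj[n].discard(v)
        # connect all remaining neighbors pairwise (the elimination clique)
        for a, b in itertools.combinations(nbrs, 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                added += 1
    return added

# Extreme case: a hub vertex connected to 4 leaves.
edges = [("h", "l0"), ("h", "l1"), ("h", "l2"), ("h", "l3")]
print(fill_in(edges, ["h", "l0", "l1", "l2", "l3"]))  # hub first: 6 fill-in edges
print(fill_in(edges, ["l0", "l1", "l2", "l3", "h"]))  # leaves first: 0 fill-in edges
```

<p>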
A bad order results in fill-in and leads to a dense matrix to solve.</p> <ul> <li>Topologically densely connected</li> </ul> <p><img src="/assets/images/matrix_graph/dense.png" width="40%" /></p> <ul> <li> <p>We prefer to eliminate vertices with smaller <strong>degrees</strong> first</p> <p><img src="/assets/images/matrix_graph/factor-graph.png" width="40%" /></p> </li> <li>Degrees: <ul> <li>l1=1, l2=1; x1=2, x2=3, x3=2.</li> </ul> </li> <li> <p>Computationally dense</p> <p><img src="/assets/images/matrix_graph/elimination-vars.png" width="40%" /></p> </li> <li>We prefer fewer <strong>entries</strong> in the rows during QR <ul> <li>Direct order: 2 fill-in, then 1 fill-in</li> <li>Inverse order: 2 fill-in, then 3 fill-in</li> </ul> <p><img src="/assets/images/matrix_graph/elimination-matrices.png" width="40%" /></p> </li> </ul> <h3 id="heuristics-for-reordering">Heuristics for reordering</h3> <ul> <li> <p>Reorder according to degrees: COLAMD</p> <p>[TODO] <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.1109&amp;rep=rep1&amp;type=pdf">https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.1109&amp;rep=rep1&amp;type=pdf</a></p> </li> <li>Graph partitioning: METIS</li> <li> <p>A graph can be recursively partitioned by small vertex separators (nested dissection)</p> <p><img src="/assets/images/matrix_graph/dissection-vis.png" width="30%" /> <img src="/assets/images/matrix_graph/dissection-theorem.png" width="60%" /></p> </li> </ul> <h3 id="incremental-update-and-the-clique-tree">Incremental update and the clique tree</h3> <p><strong>Fact 1</strong></p> <ul> <li>Variable elimination results in not only a DAG but also a chordal graph (PGM Theorem 9.8) <ul> <li>A <strong>chordal graph</strong> is a triangulated graph</li> </ul> <p><img src="/assets/images/matrix_graph/15.png" width="40%" /></p> </li> </ul> <p><strong>Fact 2</strong></p> <ul> <li> <p>We can form a <strong>clique graph</strong> from a graph by connecting maximal cliques</p> <p><img src="/assets/images/matrix_graph/16.png" width="40%"
/></p> </li> </ul> <p><strong>Fact 3</strong></p> <ul> <li>The <strong>clique graph</strong> of a chordal graph is a <strong>clique tree</strong></li> <li> <p>Trees are good for elimination and factorization, esp. when the cliques are small</p> </li> <li> <p>A clique tree depicts the global structure</p> <p><img src="/assets/images/matrix_graph/17.png" width="40%" /></p> </li> <li> <p>Each clique depicts local distributions</p> <p><img src="/assets/images/matrix_graph/18.png" width="40%" /></p> </li> <li> <p>Each clique stores a dense upper-triangular matrix after the separator columns are reduced</p> <p><img src="/assets/images/matrix_graph/19.png" width="40%" /></p> </li> </ul> <p><strong>Incremental update</strong></p> <ul> <li>Identify the path from the root to the cliques that involve the updated variables</li> <li>Redo the variable elimination locally to update the <strong>clique</strong> structure <ul> <li>Givens rotations, applied locally</li> </ul> <p><img src="/assets/images/matrix_graph/20.png" width="40%" /> <img src="/assets/images/matrix_graph/21.png" width="40%" /></p> <p><img src="/assets/images/matrix_graph/Screen_Shot_2021-07-26_at_12.16.55_AM.png" width="40%" /> <img src="/assets/images/matrix_graph/22.png" width="40%" /></p> </li> </ul> <h3 id="summary">Summary</h3> <p>Up to now, we know the connection between</p> <ul> <li><strong>Matrix</strong> factorization (QR decomposition, reordering rows, locally dense triangles)</li> <li>and <strong>graph</strong> topology manipulation (undirected to DAG, factor graph to Bayes net, degree, clique)</li> </ul> <p>Heuristically, we know <strong>local topology</strong> (vertex degree) matters. There could be more fundamental insights.</p> <h2 id="depict-a-graph-in-matrix">Depict a graph in matrix</h2> <ul> <li>So far we have used <strong>drawings</strong> to depict graphs.
They are intuitive, but not easy to analyze.</li> <li>Although they are <strong>related to</strong> Jacobian matrices (in factorization), they are not equivalent.</li> <li>We now use <strong>incidence matrices</strong> and the <strong>graph Laplacian</strong> to depict graph topology directly in matrix form.</li> </ul> <h3 id="incidence-matrix">Incidence matrix</h3> <ul> <li> <p>A V x E matrix: each <strong>column</strong> encodes the (weighted) <strong>edge</strong> connecting 2 <strong>vertices</strong></p> <p><img src="/assets/images/matrix_graph/23.png" width="30%" /></p> <p><img src="/assets/images/matrix_graph/24.png" width="30%" /> <img src="/assets/images/matrix_graph/25.png" width="28%" /></p> </li> <li> <p>Sounds familiar? A Jacobian is E x V: each <strong>row</strong> encodes a <strong>measurement</strong> connecting 2 <strong>variables</strong></p> <p><img src="/assets/images/matrix_graph/factor-graph.png" width="40%" /></p> <p><img src="/assets/images/matrix_graph/factor-graph-matrix.png" width="40%" /></p> </li> </ul> <p>There are even more similarities.</p> <h3 id="laplacian-matrix">Laplacian matrix</h3> <ul> <li>The Laplacian is given by $$L = D - A$$, where D and A are the degree and adjacency matrices. It is a V x V matrix.</li> <li>The graph Laplacian depicts the graph’s intrinsic connectivity from the vertices’ perspective.</li> <li> <p><strong>Another derivation is</strong> $$L = MM^\top$$, where $$M$$ is the <strong>incidence matrix.</strong></p> <p><img src="/assets/images/matrix_graph/26.png" width="80%" /></p> </li> <li>Sounds familiar again?
A Fisher Information Matrix is given by $$\Lambda = J^\top J$$, where $$J$$ is the Jacobian matrix.</li> <li>We will revisit some special setups where $$\Lambda$$ and $$L$$ can be directly connected with equations.</li> </ul> <h3 id="conectivity-from-laplacian">Connectivity from Laplacian</h3> <ul> <li>Key message: we can obtain the <strong>global topology</strong> (graph connectivity) of the graph via the Laplacian matrix.</li> <li>Measure 1: <strong>Algebraic connectivity / Fiedler value</strong> <ul> <li>$$\lambda_2(L)$$, where $$0 = \lambda_1 \le \cdots \le \lambda_n$$</li> <li>This connectivity is 0 if and only if the graph is disconnected; it attains its maximum, $$\lambda_2 = n$$, when the graph is complete.</li> <li>For the example above the eigenvalues are: <ul> <li>[3.33066907e-16, <strong>7.21586391e-01</strong>, 1.68256939e+00, 3.00000000e+00, 3.70462437e+00, 4.89121985e+00]</li> </ul> </li> <li>Associated eigenvector: <ul> <li>[<strong>0.41486979, 0.30944167, 0.0692328</strong> , -0.22093352, 0.22093352, -0.79354426]</li> </ul> </li> </ul> </li> <li>Measure 2: <strong>Tree connectivity / Kirchhoff’s matrix-tree theorem</strong> <ul> <li>Count the number of spanning trees <ul> <li> $t(G) = \frac{1}{n} \lambda_2 \lambda_3 \cdots\lambda_n$ </li> <li>For the example above: 11</li> <li>Proof: <a href="http://math.fau.edu/locke/Graphmat.htm">http://math.fau.edu/locke/Graphmat.htm</a></li> </ul> </li> <li>Related: Cayley’s formula $$t(G) = n^{n-2}$$ for a complete graph.</li> </ul> </li> </ul> <h3 id="reduced-laplacian">Reduced Laplacian</h3> <ul> <li>Remove the row and column corresponding to one (arbitrarily selected) vertex</li> <li>In the Jacobian view, this corresponds to removing a vertex, i.e., anchoring a variable</li> <li><strong>Algebraic connectivity: $$\lambda_1(L^r)$$</strong> <ul> <li>SVD / eigenvalue decomposition</li> </ul> </li> <li><strong>Tree connectivity: $$\det L^r$$</strong> <ul> <li>Relatively easy to compute, esp.
for sparse matrices: <ul> <li>Use Cholesky decomposition, then compute the product of the diagonal elements</li> </ul> </li> </ul> </li> </ul> <h2 id="associate-laplacian-with-information">Associate Laplacian with Information</h2> <ul> <li>The simplest $$\mathbb{R}^d$$-sync problem <ul> <li>We have variables $$x_j \in \mathbb{R}^d$$ and measurements $$z_{ij} = x_i - x_j + \epsilon_{ij}$$.</li> <li>Covariance per edge is simplified to $$\sigma_{ij}^2 \mathbf{I}_d$$</li> <li>The Jacobian is the transposed incidence matrix, if d=1, the variance per edge is 1, and we discard priors</li> </ul> <p><img src="/assets/images/matrix_graph/factor-graph.png" width="40%" /> <img src="/assets/images/matrix_graph/27.png" width="40%" /></p> </li> <li> <p>If we take in one prior, then the Jacobian is the transposed reduced incidence matrix</p> $z = \mathbf{M}^{r, \top} x + \epsilon$ </li> <li> <p>If we extend to arbitrary dimension d, it becomes</p> <p>$$z = (\mathbf{M}^r \otimes \mathbf{I}_d)^\top x + \epsilon$$ (the Kronecker product expands each scalar entry to a $$d \times d$$ block)</p> </li> <li> <p>Then, taking the inverse variances as a diagonal weight matrix $$\mathbf{W}$$, the information matrix is given by</p> $\Lambda = (\mathbf{M}^r \otimes \mathbf{I}_d) (\mathbf{W} \otimes \mathbf{I}_d) (\mathbf{M}^r \otimes \mathbf{I}_d)^\top = (\mathbf{M}^r \mathbf{W} \mathbf{M}^{r,\top}) \otimes \mathbf{I}_d = \mathbf{L}^r_w \otimes \mathbf{I}_d$ <p>Here we are using the weighted Laplacian, where the weight per edge is the inverse measurement variance $$1/\sigma_{ij}^2$$.</p> </li> <li> <p><strong>This is a very close association between Information and Laplacian matrices.</strong></p> $\Lambda = \mathbf{L}^r_w \otimes \mathbf{I}_d$ </li> </ul> <h3 id="e-optimal-and-albegraic-connectivity">E-Optimal and Algebraic connectivity</h3> <ul> <li>Covariance is inverse information: $$\mathbf{C}^{-1} \approx \Lambda = \mathbf{L}^r_w \otimes \mathbf{I}_d$$</li> <li>If we apply <strong>eigenvalue decomposition</strong> to a <strong>covariance
matrix</strong> C, then the <strong>max</strong> eigenvalue gives the worst-case <strong>error variance (E-optimal design)</strong> among all unit directions.</li> <li> $\lambda_{max}(C) \approx 1 / \lambda_1(\Lambda) = 1 / \lambda_1(L_w^r)$ </li> <li> <p>So the <strong>algebraic connectivity</strong> defines the <strong>variance error bound</strong></p> <p><img src="/assets/images/matrix_graph/28.png" width="40%" /></p> </li> </ul> <h3 id="d-criterion-and-tree-connectivity">D-criterion and Tree connectivity</h3> <ul> <li>If we take the <strong>determinant</strong> of a covariance matrix C, the value provides a scalar measure of the <strong>uncertainty (D-criterion)</strong> encoded in the covariance matrix.</li> <li> $\log \det C \approx \log \det \Lambda^{-1} = - \log \det (\mathbf{L}^r_w \otimes \mathbf{I}_d) = -d \log t(G)$ </li> <li>So the <strong>tree connectivity</strong> gives the <strong>uncertainty estimate</strong></li> </ul> <p>There are extensions to less trivial 2D/3D SLAM setups, with corresponding corollaries.
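</p>

<p>Both connectivity measures are straightforward to verify numerically. A minimal numpy sketch (on an illustrative 4-vertex graph, not the example above) computes the Fiedler value and the spanning-tree count from $$L = MM^\top$$ and the reduced Laplacian:</p>

```python
import numpy as np

# Illustrative graph (not the one in the figures): a 4-cycle 0-1-2-3-0
# plus a chord 0-2, so 4 vertices and 5 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# Incidence matrix M (V x E): column e holds +1/-1 at the endpoints of edge e.
M = np.zeros((4, len(edges)))
for e, (i, j) in enumerate(edges):
    M[i, e], M[j, e] = 1.0, -1.0

L = M @ M.T                           # Laplacian: L = M M^T = D - A
lam = np.sort(np.linalg.eigvalsh(L))
fiedler = lam[1]                      # algebraic connectivity (0 iff disconnected)

# Matrix-tree theorem: delete the row/column of any one vertex;
# the determinant of the reduced Laplacian counts spanning trees.
Lr = L[1:, 1:]
trees = int(np.rint(np.linalg.det(Lr)))  # 8 spanning trees for this graph
print(fiedler, trees)
```

<p>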
In a nutshell, <strong>graph topology</strong> and <strong>estimation uncertainty</strong> are connected via the <strong>Laplacian</strong> and <strong>Fisher Information</strong>.</p> <p><img src="/assets/images/matrix_graph/29.png" width="40%" /> <img src="/assets/images/matrix_graph/30.png" width="40%" /></p> <h3 id="topology-selection-for-a-better-d-criterion">Topology selection for a better D-criterion</h3> <p>With this connection established, we now drop the covariance and consider <strong>only</strong> the topology, i.e., the Laplacian.</p> <p>We are interested in edge selection (e.g., loop closure selection in a pose graph)</p> <p><img src="/assets/images/matrix_graph/31.png" width="40%" /> <img src="/assets/images/matrix_graph/32.png" width="40%" /></p> <ul> <li>k-ESP: Given an initial graph and several candidate edges (say, potential loop closures), select at most k edges to maximize connectivity and reduce uncertainty</li> <li>$$\Delta$$-ESP: Select as few edges as possible to reach the desired information gain $$\Delta$$</li> </ul> <p>We know this can be computed in batch:</p> <ul> <li>Select several edges, run Cholesky on the updated Laplacian, and measure t(G) (tree connectivity)</li> <li>Computing the exact solution by batch evaluation is NP-hard.
There are C(m, k) combinations.</li> </ul> <p>Can we do it incrementally by <strong>adding one edge $m_e$ in the incidence matrix</strong>?</p> <ul> <li>Yes, the gain is given by $$\Delta_e = m_e^\top L^{-1} m_e$$, where $$m_e$$ is the additional column in the incidence matrix (by the matrix determinant lemma, $$\det(L + m_e m_e^\top) = \det(L)(1 + \Delta_e)$$)</li> <li> <p>So we can use a greedy algorithm: every time, add the edge with the max gain, then update the Laplacian</p> </li> <li> <p>Other solutions include convex relaxation</p> $L = L_{init} + \sum_j \pi_j L_{e_j} = M W^\pi M^\top$ <p><img src="/assets/images/matrix_graph/33.png" width="40%" /></p> </li> <li> <p>Non-convex version</p> <p><img src="/assets/images/matrix_graph/34.png" width="30%" /></p> </li> <li> <p>Convex relaxation (can be solved with Lagrange multipliers)</p> <p><img src="/assets/images/matrix_graph/35.png" width="30%" /></p> </li> <li>In short: we know a criterion to properly select edges (loops) by (weighted) topology.</li> </ul> <h3 id="problems">Problems</h3> <ul> <li>Can we generalize from the block-diagonal covariance to a general covariance? In other words, can we attach more information to edges than a scalar weight?</li> <li>Can we put noisy-or factors on edges to support multi-hypothesis inference?</li> <li>Tree connectivity is a global measure. Can we restrict it to a subgraph and consider properties of clique trees? <ul> <li>For a clique we can use Cayley’s formula to obtain tree connectivity directly, without decomposition. Can this help in connectivity measurement?</li> </ul> </li> </ul>Wei Dong
ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception2021-10-01T00:00:00+00:002021-10-01T00:00:00+00:00https://dongwei.info/publication/ash<h2 id="abstract">Abstract</h2> <p>We present ASH, a modern and high-performance framework for parallel spatial hashing on GPU.</p> <p>Compared to existing GPU hash map implementations, ASH achieves higher performance, supports richer functionality, and requires fewer lines of code (LoC) when used for implementing spatially varying operations from volumetric geometry reconstruction to differentiable appearance reconstruction.</p> <p>Unlike existing GPU hash maps, the ASH framework provides a versatile tensor interface, hiding low-level details from the users. In addition, by decoupling the internal hashing data structures and key-value data in buffers, we offer direct access to spatially varying data via indices, enabling seamless integration to modern libraries such as PyTorch.</p> <p>To achieve this, we</p> <ol> <li>detach stored key-value data from the low-level hash map implementation;</li> <li>bridge the pointer-first low level data structures to index-first high-level tensor interfaces via an index heap;</li> <li>adapt both generic and non-generic integer-only hash map implementations as backends to operate on multi-dimensional keys.</li> </ol> <p>We first profile our hash map against state-of-the-art hash maps on synthetic data to show the performance gain from this architecture.
We then show that ASH can consistently achieve higher performance on various large-scale 3D perception tasks with fewer LoC by showcasing several applications, including</p> <ol> <li>point cloud voxelization,</li> <li>dense volumetric SLAM,</li> <li>non-rigid point cloud registration and volumetric deformation, and</li> <li>spatially varying geometry and appearance refinement.</li> </ol> <p>ASH and its example applications are open sourced in <a href="https://open3d.org">Open3D</a>.</p> <h2 id="citation">Citation</h2> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{dong2021ash, title={ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception}, author={Dong, Wei and Lao, Yixing and Kaess, Michael and Koltun, Vladlen}, journal={arXiv preprint arXiv:2110.00511}, year={2021} } </code></pre></div></div><b>Wei Dong</b>, Yixing Lao, Michael Kaess, Vladlen KoltunFacilitate Mitsuba2 with Open3D2021-09-01T00:00:00+00:002021-09-01T00:00:00+00:00https://dongwei.info/blog/mitsuba<p><a href="https://github.com/mitsuba-renderer/mitsuba2">Mitsuba2</a> is a versatile yet neat physically-based rendering system. By providing camera matrices, users can easily render high-quality images.
The question is how to obtain proper viewpoints for a 3D model.</p> <p>One workflow is to use it jointly with <a href="https://github.com/intel-isl/Open3D">Open3D</a>, which provides fast non-physically-based rendering.</p> <h2 id="capture-view-point">Capture viewpoint</h2> <p>Using the snippet</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">open3d</span> <span class="k">as</span> <span class="n">o3d</span> <span class="n">mesh</span> <span class="o">=</span> <span class="n">o3d</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">read_triangle_mesh</span><span class="p">(</span><span class="s">'/path/to/mesh.ply'</span><span class="p">)</span> <span class="n">mesh</span><span class="p">.</span><span class="n">compute_triangle_normals</span><span class="p">()</span> <span class="n">o3d</span><span class="p">.</span><span class="n">visualization</span><span class="p">.</span><span class="n">draw_geometries</span><span class="p">([</span><span class="n">mesh</span><span class="p">])</span> </code></pre></div></div> <p>an interactive window will pop up. Move your mouse until you find a good viewpoint, then press ‘P’.
Open3D will take a decent screenshot <img src="/assets/images/apt-o3d.png" alt="apt-o3d" /> along with a json file in the shape of</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"class_name"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"PinholeCameraParameters"</span><span class="p">,</span><span class="w"> </span><span class="nl">"extrinsic"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="mf">0.8390546152869085</span><span class="p">,</span><span class="w"> </span><span class="mf">0.54133390210883348</span><span class="p">,</span><span class="w"> </span><span class="mf">0.054267476386522975</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">,</span><span class="w"> </span><span class="mf">-0.3175183711744502</span><span class="p">,</span><span class="w"> </span><span class="mf">0.40625158739394251</span><span class="p">,</span><span class="w"> </span><span class="mf">0.85682071152991279</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.44177985075426657</span><span class="p">,</span><span class="w"> </span><span class="mf">-0.73615029319258285</span><span class="p">,</span><span class="w"> </span><span class="mf">0.51275072822962653</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">,</span><span class="w"> </span><span class="mf">1.075349911825306</span><span class="p">,</span><span class="w"> </span><span class="mf">0.86201671957262604</span><span class="p">,</span><span class="w"> </span><span class="mf">9.441486541912365</span><span class="p">,</span><span class="w"> </span><span class="mf">1.0</span><span class="w"> 
</span><span class="p">],</span><span class="w"> </span><span class="nl">"intrinsic"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"height"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">1012</span><span class="p">,</span><span class="w"> </span><span class="nl">"intrinsic_matrix"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="mf">876.41770862985197</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">,</span><span class="w"> </span><span class="mf">876.41770862985197</span><span class="p">,</span><span class="w"> </span><span class="mf">0.0</span><span class="p">,</span><span class="w"> </span><span class="mf">959.5</span><span class="p">,</span><span class="w"> </span><span class="mf">505.5</span><span class="p">,</span><span class="w"> </span><span class="mf">1.0</span><span class="w"> </span><span class="p">],</span><span class="w"> </span><span class="nl">"width"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">1920</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nl">"version_major"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"version_minor"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <h2 id="render">Render</h2> <p>Now copy the extrinsic matrix (note in json it is stored in 
<strong>column major</strong>) to a minimal configuration file for <code class="language-plaintext highlighter-rouge">Mitsuba</code>, and you are almost ready to render.</p> <div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;scene</span> <span class="na">version=</span><span class="s">"2.0.0"</span><span class="nt">&gt;</span> <span class="nt">&lt;shape</span> <span class="na">type=</span><span class="s">"ply"</span><span class="nt">&gt;</span> <span class="nt">&lt;string</span> <span class="na">name=</span><span class="s">"filename"</span> <span class="na">value=</span><span class="s">"/path/to/mesh.ply"</span><span class="nt">/&gt;</span> <span class="nt">&lt;/shape&gt;</span> <span class="nt">&lt;integrator</span> <span class="na">type=</span><span class="s">"path"</span><span class="nt">&gt;</span> <span class="nt">&lt;integer</span> <span class="na">name=</span><span class="s">"max_depth"</span> <span class="na">value=</span><span class="s">"8"</span><span class="nt">/&gt;</span> <span class="nt">&lt;/integrator&gt;</span> <span class="nt">&lt;default</span> <span class="na">name=</span><span class="s">"spp"</span> <span class="na">value=</span><span class="s">"256"</span><span class="nt">/&gt;</span> <span class="nt">&lt;emitter</span> <span class="na">id=</span><span class="s">"light_0"</span> <span class="na">type=</span><span class="s">"constant"</span><span class="nt">&gt;</span> <span class="nt">&lt;spectrum</span> <span class="na">name=</span><span class="s">"radiance"</span> <span class="na">value=</span><span class="s">"2.0"</span><span class="nt">/&gt;</span> <span class="nt">&lt;/emitter&gt;</span> <span class="nt">&lt;sensor</span> <span class="na">type=</span><span class="s">"perspective"</span><span class="nt">&gt;</span> <span class="nt">&lt;transform</span> <span class="na">name=</span><span class="s">"to_world"</span><span class="nt">&gt;</span> <span class="nt">&lt;scale</span>
<span class="na">x=</span><span class="s">"-1"</span><span class="nt">/&gt;</span> <span class="nt">&lt;scale</span> <span class="na">y=</span><span class="s">"-1"</span><span class="nt">/&gt;</span> <span class="nt">&lt;matrix</span> <span class="na">value=</span><span class="s">" 0.83905462 0.5413339 0.05426748 -1.88128183 -0.31751837 0.40625159 0.85682071 -8.09841352 0.44177985 -0.73615029 0.51275073 -4.68162316 0. 0. 0. 1. "</span><span class="nt">/&gt;</span> <span class="nt">&lt;/transform&gt;</span> <span class="nt">&lt;float</span> <span class="na">name=</span><span class="s">"fov"</span> <span class="na">value=</span><span class="s">"60"</span><span class="nt">/&gt;</span> <span class="nt">&lt;sampler</span> <span class="na">type=</span><span class="s">"independent"</span><span class="nt">&gt;</span> <span class="nt">&lt;integer</span> <span class="na">name=</span><span class="s">"sample_count"</span> <span class="na">value=</span><span class="s">"\$spp"</span><span class="nt">/&gt;</span> <span class="nt">&lt;/sampler&gt;</span> <span class="nt">&lt;film</span> <span class="na">type=</span><span class="s">"hdrfilm"</span><span class="nt">&gt;</span> <span class="nt">&lt;integer</span> <span class="na">name=</span><span class="s">"width"</span> <span class="na">value=</span><span class="s">"512"</span><span class="nt">/&gt;</span> <span class="nt">&lt;integer</span> <span class="na">name=</span><span class="s">"height"</span> <span class="na">value=</span><span class="s">"512"</span><span class="nt">/&gt;</span> <span class="nt">&lt;rfilter</span> <span class="na">type=</span><span class="s">"box"</span><span class="nt">/&gt;</span> <span class="nt">&lt;/film&gt;</span> <span class="nt">&lt;/sensor&gt;</span> <span class="nt">&lt;/scene&gt;</span> </code></pre></div></div> <p>In practice, first use a small number (like 16) for <code class="language-plaintext highlighter-rouge">spp</code> (samples per pixel), and a small resolution for the image (like 128 x
128), and see if the output looks reasonable. Try to adjust the perspective in the config or the extrinsics back in <code class="language-plaintext highlighter-rouge">Open3D</code> interactively, until a decent thumbnail is available. Then increase <code class="language-plaintext highlighter-rouge">spp</code> and the resolution to render the final image:</p> <p><img src="/assets/images/ash-teaser.png" alt="ash" /></p> <h2 id="extension">Extension</h2> <p>As of now (October 2021) only perspective cameras are supported. If you want to play with fancy orthographic rendering, you may want to build <a href="https://github.com/mitsuba-renderer/mitsuba2/pull/279">contribution plugins</a>. This is helpful for rendering larger rooms, e.g.</p> <p><img src="/assets/images/lab.png" alt="lab" /></p> <p>but the tuning of parameters is less intuitive and relies more on manual adjustment.</p>Wei DongSelf-Supervised Geometric Perception2021-06-01T00:00:00+00:002021-06-01T00:00:00+00:00https://dongwei.info/publication/sgp<h2 id="abstract">Abstract</h2> <p>We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations).</p> <p>Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models given a large corpus of visual measurements (e.g., images, point clouds).
Under this optimization formulation, we show that two important streams of research in vision, namely robust model fitting and deep feature learning, correspond to optimizing one block of the unknown variables while fixing the other block.</p> <p>This analysis naturally leads to our second contribution – the SGP algorithm that performs alternating minimization to solve the joint optimization. SGP iteratively executes two meta-algorithms:</p> <ul> <li>a teacher that performs robust model fitting given learned features to generate geometric pseudo-labels,</li> <li>a student that performs deep feature learning under noisy supervision of the pseudo-labels.</li> </ul> <p>As a third contribution, we apply SGP to two perception problems on large-scale real datasets, namely relative camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We demonstrate that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.</p> <h2 id="citation">Citation</h2> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{yang2021sgp, author = {Yang, Heng and Dong, Wei and Carlone, Luca and Koltun, Vladlen}, title = {Self-Supervised Geometric Perception}, booktitle = {CVPR}, month = {June}, year = {2021}, pages = {14350-14361} } </code></pre></div></div>Heng Yang*, <b>Wei Dong*</b>, Luca Carlone, Vladlen KoltunAbstract We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations).Deep Global Registration2020-06-01T00:00:00+00:002020-06-01T00:00:00+00:00https://dongwei.info/publication/dgr<h2 id="abstract">Abstract</h2> <p>We present Deep Global Registration, a differentiable framework for pairwise registration of real-world 3D scans.</p> <p>Deep global registration is 
based on three modules:</p> <ul> <li>a 6-dimensional convolutional network for correspondence confidence prediction,</li> <li>a differentiable Weighted Procrustes algorithm for closed-form pose estimation,</li> <li>and a robust gradient-based SE(3) optimizer for pose refinement.</li> </ul> <p>Experiments demonstrate that our approach outperforms state-of-the-art methods, both learning-based and classical, on real-world data.</p> <h2 id="citation">Citation</h2> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{choy2020deep, title={Deep Global Registration}, author={Choy, Christopher and Dong, Wei and Koltun, Vladlen}, booktitle={CVPR}, year={2020} } </code></pre></div></div>Chris Choy*, <b>Wei Dong*</b>, Vladlen KoltunAbstract We present Deep Global Registration, a differentiable framework for pairwise registration of real-world 3D scans.GPU Accelerated Robust Scene Reconstruction2019-09-01T00:00:00+00:002019-09-01T00:00:00+00:00https://dongwei.info/publication/gpu<h2 id="abstract">Abstract</h2> <p>We propose a fast and accurate 3D reconstruction system that takes a sequence of RGB-D frames and produces a globally consistent camera trajectory and a dense 3D geometry.</p> <p>We redesign core modules of a state-of-the-art offline reconstruction pipeline to maximally exploit the power of GPU. We introduce GPU accelerated core modules that include</p> <ul> <li>RGBD odometry,</li> <li>geometric feature extraction and matching,</li> <li>point cloud registration,</li> <li>volumetric integration and mesh extraction.</li> </ul> <p>Therefore, while being able to reproduce the results of the high-fidelity offline reconstruction system, our system runs more than 10 times faster on average. 
Nearly 10 Hz can be achieved in medium-size indoor scenes, making our offline system comparable even to online Simultaneous Localization and Mapping (SLAM) systems in terms of speed.</p> <p>Experimental results show that our system produces more accurate results than several state-of-the-art online systems. The system is open source at https://github.com/theNded/Open3D.</p> <h2 id="citation">Citation</h2> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{dong2019gpu, title={GPU accelerated robust scene reconstruction}, author={Dong, Wei and Park, Jaesik and Yang, Yi and Kaess, Michael}, booktitle={IROS}, pages={7863--7870}, year={2019}, organization={IEEE} } </code></pre></div></div><b>Wei Dong</b>, Jaesik Park, Yi Yang, Michael KaessAbstract We propose a fast and accurate 3D reconstruction system that takes a sequence of RGB-D frames and produces a globally consistent camera trajectory and dense 3D geometry.Joint Multi-view Texture Super-resolution and Intrinsic Decomposition2019-03-01T00:00:00+00:002019-03-01T00:00:00+00:00https://dongwei.info/publication/sr<h2 id="abstract">Abstract</h2> <p>We aim to recover a high-resolution texture representation of objects observed from multiple viewpoints under varying lighting conditions.</p> <p>For many applications, the lighting conditions need to be changed, which requires decomposing the texture into shading and albedo components. Both texture super-resolution and intrinsic texture decomposition have been studied separately in the literature, yet no method has investigated how they can be combined. We propose a framework for joint texture map super-resolution and intrinsic decomposition.
To this end, we define the shading and albedo maps of the 3D object as the intrinsic properties of its texture and introduce an image formation model to describe the physics of image generation.</p> <p>Our approach accounts for surface geometry and camera calibration errors and is also applicable to spatio-temporal sequences. Our method achieves state-of-the-art results on a variety of datasets.</p> <h2 id="citation">Citation</h2> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{tsiminaki2019joint, title={Joint Multi-view Texture Super-resolution and Intrinsic Decomposition}, author={Tsiminaki, Vagia and Dong, Wei and Oswald, Martin R and Pollefeys, Marc}, booktitle={BMVC}, pages={15}, year={2019} } </code></pre></div></div>Vagia Tsiminaki, <b>Wei Dong</b>, Martin R. Oswald, and Marc PollefeysAbstract We aim to recover a high-resolution texture representation of objects observed from multiple viewpoints under varying lighting conditions.PSDF Fusion: Probabilistic Signed Distance Function for On-the-fly 3D Data Fusion and Scene Reconstruction2018-09-01T00:00:00+00:002018-09-01T00:00:00+00:00https://dongwei.info/publication/psdf<h2 id="abstract">Abstract</h2> <p>We propose a novel 3D spatial representation for data fusion and scene reconstruction. The Probabilistic Signed Distance Function (Probabilistic SDF, PSDF) is proposed to depict uncertainties in 3D space. It is modeled by a joint distribution describing the SDF value and its inlier probability, reflecting input data quality and surface geometry.</p> <p>A hybrid data structure involving voxels, surfels, and meshes is designed to fully exploit the advantages of various prevalent 3D representations. Connected by the PSDF, these components cooperate in a consistent framework. Given sequential depth measurements, the PSDF can be incrementally refined with a parametric Bayesian update that is less ad hoc.
Supported by the PSDF and the efficient 3D data representation, high-quality surfaces can be extracted on the fly and, in return, contribute to reliable data fusion using geometric information.</p> <p>Experiments demonstrate that our system reconstructs scenes with higher model quality and lower redundancy, and runs faster than existing online mesh generation systems.</p> <h2 id="citation">Citation</h2> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{dong2018psdf, title={PSDF fusion: Probabilistic signed distance function for on-the-fly 3D data fusion and scene reconstruction}, author={Dong, Wei and Wang, Qiuyuan and Wang, Xin and Zha, Hongbin}, booktitle={ECCV}, pages={701--717}, year={2018} } </code></pre></div></div><b>Wei Dong</b>, Qiuyuan Wang, Xin Wang, Hongbin ZhaAbstract We propose a novel 3D spatial representation for data fusion and scene reconstruction. The Probabilistic Signed Distance Function (Probabilistic SDF, PSDF) is proposed to depict uncertainties in 3D space. It is modeled by a joint distribution describing the SDF value and its inlier probability, reflecting input data quality and surface geometry.An Efficient Volumetric Mesh Representation for Real-time Scene Reconstruction using Spatial Hashing2018-03-01T00:00:00+00:002018-03-01T00:00:00+00:00https://dongwei.info/publication/mesh<h2 id="abstract">Abstract</h2> <p>Meshes play an indispensable role in dense real-time reconstruction, which is essential in robotics. Efforts have been made to maintain flexible data structures for 3D data fusion, yet an efficient incremental framework specifically designed for online mesh storage and manipulation is missing.</p> <p>We propose a novel framework to compactly generate, update, and refine meshes for scene reconstruction upon a volumetric representation.
Maintaining a spatial-hashed field of cubes, we distribute vertices with continuous values on discrete edges, which supports O(1) vertex access and avoids memory redundancy. By introducing the Hamming distance in mesh refinement, we further improve mesh quality in terms of triangle-type consistency at low cost. Lock-based and lock-free operations are applied to avoid thread conflicts in GPU parallel computation.</p> <p>Experiments demonstrate that mesh memory consumption is significantly reduced while the running speed is maintained during online reconstruction.</p> <h2 id="citation">Citation</h2> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{dong2018efficient, title={An efficient volumetric mesh representation for real-time scene reconstruction using spatial hashing}, author={Dong, Wei and Shi, Jieqi and Tang, Weijie and Wang, Xin and Zha, Hongbin}, booktitle={ICRA}, pages={6323--6330}, year={2018}, organization={IEEE} } </code></pre></div></div><b>Wei Dong</b>, Jieqi Shi, Weijie Tang, Xin Wang, Hongbin ZhaAbstract Meshes play an indispensable role in dense real-time reconstruction, which is essential in robotics. Efforts have been made to maintain flexible data structures for 3D data fusion, yet an efficient incremental framework specifically designed for online mesh storage and manipulation is missing.Edge Enhanced Direct Visual Odometry2016-03-01T00:00:00+00:002016-03-01T00:00:00+00:00https://dongwei.info/publication/eedvo<h2 id="abstract">Abstract</h2> <p>We propose an RGB-D visual odometry method that both minimizes the photometric error and aligns the edges between frames.
The combination of direct photometric information and edge features leads to higher tracking accuracy and allows the approach to deal with challenging texture-less scenes.</p> <p>In contrast to traditional line-feature-based methods, we use all edges rather than only line segments, avoiding the aperture problem and the uncertainty of endpoints. Instead of explicitly matching edge features, we design a dense representation of edges to align them, bridging direct and feature-based tracking methods. Image alignment and feature matching are performed in a general framework, where not only pixels but also salient visual landmarks are aligned.</p> <p>Evaluations on real-world benchmark datasets show that our method achieves competitive results in indoor scenes, especially in texture-less scenes, where it outperforms state-of-the-art algorithms.</p> <h2 id="citation">Citation</h2> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{wang2016edge, title={Edge Enhanced Direct Visual Odometry}, author={Wang, Xin and Dong, Wei and Zhou, Mingcai and Li, Renju and Zha, Hongbin}, booktitle={BMVC}, year={2016} } </code></pre></div></div>Xin Wang, <b>Wei Dong</b>, Mingcai Zhou, Renju Li, Hongbin ZhaAbstract We propose an RGB-D visual odometry method that both minimizes the photometric error and aligns the edges between frames. The combination of direct photometric information and edge features leads to higher tracking accuracy and allows the approach to deal with challenging texture-less scenes.