Vincent Divol

Many thanks to Simon Rosenberg for the picture :-)

 

Briefly

I work as an Assistant Professor/Courant Instructor in the Mathematics Department of the Courant Institute and the Center for Data Science at New York University.

Before that, I did my Ph.D. at Université Paris-Saclay, under the supervision of Pascal Massart and Frédéric Chazal.

My research interests lie at the interface of geometry, statistics, and optimal transport. I have in particular contributed to the recent field of Topological Data Analysis, which focuses on developing techniques to extract topological and geometric information from complex datasets.

My CV can be found here.

Contact

  • E-mail: vincent [dot] divol [at] nyu [dot] edu

  • Office 607 (6th floor), NYU Center for Data Science, 60 5th Ave, New York, NY 10011

Material

Publications and Preprints

  • V. Divol, T. Lacombe Estimation and quantization of expected persistence diagrams.
    [hal, arXiv, tutorial], 2021, International Conference on Machine Learning.

    Abstract: Persistence diagrams (PDs) are the most common descriptors used to encode the topology of structured data appearing in challenging learning tasks; think e.g. of graphs, time series or point clouds sampled close to a manifold. Given random objects and the corresponding distribution of PDs, one may want to build a statistical summary, such as a mean, of these random PDs, which is however not a trivial task as the natural geometry of the space of PDs is not linear. In this article, we study two such summaries, the Expected Persistence Diagram (EPD), and its quantization. The EPD is a measure supported on R^2, which may be approximated by its empirical counterpart. We prove that this estimator is optimal from a minimax standpoint on a large class of models with a parametric rate of convergence. The empirical EPD is simple and efficient to compute, but possibly has a very large support, hindering its use in practice. To overcome this issue, we propose an algorithm to compute a quantization of the empirical EPD, a measure with small support which is shown to approximate with near-optimal rates a quantization of the theoretical EPD.

  • V. Divol A short proof on the rate of convergence of the empirical measure for the Wasserstein distance.
    [hal, arXiv], 2021, preprint.

    Abstract: We provide a short proof that the Wasserstein distance between the empirical measure of an n-sample and the estimated measure is of order n^(-1/d), if the measure has a density bounded from above and below on the d-dimensional flat torus.

  • V. Divol Density estimation on manifolds: an optimal transport approach.
    [hal, arXiv, Presentation at the Séminaire Parisien de Statistiques (video)], 2021, preprint.

    Abstract: Assume that we observe i.i.d. points lying close to some unknown d-dimensional C^k submanifold M in a possibly high-dimensional space. We study the problem of reconstructing the probability distribution generating the sample. After remarking that this problem is degenerate for a large class of standard losses (L^p, Hellinger, total variation, etc.), we focus on the Wasserstein loss, for which we build an estimator, based on kernel density estimation, whose rate of convergence depends on d and the regularity s <= k-1 of the underlying density, but not on the ambient dimension. In particular, we show that the estimator is minimax and matches previous rates in the literature in the case where the manifold M is a d-dimensional cube. The related problem of the estimation of the volume measure of M for the Wasserstein loss is also considered, for which a minimax estimator is exhibited.

  • V. Divol Minimax adaptive estimation in manifold inference.
    [hal, arXiv, tutorial], 2020, preprint.

    Abstract: We focus on the problem of manifold estimation: given a set of observations sampled close to some unknown submanifold M, one wants to recover information about the geometry of M. Minimax estimators which have been proposed so far all depend crucially on the a priori knowledge of some parameters quantifying the regularity of M (such as its reach), whereas those quantities will be unknown in practice. Our contribution to the matter is twofold: first, we introduce a one-parameter family of manifold estimators \hat{M}_t, and show that for some choice of t (depending on the regularity parameters), the corresponding estimator is minimax on the class of models of C^2 manifolds introduced in [Genovese et al., Manifold estimation and singular deconvolution under Hausdorff loss]. Second, we propose a completely data-driven selection procedure for the parameter t, leading to a minimax adaptive manifold estimator on this class of models. The same selection procedure is then used to design adaptive estimators for tangent spaces and homology groups of the manifold M.

  • V. Divol, T. Lacombe Understanding the topology and the geometry of the space of persistence diagrams via optimal partial transport.
    [journal version, hal, arXiv], 2020, Journal of Applied and Computational Topology.

    Abstract: Despite the obvious similarities between the metrics used in topological data analysis and those of optimal transport, an optimal-transport based formalism to study persistence diagrams and similar topological descriptors has yet to come. In this article, by considering the space of persistence diagrams as a measure space, and by observing that its metrics can be expressed as solutions of optimal partial transport problems, we introduce a generalization of persistence diagrams, namely Radon measures supported on the upper half plane. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g. persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the persistence diagrams space. We study the topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to transport metrics, the existence of Fréchet means for any distribution of diagrams, and an exhaustive description of continuous linear representations of persistence diagrams. We also showcase the usefulness of this framework to study random persistence diagrams by providing several statistical results made meaningful thanks to this new formalism.

  • V. Divol, W. Polonik On the choice of weight functions for linear representations of persistence diagrams.
    [journal version, hal, arXiv], 2019, Journal of Applied and Computational Topology.

    Abstract: Persistence diagrams are efficient descriptors of the topology of a point cloud. As they do not naturally belong to a Hilbert space, standard statistical methods cannot be directly applied to them. Instead, feature maps (or representations) are commonly used for the analysis. A large class of feature maps, which we call linear, depends on some weight functions, the choice of which is a critical issue. An important criterion to choose a weight function is to ensure stability of the feature maps with respect to Wasserstein distances on diagrams. We improve known results on the stability of such maps, and extend them to general weight functions. We also address the choice of the weight function by considering an asymptotic setting: assume that X_n is an i.i.d. sample from a density on [0,1]^d. For the Čech and Rips filtrations, we characterize the weight functions for which the corresponding feature maps converge as n approaches infinity, and by doing so, we prove laws of large numbers for the total persistences of such diagrams. Those two approaches (stability and convergence) lead to the same simple heuristic for tuning weight functions: if the data lies near a d-dimensional manifold, then a sensible choice of weight function is the persistence to the power a with a >= d.

  • F. Chazal, V. Divol The density of expected persistence diagrams and its kernel based estimation.
    [proceedings, hal, arXiv], 2018, Proceedings of the Symposium on Computational Geometry.
    [journal version], published in the Special Issue of Selected Papers from SoCG 2018, Journal of Computational Geometry.

    Abstract: Persistence diagrams play a fundamental role in Topological Data Analysis where they are used as topological descriptors of filtrations built on top of data. They consist of discrete multisets of points in the plane R^2 that can equivalently be seen as discrete measures in R^2. When the data come as a random point cloud, these discrete measures become random measures whose expectation is studied in this paper. First, we show that for a wide class of filtrations, including the Čech and Rips-Vietoris filtrations, the expected persistence diagram, which is a deterministic measure on R^2, has a density with respect to the Lebesgue measure. Second, building on the previous result, we show that the persistence surface recently introduced in [Adams et al., Persistence images: a stable vector representation of persistent homology] can be seen as a kernel estimator of this density. We propose a cross-validation scheme for selecting an optimal bandwidth, which is proven to be a consistent procedure to estimate the density.