3:00pm - 5:00pm
Combinatorics and algorithms in decision and reason
Combinatorial, or discrete, structures are a fundamental tool for modeling decision-making processes in a wide variety of fields including machine learning, biology, economics, sociology, and causality. Within these various contexts, the goal of key problems can often be phrased in terms of learning or manipulating a combinatorial object, such as a network, permutation, or directed acyclic graph, that exhibits pre-specified optimal features. In recent decades, major break-throughs in each of these fields can be attributed to the development of effective algorithms for learning and analyzing combinatorial models. Many of these advancements are tied to new developments connecting combinatorics, algebra, geometry, and statistics, particularly through the introduction of geometric and algebraic techniques to the development of combinatorial algorithms. The goal of this session is to bring together researchers from each of these fields who are using combinatorial or discrete models in data science so as to encourage further breakthroughs in this important area of mathematical research.
(25 minutes for each presentation, including questions, followed by a 5-minute break; in case of x<4 talks, the first x slots are used unless indicated otherwise)
(Machine) Learning Non-Linear Algebra
Machine Learning has been used successfully to improve algorithms in Optimization and Computational Logic. By training a neural network one can predict the shape of the answer or select the best parameters to run an algorithm. In this presentation I discuss some experience with applying machine learning tools to improve algorithms manipulating multivariable polynomial systems.
Network Flows in Semi-Supervised Learning via Total Variation Minimization
We study the connection between methods for semi-supervised learning for partially labeled network data and network flow optimization. Many methods for semi-supervised learning are based on minimizing the total variation of graph signals induced by labels of data points. These methods can be interpreted as flow optimization on the network-structure underlying the data. Using basic results from convex duality we show that the dual problem of network Lasso is equivalent to maximizing the network flow over the data network. Moreover, the accuracy of network Lasso depends crucially on the existence of sufficiently large network flows between labeled data points. We also provide an interpretation of a primal-dual implementation of network Lasso as distributed flow maximization which bears some similarity with the push–relabel maximum flow algorithm.
Scalably vertex-programmable ideological forests from certain political twitterverses around US (2016), UK(2017) and Swedish (2018) national elections
Using customised experimental designs via Twitter Streaming and REST APIs, we collected status updates in Twitter around three national elections. Dynamic retweet networks were transformed into empirical geometrically weighted directed graphs, where every node is a user account and every edge accounts for the number of retweets of one use by another, with a natural probabilistic interpretation. Distributed vertex programs were then used to find the most retweeted paths from every user in the population to a given set of subsets of users (subpopulations of interest). Using a pairwise-distance induced by such paths we build a population retweet-based ideological forest. This statistic can be presented while preserving the privacy of the users and attempting to increase self-awareness about how one's own ideological profiles and social norms are formed and influenced in social media. Concrete hypothesis tests around the 2016 US presidential election, SPLC-defined US hate groups and interference by Russian political bots will be the driving empirical skeleton of the talk.
The Kingman Coalescent as a density on a space of trees
Randomly pick n individuals from a population and trace their genealogy backwards in time until you reach the most recent common ancestor. The Kingman n-Coalescent is a probabilistic model for the tree one obtains this way. The common definitions are stated from a stochastic point of view. The Kingman Coalescent can also be described by a probability density function on a space of certain trees. We describe this approach and extend it to the Multispecies Coalescent model by relating population genetics to polyhedral and algebraic geometry. This is joint work in progress with Christian Haase.