Abstract We introduce the dependence distance, a new notion of the intrinsic distance between points, derived as a pointwise extension of statistical dependence measures between variables. We then introduce a dimension reduction procedure for preserving this distance, which we call the dependence map. We explore its theoretical justification, connection to […]
Abstract A considerable amount of work has been done in data clustering research during the last four decades, and a myriad of methods has been proposed focusing on different data types, proximity functions, cluster representation models, and cluster presentation. However, clustering remains a challenging problem due to its ill-posed nature: it is well know […]
Abstract Data mining and statistical learning techniques are powerful analysis tools yet to be incorporated in the domain of urban studies and transportation research. In this work, we analyze an activity-based travel survey conducted in the Chicago metropolitan area over a demographic representative sample of its population. Detailed data on activities by […]
Abstract Non-negative matrix factorization (NMF) is a method to obtain a representation of data using non-negativity constraints. A popular approach is alternating non-negative least squares (ANLS). As is well known, if the sequence generated by ANLS has at least one limit point, then the limit point is a stationary point of NMF. However, no evdience has sh […]
Abstract Influence maximization, defined by Kempe et al. (SIGKDD 2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks. […]