A Modern View of Centrality Measures
Prof. Dr. Paolo Boldi, Università degli Studi di Milano
Given a social network, which of its nodes are more central? This question was asked many times in sociology, psychology and computer science, and a whole plethora of _centrality measures_ were proposed to account for the importance of the nodes of a network.
Also, many modern IR problems call for a mixture of classical text-retrieval methods with graph mining techniques, especially within the area of social-network search, where node centrality can play a crucial role.
But what do existing centrality measures actually measure? To what extent do they agree? What is their meaning when they disagree? And further: which of them can be computed or approximated reasonably on the large (huge) graphs under observation?
My talk is twofold. On one hand, I will try to give a comprehensive and mathematically coeherent account of the most important centrality measures from the literature, and present for the first time a set of scientifically
well-grounded methodologies to establish whether a measure is actually doing what it has been designed for. I propose to assume an axiomatic approach, wherein I suggest some simple, basic properties a centrality measure should have, and I corroborate my findings with some anectodal results on publicly available medium-to-large web and social networks. I validate the results by examining each measure under the lens of information retrieval, leveraging state-of-the-art knowledge in the discipline to measure the effectiveness of the various indices in locating web pages that are relevant to a query.
On the other hand, I approach the problem of computing a large family of centralities (known as "geometric centralities") on very large graphs accessed in a semi-streaming fashion, with a very small amount of memory (in the order of a dozen bytes) per node available in core memory. I leverage the newly discovered algorithms based on HyperLogLog counters, making it possible to approximate a number of geometric centralities at a very high speed and with high accuracy. While the application of similar algorithms for the approximation of some measures (e.g., closeness) is not new, our exploitation of HyperLogLog counters reduces exponentially the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using "just" 2 TiB of RAM. Moreover, the computations I describe are inherently parallelizable, and scale linearly with the number of available cores.
8 Februar 2019, 14:00
Multimedia Room, 15th Floor, L3S