## NETWORK SCIENCE

During my literature review, I stumbled upon an information-theoretic framework to analyse the link prediction problem (Tan et al. (2014), Kumar and Sharma (2020)). For an overview of what link prediction is, read my previous article here. The basic idea is to predict unseen edges in a graph. Such edges could represent citing relationship, friendship, or dependencies in different networks. The framework models such task as an information theory problem, where an edge is more likely to form over a pair of nodes if it has a high probability (low uncertainty/low information) based on the chosen model.

In this article, we show how might one use the framework in practice. We first introduce the model and provide a code implementation to run this for a small graph. This article is heavily derived from Tan et. al. (2014), so check out their paper for more details.

If you’re interested in a more complete mathematical derivation, read my tutorial notes here. Moreover, I’ve also implemented the codes in a jupyter notebook here.

Given a graph G, we’d like to identify the likelihood of observing an edge between a pair of nodes (x, y). The framework defines such likelihood (s_xy) as the negative of conditional information of seeing edge (x, y) given their common neighbours O_xy. In information theory, the higher the probability, the lower the information (uncertainty is reduced) and hence by adding the negative term, the likelihood (s_xy) is positively proportional to probability.