Inferring Network Structures from Diffusion Traces
We are surrounded by all kinds of networks: a population can be seen as a network where people are connected through social relationships; the World Wide Web is a network where web pages are linked through hyperlinks. Many real-world processes can be modeled as contagions in networks. For instance, a virus can follow the pathways of a social network to propagate through a large population as individuals interact with and transmit the virus to people they know. Understanding the network structures can often provide useful information to predict and to control the spread. In the case of an infectious disease, we can form better strategies to contain the infections if we know how people are related in a population. However, in many cases of network diffusion processes, it is almost impossible to observe the exact who-infects-whom relationship. A person that is infected by virus is not likely to know the precise identity of the person that passed him or her the infection. This thesis project aims to develop algorithms to recover the links of an unknown network under two types of observation models. First, we consider the case when the timestamps and identities of all infected individuals are given. Second, we consider the case when we can only observe the initially infected individuals and the identities of individuals that eventually receive the infection, but not any other temporal information. In each case, we can observe multiple different diffusion processes unfolding over the same unknown network. For each case, we first formulate the probability model of transmissions. Then, we develop efficient algorithms to compute the probability of a network structure given traces, which is a key step for recovering the underlying network structure. We then propose learning algorithms to find the most likely network given the observed data. Our experiments on synthetically generated graphs and diffusion traces show that our learning algorithms can produce more precise inference results than some existing approaches.