Though science’s knowledge base is expanding rapidly, the rate of breakthrough papers is declining and scientists take longer to make their first discoveries. Breakthroughs are related to how information is recombined, yet it remains unclear how scientists and inventors forage the knowledge base in search of tomorrow’s highest impact ideas. Studying 28 million scientific papers and 5 million U.S. patents, we uncover two major findings. First, we identify “Darwin’s Conjecture,” which reveals how conventional and novel ideas are balanced within breakthrough papers. Second, we find an “information hotspot.” The hotspot is the cluster of papers, of a certain age distribution in the knowledge base, that best predicts tomorrow’s hits. Together, works that combine knowledge according to Darwin’s Conjecture or forage in the hotspot double their odds of being in the top 5% or better of citations. These patterns hold in over 250 scientific and technology fields, are increasingly dominant, and outperform other predictors of impact, suggesting a universal link between the age of information and scientific discovery.
“Collective Attention on the Web”
It’s one of the most popular online videos ever produced, having been viewed more than 800 million times on YouTube. At first glance, it’s hard to understand why the clip is so famous, since nothing much happens. Two little boys, Charlie and Harry, are sitting in a chair when Charlie, the younger brother, mischievously bites Harry’s finger. There’s a shriek and then a laugh. The clip is called “Charlie Bit My Finger - Again!” Why has this footage gone viral? How viral is it actually? More generally, understanding the dynamics of collective attention is central to an information age where millions of people leave digital footprints every day. We have therefore developed novel computational methods to characterize, analyze and even predict the dynamics of collective attention among millions of users to and within social media services. For instance, using mathematical epidemiology, we find that so-called “viral” videos indeed show very high infection rates and, hence, should be called viral.
Based on joint work with Christian Bauckhage, Fabian Hadiji, and Rastegarpanah.
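To make the epidemic framing in the abstract above concrete, here is a minimal sketch of how an "infection rate" for attention could be modelled. The abstract only says "mathematical epidemiology"; the choice of the classic SIR compartment model, the forward-Euler integration, and every name and parameter value below are assumptions made for illustration, not the authors' method.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.01, steps=10000):
    """Integrate the SIR equations with forward Euler steps.

    s, i, r are population fractions: not yet reached by the video,
    currently watching/sharing it, and no longer interested.
    beta is the infection rate, gamma the recovery rate (both assumed).
    """
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # susceptible -> infected
        new_recoveries = gamma * i * dt      # infected -> recovered
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def peak_infected(history):
    """Largest infected fraction reached during the simulation."""
    return max(i for _, i, _ in history)
```

Under these assumptions, attention only takes off when `beta * s0` exceeds `gamma`; for example, `peak_infected(simulate_sir(0.5, 0.1, 0.99, 0.01))` is well above the initial 0.01, whereas with `beta=0.05` the infected fraction never rises above its starting value.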
“Multilayer Networks and Applications”
One of the most active areas of network science, with an explosion of publications during the last few years, is the study of “multilayer networks,” in which heterogeneous types of entities can be connected via multiple types of ties that change in time. Multilayer networks can include multiple subsystems and “layers” of connectivity, and it is important to take multilayer features into account to try to improve our understanding of complex systems. In this talk, I’ll give an introduction to multilayer networks and will discuss applications in areas such as transportation, finance, neuroscience, and ecology.
“Hierarchies of order and order of hierarchies”
The word hierarchy comes from two Greek words, hieros (holy) and arkhia (rule), and the concept appeared for the first time in the sixth century as the order of sacred things in Christian theology. Currently the word possesses many different meanings, inter alia: (i) order, i.e. a ranking of objects according to a certain parameter; (ii) relationship of control or domination; (iii) relationship of inclusion; (iv) coexistence of multiple organization levels. Complexity science is interested in various hierarchies because universal power laws (e.g. Zipf's law, Pareto distributions) are a sign of scale invariance and because of self-organization processes in multi-level physical, biological and social systems. Several hierarchical systems will be presented during the lecture. In the first case the evolution starts from a root node and the growth process is driven by rules of tournament selection. The system can be conceived as an evolving tree in which a new node is attached to the contestant node at the best hierarchy level (the level nearest to the tree root). The proposed evolution reflects the limited information on system properties available to new nodes. This information restrains the emergence of new hierarchy levels. In the second case the evolution starts from a bottom hierarchy level and higher levels then emerge. Two dynamical processes are accounted for: agents’ promotions to the next hierarchy level and demotions to the lowest one. Following the initial stage of evolution the system approaches a stationary state where hierarchies no longer emerge and the distribution of agents at different levels is exponential. The average hierarchy level, the number of links per node, and the fraction of agents at the lowest level are all independent of the system size. However, the maximal hierarchy level grows logarithmically with the number of nodes.
Computer simulations of opinion dynamics in hierarchical social groups and co-evolution of hierarchical adaptive random Boolean networks will be demonstrated.
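The first growth model described in this abstract (a tree grown from a root by tournament selection) can be sketched as follows. The abstract does not say how contestants are drawn or how ties are broken, so uniform sampling with replacement, first-drawn tie-breaking, and the names `grow_tournament_tree` and `tournament_size` are all assumptions of this sketch.

```python
import random


def grow_tournament_tree(n_nodes, tournament_size, seed=None):
    """Grow a tree from a root node; return (parent, level) lists.

    parent[v] is the node that v attached to (None for the root) and
    level[v] is its hierarchy level, i.e. its distance from the root.
    """
    rng = random.Random(seed)
    parent = [None]  # node 0 is the root
    level = [0]      # the root sits at hierarchy level 0
    for new_node in range(1, n_nodes):
        # Assumption: contestants are drawn uniformly, with replacement,
        # from the nodes that already exist.
        contestants = [rng.randrange(new_node) for _ in range(tournament_size)]
        # The winner is the contestant nearest to the root; ties go to
        # the contestant that was drawn first.
        winner = min(contestants, key=lambda v: level[v])
        parent.append(winner)
        level.append(level[winner] + 1)
    return parent, level
```

In this sketch `tournament_size` stands in for the amount of information available to a new node: with a single contestant the attachment is purely random, while larger tournaments make it more likely that a node close to the root wins.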
“Privacy in networks: Data mining as foe or friend?”
Online networks are great places for sharing data, discovering new knowledge in these data, and acting on this knowledge. Data mining plays a central role in these knowledge-based operations. But who profits, and who may be harmed? One widespread view is that data mining in networks can be instrumental for severe privacy violations. On the other hand, data mining is also expected to be able to empower users. In this talk, I report on our recent studies on (a) helping users manage their communication environment in online social networks and (b) analysing commercial tracking beyond advertising. I consider the applicable notions of networks, the concepts of privacy that are harmed or protected, and the role of data mining. I show how data mining can be a useful building block, but also needs to be extended by more systemic methods such as teaching approaches, in order to empower citizens.
“Networks Everywhere: On Construction of Semi-Structured Heterogeneous Networks from Massive Text Data”
Real-world big data are largely unstructured but interconnected, mainly in the form of natural language text. One of the grand challenges is to turn such massive data into actionable knowledge. To do so, we propose a D2N2K (i.e., data-to-network-to-knowledge) paradigm: first turn data into relatively structured heterogeneous information networks, and then mine such text-rich and structure-rich heterogeneous networks to generate useful knowledge. We show why such a paradigm represents a promising direction and present some recent progress on the development of effective methods for construction and mining of structured heterogeneous information networks from text data. We argue that network science is the key to turning massive unstructured data into structured knowledge.
“Community structure in complex networks: genesis, graph spectra and algorithm validation”
Real networks display a modular organization, where modules, or communities, appear as subgraphs whose nodes have an appreciably larger probability of being connected to each other than to other nodes of the network. In this talk I will show that communities emerge naturally in growing network models favoring triadic closure, a mechanism that must be implemented to generate large classes of systems, such as social networks. I will show that the number of communities can be inferred by perturbing the adjacency matrix and seeing how its eigenvectors rotate. Finally I will address the crucial issue of validation, probably the single most important issue of network community detection. While artificial benchmark graphs could bias methods towards the definition of community implemented by the benchmarks, real networks with metadata may or may not be useful for testing, contrary to general expectations.
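As a rough illustration of what a growing network model favoring triadic closure can look like, the sketch below adds nodes one at a time, each with two links. The abstract does not specify the growth rule, so the two-link rule, the closure probability `p_close`, and the function name are assumptions of this sketch rather than the speaker's model.

```python
import random


def grow_triadic_closure(n_nodes, p_close, seed=None):
    """Grow an undirected graph; return it as {node: set_of_neighbours}.

    Each new node links to a uniformly chosen existing node. With
    probability p_close its second link goes to a neighbour of that
    node (closing a triangle), otherwise to another random node.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed graph: a single edge
    for new in range(2, n_nodes):
        first = rng.randrange(new)
        if rng.random() < p_close:
            # Triadic closure: pick a neighbour of the first target.
            second = rng.choice(sorted(adj[first]))
        else:
            # Otherwise pick any other existing node at random.
            second = rng.choice([v for v in range(new) if v != first])
        adj[new] = {first, second}
        adj[first].add(new)
        adj[second].add(new)
    return adj
```

With `p_close` close to 1 every arriving node closes a triangle, the regime in which densely knit groups would be expected to form. Whether such groups count as communities still has to be checked with a detection method, which ties into the validation issue raised in the abstract.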