Episode 2: Is it really a Small World? The truth about the network
Updated: Jan 19, 2021
What's the rumpus :)
I'm Asaf Shapira and this is NETfrix.
I have been practicing network analysis for years, and I founded the first podcast in Israel that tackles the fascinating world of network science. Networks are the building blocks of our universe and we encounter them everywhere.
That's why understanding networks is essential to understanding our world and our data. If networks are everywhere, then missing out on the chance to learn about them seems like borderline carelessness. So good for you! Let's begin.
In this episode we will learn about network laws through the fascinating history of network research. We will see where researchers got it right, where they got it wrong, and what the implications of those early discoveries were, discoveries which were often ahead of their time.
Stanley Milgram is known for his groundbreaking experiments in social psychology in the 1960s. His best-known study, and probably one of the most famous experiments in the world, was the psychological experiment in which Milgram showed that, given the right conditions, normative people could be made to blindly obey authority. This is how Milgram tried to answer the question: what turned the normative German nation into the greatest collaborator with the Nazis?
Milgram conducted many experiments, each of which deserves an episode, but the one we will focus on is a famous and groundbreaking experiment that left its mark on network research. I guess many of you have heard of it. But this time, we will take a look behind the scenes of the experiment and reveal some surprising and lesser-known discoveries in it.
Without too many spoilers, we will find that Milgram did not tell us the whole story and even tried to suppress some of his findings. In this episode we'll show that ironically those hidden findings will amount to maybe the biggest discovery in network science. A much more significant discovery than what Milgram was aiming for.
In the military, whenever we met someone from a different unit whom we did not know, we would usually start the conversation by looking for a mutual acquaintance. We called this habit "the cracking game" after a popular TV show in Israel in the 80s. In fact, back then there was only one TV channel, so every show was popular ...
The same curiosity about unknown ties probably drove Stanley Milgram and his research partner, Jeffrey Travers, to set out to examine the structure of the human network in an experiment they named the "small world problem."
The two researchers initiated chain letters between random people and counted how many links each chain consisted of. The aim was to examine the length of the chains and thus answer the question: Is our world a small world, and if so, what is its diameter? The conclusions of the experiment are by now part of pop culture. I guess most of us are familiar with the concept of "6 steps" or 6 degrees of separation, which became famous thanks to this experiment.
A popular application of this concept is the game "6 degrees of Kevin Bacon", where participants name a movie actor and try to find a connection between that actor and the famous actor Kevin Bacon that doesn't exceed 6 links. A connection, in this context, means appearing in the same movie. In some places actors were rated according to their "Kevin Bacon" number: actors who appeared with Bacon in the same movie got a score of 1, actors who appeared with those actors got a score of 2, and so on.
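For those who like to follow along in code, the scoring rule of the Kevin Bacon game can be sketched as a breadth-first search over a "shared a movie" network. This is only a minimal illustration: the actor names and the `costars` table are made up, and `bacon_numbers` is a hypothetical helper, not something taken from the game or the experiment.

```python
from collections import deque

def bacon_numbers(costars, source="Kevin Bacon"):
    """Breadth-first search: count the links from `source` to every reachable actor."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in distance:          # first visit is the shortest path
                distance[other] = distance[actor] + 1
                queue.append(other)
    return distance

# Hypothetical toy data: each key shared a movie with every actor in its list.
costars = {
    "Kevin Bacon": ["Actor A", "Actor B"],
    "Actor A": ["Kevin Bacon", "Actor C"],
    "Actor B": ["Kevin Bacon"],
    "Actor C": ["Actor A", "Actor D"],
    "Actor D": ["Actor C"],
    "Actor E": [],  # never shared a movie with anyone in this toy set
}

scores = bacon_numbers(costars)
for actor, score in scores.items():
    print(actor, score)
```

Actors who cannot be reached at all, like the made-up "Actor E", simply get no score, which is worth keeping in mind when we get to the question of whether everyone is really connected.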
The prevailing assumption is that each of us is 6 steps away from any other person in the human network and that our world is indeed a small world.
But is it really?
Let's dive into the details of the experiment.
Milgram, from the City University of New York, and Travers, from Harvard University, randomly selected 2 groups of people from whom the chains would begin: one group from Boston and another group from Nebraska. Boston was chosen because the target of the chain letters was a randomly selected Boston financier. One of the objectives was to compare the length of the chains between local participants and distant participants. The latter were represented by the Nebraska group. Nebraska was chosen because it was a "distant hole" according to Milgram.
The chain letter sent to the 2 groups included a guideline: the letter could only be sent to a person whom the sender knew personally (on a first-name basis or having met each other) and who, in the sender's opinion, had a chance of advancing the letter closer to the chosen destination.
Each time the letter was passed to the next link in the chain, a separate letter was sent to Harvard University for follow-up.
The published results were mind-blowing: the chain letters that reached their destination did so with an average chain length of 5.2 steps. That is even shorter than the well-known "6 degrees" of pop culture.
While this was not a world-wide network but only an intra-American network, "six" still sounds impressive even today. It was certainly a much smaller number than even Milgram imagined.
But what else was discovered in the experiment and is not common knowledge?
Since the destination was in Boston, it's probably less surprising that the chains coming out of Boston were significantly shorter than the chains from Nebraska. In this case, geographical proximity also created network proximity.
A little more surprising were the results of the different strategies participants used to get to the target. The target was a financier, so some of the senders tried to build a chain based on professional acquaintance, meaning they advanced the letter through financiers they knew. This may be a little surprising, but these chains were not significantly shorter than average.
But another interesting finding was that a small number of participants played a key role in the network. Of the hundreds of participants in the experiment, only three people took part in about 50% of all chains. Most of the participants took part in only 2 chains or fewer.
Why are these revelations so significant? To answer this, we need a brief history of network science. In the 20th century, the field of network research was mostly anecdotal in nature much like Milgram's experiment, meaning a collection of interesting discoveries that did not crystallize into a broad and meaningful theory.
One reason for this was the lack of data. In the pre-Internet age, researchers collected network data mostly by hand, so their ability to do so was very limited. Drawing global conclusions, or even doing comparative analysis, was very difficult. The collection process did not allow for big-data-based insights. Because of those limitations, I guess it was hard to even imagine reasons to conduct such research. In this context, Milgram's achievement in creating a network of hundreds of participants across the USA was an anomaly, and part of his genius.
An example of the more common type of network analysis conducted in the last century can be found in the studies of Jacob Moreno in the 1930s.
Like Milgram, Moreno too came from the field of social psychology. His project dealt with network mapping of classmates in Brooklyn from kindergarten to eighth grade. The relationships or links in this network represented which pupil wants to sit next to which pupil. The networks he drew were called sociograms and he was among the first to make use of them. The advantage of the sociogram was in its clear visualization of the friendship ties.
The visualizations revealed two interesting phenomena:
The first one is the stars that were discovered in the network. Each class had between 2 and 4 popular pupils, or stars, that many wanted to be in contact with, compared to most of the other pupils, who had only a single connection or none.
The second phenomenon the visualization revealed is the crystallization of communities, meaning dense clusters in the network. Using Moreno's sociograms, we can see that the clustering in the network peaked in the third and fourth grades. The sociograms of these classes showed that the nodes in the network were divided into 2 distinct communities with a single relationship between them. Moreno noticed that the characteristic that defined each community was gender.
Although Moreno's studies dealt with smaller networks than Milgram's, the same recurring phenomena can be seen in both studies, and in many other studies that followed.
The first one is that there are areas in the network where the players, or nodes, are more interconnected. In Moreno's case the reason was gender, and in Milgram's case it was geographical proximity.
In hindsight, this was a significant discovery in the field of network science: The edges in the network tend to create dense clusters or communities and are not dispersed randomly. This has many implications which we will go through in the episode dedicated to communities.
The second common finding of the two studies is that the roles of network players are not random either. There are a few central nodes with lots of connections, but most nodes have only a few connections.
As we now know, network science has a lot to contribute to many fields of research, but unfortunately interdisciplinarity was not common practice for most of the 20th century. There were only a few interfaces between the social sciences of sociology and psychology and the exact sciences of mathematics and physics. That, together with the limited access to data, can explain why so little was known about how networks form and operate.
Against this backdrop of disconnected disciplines, Erdos and Renyi, two mathematicians working in the 1950s, tried to build a model that would describe how a network forms. We must stop and say a few words about Erdos, because apart from being an avid network researcher he was a network phenomenon in his own right. Erdos, of Hungarian descent, was very fond of research collaborations, and many scholars found themselves academically connected to Erdos, which in this context means writing a joint paper.
Like Kevin Bacon, Erdos too got a similar network game named after him, this time, in the academic field. In this version, researchers got a score according to their Erdos number. That is, the number of steps or links between them and a collaborative paper with Erdos. A low Erdos number is considered a badge of honor to any researcher, particularly mathematicians.
Now back to the model: The basic premise of Erdos and Renyi's model was that network edges or links are formed at random.
This means that each node in the network has an equal chance of being connected to any other node. This assumption led to another one: that the number of links per node will be normally distributed. A normal distribution is also called a bell curve because of its shape: thin edges and a peak in the center. An example of such a distribution is the heights of people in the same age group.
For example, according to Wikipedia, the average height of 20-year-olds in the US is about 5 foot 9. It is assumed that in this age group some people are slightly taller or slightly shorter, but the majority is expected to be within this height range. Without this assumption, it would not have been possible, for example, to build chairs that suit most people.
The implicit assumption in Erdos and Renyi's model is that the network is distributed in the same way: there will be a few people with many connections and a few people with few connections (these two groups sit at the edges of the bell curve), but most people will fall within the average range of connections. Truth be told, that was also Milgram's assumption when he initiated his experiment. He assumed that the length of the chains would be normally distributed: there would be some short chains and some long ones, but most of them would be average.
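To get a feel for what the random model predicts, here is a minimal sketch that links every pair of nodes with the same probability and then counts how many links each node ended up with. The values of `n` and `p`, the fixed seed and the helper name `random_graph_degrees` are all illustrative assumptions, not figures from Erdos and Renyi's work.

```python
import random
from collections import Counter

def random_graph_degrees(n, p, seed=0):
    """Link every pair of n nodes independently with probability p; return each node's link count."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:   # every pair gets the same chance
                degree[i] += 1
                degree[j] += 1
    return degree

n, p = 500, 0.02                   # illustrative values only
degrees = random_graph_degrees(n, p)
mean_degree = sum(degrees) / n

# Text histogram: one row per link count, one '#' per node with that count.
histogram = Counter(degrees)
for k in sorted(histogram):
    print(f"{k:3d} {'#' * histogram[k]}")
print("average links per node:", round(mean_degree, 2))
```

Under these assumptions the rows should bunch up around the average of roughly `p * (n - 1)` links and thin out on both sides, with no node standing far above the rest. That is the bell shape the model takes for granted.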
This mathematical assumption lasted throughout the 20th century, until the beginning of the 21st century: the age of giant networks.
The Internet revolution and the giant networks (the World Wide Web or Facebook, for example) led to a significant development in the study of networks and to the understanding that networks are not formed at random, as people used to think.
Maybe by chance, it was 40 years after Erdos that another Hungarian, Albert-Laszlo Barabasi, proposed a new model. His model was based on his groundbreaking research, which is still used today as a basis for understanding networks. So what was Barabasi's research about?
Barabasi tried to understand the structure of the Internet by studying the links between pages on the Internet.
The idea behind web surfing is the ability to move from page to page through the links that a website offers to its online users. These links were the connections that Barabasi traced.
To do this, Barabasi used a crawler that mapped the network by crawling from page to page through these links. By the way, Barabasi was not the only one to do so. At the same time, a company called Google did a similar thing with a small difference: Google also indexed the content of the pages. As Barabasi himself would later testify, this small difference meant that the return he got for his efforts was also slightly smaller than Google's.
Still, his contribution to network science has been tremendous and, like many things in the network field, it was not intuitive.
When Barabasi delved into the data, he discovered that the Internet was not as messy or random as we might have imagined. According to Erdos and Renyi, we should have expected a few pages with many links pointing to them, a few negligible pages, and a majority of moderately linked pages. This is the expected result of randomness.
Barabasi revealed that the links are distributed differently, in a distribution called a power law.
The idea behind the power law distribution is that there are the few that hold the many and the many that hold a few. In popular culture, a similar idea is known as the 80/20 rule or Pareto's law, which claims, for example, that in a company 80% of the work is done by 20% of the employees. The problem with Pareto's law is that it often relies on intuition, and that's why 80% believe that this law is not credible and 20% do not think of it at all because they are busy doing the work of the other 80%. Pareto deals with 80/20, but the power law is much more extreme and divides the world into approximately 1/99.
It may sound extreme, but it's a fair representation of reality as we shall see.
One of the best-known examples of the power law in action comes from the field of economics:
Every week or so, in the economy section of our news sites, we'll stumble on an article about the huge social gap. Each time the article will present data from a different state or country, indicating that a small percentage of society holds most of the wealth. The most prominent example these days is the FAANG companies (Facebook, Apple, Amazon, etc.), which have a huge market share even though they are far less than one percent of the companies in the world.
For a more picturesque example that will help us imagine a power law graph, let's take a look at the animal kingdom: we'll line up all the animals on earth along the X-axis, sorted by size, and let the Y-axis show the size of each animal. Each animal will be a column in the graph.
By the way, did you know that the largest animal is the blue whale and there are about 10,000 blue whales in the world?
Great, then we will place the ten thousand whales at the beginning of our graph as our first ten thousand columns and after that we will place all the African elephants and continue to all the other animals through cats, fish, cockroaches, flies and even bacteria.
Assuming all the animals will stand still, we will find that we have a graph that has a very small percentage of very high columns (these are the whales and elephants) but most of the columns in it will be tiny (say mosquitoes and bacteria) and these columns are called the long tail of the power law.
Why is it called a long tail? To continue the animal metaphor, we will notice that the power law graph bears some similarity to a dinosaur, and not just any dinosaur: a Brachiosaurus. From a side view. When standing upright.
The large columns will form the head of the dinosaur and all the other tiny species will form its long tail. A very long tail.
To illustrate the difference between a power law and a normal distribution, imagine that the animals were distributed in the shape of a bell. This would mean that the number of bacteria and mosquitoes in the world should be about the same as the number of whales and elephants, since the two edges of the bell are supposed to be similar in size.
So, what will the power law look like in a network? Let's put all the nodes in the network on the X-axis and sort them in a similar fashion as we did with the animals. We'll put the most connected nodes on the left and continue in descending order to the right, with the Y-axis showing the number of links each node has. What we'll get is a long tail.
There are a few central nodes in the network (called "hubs") that have a lot of connections, while most nodes in the network have only a few connections, and these nodes are called the "long tail" of the distribution.
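The sorting described above can be sketched in a few lines. Note that the network here is a made-up hub-and-spoke toy meant only to show the shape of the picture; it is not real web data and not a fitted power law, and `degree_ranking` is a hypothetical helper name.

```python
from collections import Counter

def degree_ranking(edges):
    """Count links per node and return (node, link count) pairs, most connected first."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common()

# Hypothetical toy network: 2 hubs linked to many pages, plus a few stray links.
edges = [("hub1", f"page{i}") for i in range(60)]
edges += [("hub2", f"page{i}") for i in range(60, 100)]
edges += [("hub1", "hub2"), ("page0", "page1"), ("page60", "page61")]

ranking = degree_ranking(edges)
total = sum(links for _, links in ranking)
top_share = sum(links for _, links in ranking[:2]) / total

# The left end of the graph: a few tall columns, then the tail begins.
for node, links in ranking[:6]:
    print(f"{node:8s} {'#' * links}")
print(f"{len(ranking)} nodes, the top 2 hold {top_share:.0%} of all link ends")
```

In this toy, 2 nodes out of 102 hold about half of all the link ends, while 96 nodes have a single link each. Real networks are messier, but the silhouette is the same: a tall head and a very long tail.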
On the Internet, for example, these hubs would be pages like Google. In contrast to these huge websites, we'll find that most pages on the Internet have only one or two links or no links at all. Most of these long-tail pages will be in a part of the Internet called the deep web, which is estimated to contain about 90% of the Internet.
Barabasi's discovery set off a flood of papers that empirically show, time and time again, that different real-world networks exhibit the power law. When I say real-world networks, I mean network data that we find in reality or nature, as opposed to the output of theoretical mathematical models.
As can be understood from the examples so far, the power law distribution isn't confined to networks, but it is such a fundamental characteristic of networks that it has earned the title "The network's number one law". Therefore, we will also dedicate a separate episode to the power law, in which we will discuss why it exists and how understanding it can help us.
So, does all this remind us of anything? Apparently here lies the explanation for the stars that Moreno discovered in his sociograms and the major players that Milgram discovered in his chains.
If the classroom distribution were a normal distribution, we would expect the number of popular pupils to be more or less equal to the number of unpopular ones, that is, a symmetry between the two edges of the bell. But that is not the case.
The number of unpopular pupils is 3 to 6 times greater than the number of popular ones. Presumably, if we had sampled all the schools, we would have gotten an even longer tail.
So now, back to Milgram. Where is the long tail reflected in his experiment?
The most obvious case for a power law in his experiment is that out of the hundreds of participants, there were only 3 central players in the network that were responsible for 50% of the chains. All the others had 2 or fewer connections. Those 3 amount to about one percent of the data, which is compatible with a power law.
Now, let's look at the distribution of chains in Milgram's experiment: 64 out of the 217 chains sent reached their destination, which means that almost 30% of the chains were successful. That also means that 70% were not successful.
But still, 30% is a pretty high number for a power law distribution. If humanity's network is a power law distribution, then we should have expected to get very few successful chains, about an order of magnitude less than 30%. Also, the percentage of broken chains should have been much higher. Why? Since according to the power law, there should be few people with many connections and a vast majority of people with only a few connections. So, we should expect that at least 90% of the chains would be broken for lack of connectivity.
So, did Milgram have the upper hand in the battle against the power law? No such thing. Instead of using the original number of participants in the experiment (which was 296), Milgram chose to see the glass half-full. He referred only to the number of participants who actually sent a letter (the active participants, i.e., 217).
This way, Milgram reduced the count by about 80 participants who didn't send a single letter. In practice, if we were to count those who chose not to send a letter, the share of successful chains drops from about 30% to about 20%.
What Milgram did there was to reduce the sample size.
As we reduce the size of the sampled data, we are more likely to get a bias in the form of a normal distribution. In this case, the network's sample was reduced by more than a quarter, cutting off the long tail that is typical of a power law.
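The arithmetic behind these two percentages is short enough to check directly. The three counts are the ones quoted above; only the variable names are mine.

```python
recruited = 296   # participants originally recruited
sent = 217        # participants who actually sent a letter
arrived = 64      # chains that reached the target

rate_active = arrived / sent          # counting only the active participants
rate_all = arrived / recruited        # counting everyone who was recruited

print(f"{rate_active:.1%} of the chains that were started arrived")   # 29.5%
print(f"{rate_all:.1%} of all recruited participants' chains arrived")  # 21.6%
print(f"{recruited - sent} recruits never sent a letter")               # 79
```

Same 64 letters, two different denominators: that is the whole difference between "almost 30%" and "about 20%".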
But wait, maybe Milgram was right? Why even consider participants who chose not to participate? After all, they seem to be just "noise" in the data and there is no point counting them. We will learn nothing by including them. At least that was Milgram's understanding.
For now, let's just say that sample size is something to remember when looking for a power law, and we will discuss it in more detail in the power law episode. But let's move on. Let's say he was wrong to downsize the sample. Even so, 20% still sounds high. As we said, in a power law distribution, we would expect the share of letters that reached their destination to be significantly lower than 20%, somewhere in the single digits. So, is the human network a power law or not? Or in the words of Shakespeare: "To be a power law or not to be a power law?"
Of course, it's a power law. Do not be tempted by people who will offer you other distributions.
So how can this be?
The answer comes from a surprising source: Milgram himself. Not from this experiment, but from an earlier one.
The Hidden Kansas Experiment.
In 2000, Judith Kleinfeld, a social psychologist from the United States, tried to interest her students in performing social experiments. And which experiment captures the imagination more than Milgram's "Small World" experiment?
Her fantasy, as she described it, was not only to recreate Milgram's experiment but to recreate the original cast of participants through their descendants. And so, dedicated as she was, she went to the Yale University archives, where Milgram's research was stored, to find the notes of the original experiment.
While rummaging through the crates, she discovered an earlier experiment conducted by Milgram that had never been published. In this experiment, about 60 letters were sent from a selected group of people in Wichita, Kansas (again, a town referred to as a "hole" by Milgram) to a random destination: a student at Cambridge.
The only mention she found of the hidden experiment, outside the archive boxes, was in an interview with Milgram. Milgram didn't name the experiment per se but just mentioned it as an anecdote. He said that in one of his early experiments, one of the letters reached its destination in a record time of 4 days. What Milgram did not mention about this experiment, and what Kleinfeld found out to her horror, is that apart from this letter only 2 other letters arrived at the destination, a success rate of only 5%.
A phenomenon that recurred in both experiments was that of highly connected players in the network. Out of the 3 letters that reached their destination, two passed through the same person.
This will not surprise the power law disciples among us:
In the Kansas experiment we can clearly see a graph with one high column, indicating a very successful chain, 2 smaller columns representing chains with medium success and dozens of very small columns of broken chains, some without sending even one letter.
We can also see that in this network there is one central player, and the other players have only a few connections or none, a result that pretty much resembles the other experiment. So compared to Milgram's famous experiment, the Kansas experiment was not so successful. But maybe this one failure isn't indicative? Maybe the truth lies (pun intended) in the famous experiment?
Fortunately, Kleinfeld was not the only one who tried to recreate Milgram's experiment. Many have tried to recreate it, so we have room for comparison. But wait, why did so many people (including yours truly) try to recreate it? What is so appealing in the "Small World" experiment? Maybe the concept of a Small World echoes something we all want to hear.
As in the "Cracking" game in the military we mentioned at the beginning of the episode, a shared acquaintance between two soldiers makes it possible to establish trust. Shared acquaintance makes us feel that we are on the same side. A small world basically means we are all part of the same family.
But disappointingly, the failure of the vast majority of the chains to reach their destination turned out to be a recurring pattern. The results of the recreations of the experiment over the years were even worse than the result of the hidden Kansas experiment. The completion rate was so low that most researchers tended to say that no conclusions could be drawn from it at all.
But then came the internet. In an age where anyone can send a message to anyone, is experimentation still required? Isn't the Internet by itself a proof that our world is a small world?
An experiment conducted by a journalist named Wilson in 2000 involved starting a chain of emails from six people from across the United States, from Honolulu to Oregon, targeting a programmer in Oregon, named Tara.
Unlike previous experiments, Wilson allowed participants to send as many emails as they liked. The participants took advantage of this and thousands of emails were sent, copies of which were also sent to Wilson, as the supervisor of the experiment.
But not a single email reached Tara.
A more academic experiment, which also involved sending emails, was conducted in 2003 by Duncan Watts.
Watts co-wrote the famous 1998 paper about Small World networks and literally wrote the book about six degrees of separation, titled "Six Degrees: The Science of a Connected Age".
In his experiment, out of about 100,000 people who signed up, only 25% started a chain of emails. Of these, only 384 reached their destination.
That's only about 1.5% of successful chains. But wait: the results published by Watts only considered chains that had actually started, i.e., only the active participants. If we also count those who registered to participate but didn't start a chain, we get an even lower percentage of success.
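To put the completion rates side by side, here is a small sketch using the figures as quoted in this episode. The "about" numbers are treated as exact, and the 25,000 started chains are simply 25% of the roughly 100,000 sign-ups, so take the decimals with a grain of salt.

```python
# (letters or emails that arrived, size of the group we divide by)
experiments = {
    "Milgram, Kansas (unpublished)": (3, 60),
    "Watts 2003, chains started": (384, 25_000),   # 25% of about 100,000 sign-ups
    "Watts 2003, all sign-ups": (384, 100_000),
}

rates = {name: arrived / total for name, (arrived, total) in experiments.items()}
for name, rate in rates.items():
    print(f"{name:32s} {rate:.2%}")
```

Counting everyone who signed up, the email experiment lands well below one percent, far from the picture of a world where anyone can reach anyone.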
How can this be explained?
Kleinfeld, who has been a professor for the last 30 years and is a disappointed fan of Milgram, chose oatmeal as her analogy, which is definitely a radical choice for a metaphor.
Her contention was that, contrary to Milgram's image of the human network as a dense porridge, the world is in fact a collection of lumps in an unmixed porridge. The lumps are dense, or highly inter-connected, but the connections between them and the other lumps are weak to non-existent.
I have been trying to keep my distance from oatmeal (and drill sergeants) since basic training, so with your permission, I won't use the oatmeal analogy anymore, and for another good reason: it is inaccurate. So, let us move on to more professional terminology than "lumps" in the network.
The term is "Connected Components".
To illustrate what a connected component is, we'll imagine a network consisting of two blue nodes connected to each other, and next to them- but not connected to them - two red nodes connected to each other.
Now there are two connected components in this network: red and blue.
Connected components are the "islands" in the network. The nodes within a component are connected only to other nodes in that same component.
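The red and blue example can be written down as a tiny network and split into its islands. This is a minimal sketch using a breadth-first search; `connected_components` is a hypothetical helper, and the node names are just labels for the example.

```python
from collections import deque

def connected_components(nodes, edges):
    """Group nodes into islands: sets of nodes reachable from one another."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue                      # already placed on an island
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(component))
    return components

nodes = ["blue1", "blue2", "red1", "red2"]
edges = [("blue1", "blue2"), ("red1", "red2")]

components = connected_components(nodes, edges)
print(components)   # [['blue1', 'blue2'], ['red1', 'red2']]
```

Add a single link between a blue node and a red node, for example `("blue2", "red1")`, and the two islands merge into one connected component.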
Don't confuse connected components with communities (or clusters).
In communities, as we have seen in Moreno's case, most of the nodes' connections are within the community, but there are also a few connections to nodes outside the community. Unlike a community, a connected component has a defined boundary, like an "island."
When Kleinfeld described the network as consisting of "lumps of porridge", she meant that there was a large number of such connected components, or "islands", that are intra-connected but not inter-connected. According to her assumption, the world is not small but is made up of distant galaxies.
In contrast, the