The Network of Weblogs and A Review of
Real and Social Network Research
By Matt Langeman
For Msci 620
University of Waterloo
In the last several years, research in the area of real networks has flourished. New understanding of the properties of these networks has opened the door to continued research and has generated interest that has spilled over to non-academic audiences through books such as Malcolm Gladwell’s The Tipping Point and Albert-Laszlo Barabasi’s Linked. Meanwhile, the use of weblogs has expanded in the last few years in both academic and non-academic settings. Not surprisingly, this increase in weblog usage, along with the thriving research in real networks, has created a desire to examine the network of weblogs using real network concepts. In this paper, I will give a history of two fields that can be considered origins of real network theory: graph theory and social network theory. I will then discuss recent research on real networks, focusing on network properties and modeling techniques. Finally, I will discuss research related to weblogs and give a few suggestions for future research.
History of Graph Theory
In 1736, Leonhard Euler published a paper solving the problem of “The Seven Bridges of Königsberg” (Barabasi, 2003). Königsberg had a river running through it that divided the land into two islands and the mainland. Altogether, there were seven bridges connecting the various parts of the city [Appendix 1 - Figure 1]. The question was whether it was possible to walk a route that crossed each bridge exactly once. In his paper, Euler used the concepts of nodes and edges to prove that it was impossible to find such a route. His solution demonstrated the importance of network structure and inspired future study of graph theory.
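Euler's argument can be illustrated with a short sketch. It assumes the usual reading of the map as four land masses (two river banks and two islands); the labels A through D and the function names are illustrative and do not come from the sources cited here. A walk that crosses every bridge exactly once can only exist in a connected graph when zero or two nodes have an odd number of bridges.

```python
from collections import Counter

# Assumed labelling: A is the central island, B and C are the two river
# banks, and D is the second island. Each pair is one bridge.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(edges):
    """Count how many edge endpoints touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """Parity test for a walk using every edge once.

    Assumes the graph is connected; connectivity is not checked here.
    """
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

Under this layout all four land masses have an odd number of bridges, so the test fails, which matches Euler's conclusion.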
In 1852, Francis Guthrie presented “the four color problem,” asking whether it is possible, using only four colors, to color any map of countries in such a way that no two bordering countries have the same color (Wikipedia). The problem was not solved until 1976, when it was ultimately proved using a computer. In the meantime, many terms and concepts of current graph theory were formed in attempts to solve the “four color problem.”
In 1959, Erdos and Renyi defined the idea of random graphs: N nodes, with each pair being connected with probability p. Much of the study of random networks focused on determining “at what connection probability p will a particular property of a graph most likely occur” (Albert and Barabasi, 2002). The concept of random graphs allowed the modeling of larger, more complex networks.
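The random graph definition translates almost directly into code. The following is a minimal sketch; the function name random_graph and the seed parameter are assumptions added for illustration. Every one of the N(N-1)/2 possible pairs is considered once and kept with probability p, so the expected number of edges is pN(N-1)/2.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """Return the edge list of a random graph on nodes 0..n-1.

    Each unordered pair of nodes is connected independently with
    probability p. The seed only makes runs repeatable.
    """
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2)
            if rng.random() < p]


edges = random_graph(100, 0.05, seed=1)
print(len(edges), "edges; expected about", 0.05 * 100 * 99 / 2)
```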
History of Social Network Theory
In the late 1940s and early 1950s, Bavelas and Leavitt conducted studies that examined the effect of communication patterns on the performance and morale of task-oriented groups (Bavelas, 1950). In the studies, they examined groups of five participants as they attempted to complete an assigned task. Groups were organized into four possible patterns that determined lines of communication between participants [Appendix 1 - Figure 2]. All of the groups performed the same task: each of the five participants in a group received a card with four of the five symbols: □ ◊ + * o. The group of five was then asked to determine which symbol was on all five cards using only the lines of communication available in their given pattern. Task performance was analyzed based on number of total errors and number of group errors. A questionnaire was used to evaluate morale, with participants being asked to rate whether they liked their job and whether they were satisfied with the job done. Participants were also asked to identify the leader of the group. From these studies, Bavelas and Leavitt found that groups organized in more centralized patterns such as E and F had fewer errors. However, these centralized groups also had lower overall morale. Participants in peripheral positions expressed both a dislike for their job and a low satisfaction with how the job was performed, while participants in central positions liked their jobs and were satisfied with how the job was performed. Not surprisingly, participants in these central positions were more frequently identified as the leaders.
Overall, the studies of Bavelas and Leavitt displayed a strong connection between communication patterns and the performance and morale of task-oriented groups. While the groups being studied were small, and the task was fairly simple, these findings piqued interest in the effect of communication patterns on group dynamics.
In 1967, Stanley Milgram conducted a study of real-life social networks to determine how many intermediaries it would take to connect a random person in Omaha, Nebraska to a stranger in Sharon, Massachusetts (Barabasi, 2003). A similar idea had appeared decades earlier, in the 1929 short story “Chains” (Barabasi, 2003), in which Hungarian author Frigyes Karinthy proposed that every person is connected to every other person by at most five intermediaries. It is unknown where Karinthy got this proposition. Most participants in Milgram’s study expected it to take at least one hundred intermediaries. Surprisingly, however, the average number in Milgram’s study was between five and six.
In 1973, Granovetter published “The Strength of Weak Ties” in the American Journal of Sociology. In this paper, Granovetter argued for increased attention to what he labeled “weak ties” in social networks. The label “weak tie,” as opposed to “strong tie,” was used to distinguish between different types of social relationships. For example, most people would consider themselves much closer to a sibling than to a co-worker. Another way of labeling the distinction is acquaintance versus close friend. The key assertion is that “our acquaintances (weak ties) are less likely to be socially involved with one another than are our close friends (strong ties)” (Granovetter, 1973). Therefore “weak ties are good because they are more likely (than strong ties) to be bridges to other social cliques” (Granovetter, 1973). These bridges expose someone to new information and innovative ideas that he would not run across on his own. This hypothesis is also based on the assumption that the information one knows is similar to that of one’s friends. Granovetter cites empirical evidence supporting the idea that “the stronger the tie connecting two individuals, the more similar they are, in various ways” (Granovetter, 1973). He also uses the theory of cognitive balance, established by Heider (1958) and Newcomb (1961), to explain how strong ties result in groups of people with similar views and feelings, a consistency that matters less between weak ties.
“If strong ties A-B and A-C exist, and B and C are aware of one another, anything short of a positive tie would introduce ‘psychological strain’ into the situation since C will want his own feelings to be congruent with those of his good friend, A, and similarly, for B and his friend, A. Where the ties are weak, however, such consistency is psychologically less crucial” (Granovetter, 1973).
The theory of cognitive balance is used to create what he labels the Forbidden Triad, the triad that is least likely to occur. This triad consists of three people, A, B, and C, where A has a strong tie to both B and C, but B and C have no connection. While acknowledging the statement as an exaggeration, Granovetter assumes that the triad never occurs. Empirical support for this assumption was provided by a study by Davis (1970) of 651 sociograms. The study found that “in 90% of them triads consisting of two mutual choices and one non-choice occurred less than the expected random number of times” (Granovetter, 1973). With the assumption of no Forbidden Triads, it can be shown that strong ties can never be bridges.
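The Forbidden Triad can be stated as a simple check on a labeled network. The sketch below is only an illustration: the function name, the two dictionaries (strong for strong ties, all_ties for ties of any strength), and the example people are hypothetical and are not part of Granovetter's paper.

```python
from itertools import combinations


def forbidden_triads(strong, all_ties):
    """List triads (a, b, c) where a has strong ties to both b and c,
    but b and c have no tie of any strength.

    strong:   dict mapping each person to the set of their strong ties
    all_ties: dict mapping each person to the set of all their ties
    """
    found = []
    for a, friends in strong.items():
        for b, c in combinations(sorted(friends), 2):
            if c not in all_ties.get(b, set()):
                found.append((a, b, c))
    return found


# A is close to both B and C, who do not know each other.
strong = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
print(forbidden_triads(strong, strong))

# Once B and C share even a weak tie, the triad is no longer forbidden.
with_weak = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
print(forbidden_triads(strong, with_weak))
```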
Real Networks
The study of real and complex networks can be said to combine aspects of graph theory and social network theory. At times the study of real networks can be very mathematical, while at other times it is necessary to understand concepts of social networks in order to better understand network behavior.
The random model developed by Erdos and Renyi “dominated thinking about complex networks for decades after its introduction. Much studied by mathematicians, it was rediscovered and used in various fields faced with complex networks, ranging from sociology to ecology” (Albert and Barabasi, 2002). Recently, however, the assumption of randomness in complex systems such as the cell, the Internet, and other real networks has been questioned. Through advances in computerized data collection, increased computer power, and decreased boundaries between academic fields, researchers have been able to gain a better understanding of these complex networks.
Properties of Real Networks
Three properties have been identified as being common in real networks: small worlds, clustering, and power-law degree distribution (Albert and Barabasi, 2002).
Small Worlds
As discussed earlier, the small world phenomenon was studied by Milgram in 1967. Milgram’s experiment showed that the network of social relationships is highly connected and that the social distance between any two people is surprisingly small. Further research has determined that the small world phenomenon is present in such networks as the social network of feature-film actors linked by appearance together in films, the electric power grid of the Western United States, and the neural network of the nematode worm C. elegans (Watts and Strogatz, 1998).
It should be noted that random graphs also exhibit small worlds. The small world property therefore does not by itself distinguish real networks from random ones, but it is a property common to real networks.
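One common way to quantify the small world property is the average number of steps on the shortest path between pairs of nodes. The following sketch computes it with a breadth-first search; the function names and the adjacency-dictionary representation are assumptions for illustration, and pairs with no connecting path are simply left out of the average.

```python
from collections import deque


def distances_from(adj, source):
    """Breadth-first search: steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs.

    Assumes at least one pair of distinct nodes is connected.
    """
    total = pairs = 0
    for s in adj:
        for t, d in distances_from(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# A chain of three people: 0 knows 1, and 1 knows 2.
chain = {0: {1}, 1: {0, 2}, 2: {1}}
print(average_path_length(chain))
```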
Clustering
The concept of clustering can be explained using a simple example in social networks. In social networks, clusters or cliques of friends form in which a person is friends with nearly every other person in the cluster. In terms of graph theory, these would be described as sub-graphs that are complete (or nearly complete), where a complete graph with k nodes has k(k-1)/2 edges. In order to quantify the amount of clustering in a network, Watts and Strogatz (1998) proposed the use of a clustering coefficient. For node i, connected to k_i other nodes, the clustering coefficient is the ratio of the number of edges that exist between those k_i nodes to the total possible number of such edges, k_i(k_i - 1)/2. The clustering coefficient of the entire network is simply the average of the individual clustering coefficients. In terms of a social network, the clustering coefficient of a person is the number of friendships that exist between his friends, divided by the total number of friendships that could exist between them.
It is worth noting that the clustering coefficient for random graphs is rather small, because the probability that two nodes are connected is the same regardless of whether they share a common neighbor. However, Watts and Strogatz found the clustering coefficient for real networks to be much larger than that of random networks. This discovery is credited as the first indication that real networks have properties that are not modeled by random graphs (Albert and Barabasi, 2002).
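The clustering coefficient can be sketched in a few lines. The function names and the adjacency-dictionary representation below are assumptions for illustration. Nodes with fewer than two neighbors are given a coefficient of zero here, which is one common convention, and the network value is taken as the mean of the per-node values, following Watts and Strogatz.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of possible edges among the neighbors of i that exist."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here: no pairs, so no clustering
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Network clustering coefficient: mean of the local coefficients."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Person 0 has three friends; only friends 1 and 2 know each other,
# so 1 of the 3 possible friendships among 0's friends exists.
friends = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(friends, 0))
print(average_clustering(friends))
```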
Power Law Degree Distribution
Shortly after the findings on clustering, power-law degree distribution was identified as another property found in real networks that is not modeled in random graphs. In a random graph, the degree distribution follows the bell shape of a Poisson distribution, so most of the nodes have approximately the same degree. However, real networks such as the World Wide Web, the Internet, and cellular phone networks have been found to have power-law degree distributions [Appendix 1 - Figures 3 and 4]. This means that while a few nodes have an extremely high degree, the vast majority have a very small degree.
Modeling Real Networks
Seeing that random networks do not properly model some of the properties frequently found in real networks, researchers attempted to find better modeling techniques. A number of models, called scale-free models, have been created in order to examine the power-law degree distribution found in real networks. Barabasi and Albert (1999) identified two key areas where real networks differed from random ones: growth over time and preferential attachment. Random network models use a fixed number of nodes and connect nodes with a uniform probability. However, real networks generally start with a small number of nodes and then grow over time through the addition of more nodes. Furthermore, most networks tend to exhibit some form of preferential attachment, meaning that nodes with high degree are more likely to be linked to by these new nodes than nodes with small degree. Thus Barabasi and Albert argued for a shift in network modeling, from modeling topology to modeling network assembly and evolution (Barabasi and Albert, 1999).
The “rich get richer” model was one of the initial scale-free generative models proposed by Barabasi and Albert (1999). Using it, they sought to examine the scale-free nature of the Web and model its growth. In the “rich get richer” model, the probability that a link is created to a website is dependent on the current in-degree of that website. Thus, the more links a website already has, the more it will receive in the future. This model generates a power-law degree distribution, but does not generate the clustering that is found in the Web. It also raises questions about the ability of new websites to compete with already well-established ones.
Extensions have been made to this model to incorporate other aspects of network growth and evolution. The addition of a new node is not the only way that networks change. Edges can be added or removed at any time. Thus the concept of edge rewiring was proposed to simulate the addition and removal of edges (Albert and Barabasi, 2002). Amaral et al. (2000) propose a model that incorporates aging, costs, and capacity constraints. These growth constraints were used to simulate the changing ability of a node to attract or accept new links. Bianconi and Barabasi (2000) address the competitive aspect of networks by proposing the use of a fitness parameter. The fitness parameter of a node reflects its ability to compete for new links. When a new node connects to nodes already in the network, it connects to node i based on both the degree and the fitness parameter of i. For example, if a new website is created that has incredibly interesting content, it would have a high fitness parameter. This would allow it to gain links more quickly than websites that have existed for a while but are less “fit.”
An interesting finding was published by Pennock et al. (2002) regarding the power-law degree distribution found in many real networks. They note that “As a whole, the World Wide Web displays a striking ‘rich get richer’ behavior, with a relatively small number of sites receiving a disproportionately large share of hyperlink references and traffic. However, hidden in this skewed global distribution, we discover a qualitatively different and considerably less biased link distribution among subcategories of pages—for example, among all university homepages or all newspaper homepages.” Pennock et al. go on to elaborate that “for some collections of web pages of the same type, we find that the distribution of inbound links departs drastically from a power law at small and moderate k.” These findings suggest that while the “rich get richer” behavior makes it difficult to compete at the scale of the entire World Wide Web, websites are better able to compete at a more localized niche level.
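The two ingredients, growth and preferential attachment, can be sketched as a small simulation. This is a simplified, undirected illustration rather than the exact procedure of Barabasi and Albert: the function name, the fully connected starting graph of m + 1 nodes, and the parameters n, m, and seed are all assumptions made here.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Grow a network to n nodes; each new node links to m existing
    nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Assumed starting point: m + 1 nodes that are all linked together.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per edge it touches, so a uniform
    # draw from `stubs` picks a node in proportion to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        for old in chosen:
            edges.append((new, old))
            stubs.extend((new, old))
    return edges


edges = preferential_attachment(2000, 2, seed=7)
degree = Counter(node for edge in edges for node in edge)
print("largest degrees:", sorted(degree.values(), reverse=True)[:5])
print("nodes with the minimum degree:",
      sum(1 for d in degree.values() if d == 2))
```

Printing the largest degrees next to the count of minimum-degree nodes gives a rough view of the skew that the model is meant to produce.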
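The fitness idea can be added to a growth simulation by weighting each existing node by the product of its fitness and its degree. The sketch below is an illustration under stated assumptions, not the exact formulation of Bianconi and Barabasi: the function name, the fully connected starting graph, and the fitness list (one non-negative value per node, with at least m positive values among the existing nodes at every step) are hypothetical.

```python
import random


def fitness_attachment(n, m, fitness, seed=None):
    """Grow a network to n nodes; each new node links to m existing
    nodes chosen with probability proportional to fitness * degree."""
    rng = random.Random(seed)
    # Assumed starting point: m + 1 nodes that are all linked together.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    degree = {i: m for i in range(m + 1)}
    for new in range(m + 1, n):
        nodes = list(degree)
        weights = [fitness[v] * degree[v] for v in nodes]
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choices(nodes, weights=weights)[0])
        for old in chosen:
            edges.append((new, old))
            degree[old] += 1
        degree[new] = m
    return edges, degree


# A late arrival (node 50) with high fitness competes against older,
# less fit nodes for the links created after it joins.
fitness = [1.0] * 300
fitness[50] = 10.0
edges, degree = fitness_attachment(300, 2, fitness, seed=11)
print("degree of the fit latecomer:", degree[50])
print("degree of an early node:", degree[0])
```

With equal fitness values this reduces to plain degree-proportional attachment.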
Weblogs and the blogosphere
In the last couple of years, the network of weblogs has received attention from social network researchers. This network, often referred to as blogspace or the blogosphere, offers readily available data with explicit links. Much of the current research has focused on the size and usage of blogspace as well as the flow of information through blogspace.
Current Weblog Research
Several studies have focused on the demographics and usage of weblogs. In “A Depiction of Users and Usage of Blogs,” Carl (2003) found that over 50% of bloggers surveyed were between the ages of 18 and 34, while over 80% of bloggers surveyed were White non-Hispanic. For the study she used a self-selected survey, posting links to the survey at “as many websites frequented by bloggers as possible.” She notes, however, that “based on the methods used to attract participants, the distribution of participants as a segment of the whole blogging population is in all probability somewhat skewed.”
In another paper, “A Genre Analysis of Weblogs,” Herring et al. (2004) “present the results of a content analysis of 203 randomly-selected weblogs, comparing the empirically observable features of the corpus with popular claims about the nature of weblogs.” Features examined include author characteristics as well as a classification of purpose into the genres personal journal, filter, k-log, mixed, and other. A “personal journal” was identified as a weblog “in which the authors report on their lives and their inner thoughts and feelings.” Thus, personal journals have few outbound links, as most of the content is internal. A “filter” was identified as a weblog that posted outbound links to external content. A k-log was identified as resembling a “hand-written project journal in which a researcher or project group makes observations [and] records relevant references.” K-logs contain internal content and also link to external content. “Mixed” weblogs contain both internal content and external links, but are unassociated with a specific project, while “other” was used to classify weblogs of unknown or miscellaneous purpose. Overall, personal journals accounted for over 70% of the weblogs surveyed, with filter and mixed accounting for 12.6% and 9.5% respectively. These findings are interesting because weblogs have been said to generate interactive conversations that encourage the flow of ideas and information. This study suggests that about 70% of weblogs do not actively participate in these conversations. This does not mean that conversations do not take place. Considering the rapid growth of weblogs, the remaining 30% is still a significant share. In fact, these findings could indicate that the definition of a weblog has expanded as more weblogs are created with different purposes.
Weblogs have frequently been credited with turning ideas or news into epidemics. In “Implicit Structure and the Dynamics of Blogspace,” Adar, Zhang, Adamic, and Lukose (2004) examine “information epidemics” and attempt to determine the paths used in the spread of the information. Epidemic profiles were created by using data from blog tracking sites such as www.blogpulse.com, www.blogdex.com, and www.daypop.com. These sites list the URLs that are most frequently cited by the blogging community. These citations and their estimated timestamps, along with several parameters related to blog similarity, were used to develop a ranking system that determined the possible “infection route” of the information. Another paper, “On the Bursty Evolution of Blogspace” (Kumar et al., 2003), focuses on the tendency of blogspace to show bursts of activity related to a specific event or topic. The concept of a “time graph” is introduced in order to study graphs as they evolve in continuous time.
The power-law nature of blogspace has also been examined (Shirky, 2003; Kottke, 2003). Indeed, the network of weblogs exhibits power-law properties related to inbound links and traffic. The term A-list blog has been used to classify the weblogs that are at the top of this power-law curve. Shirky notes that inequality occurs “because it is a reliable property that emerges from the normal functioning of the system.” The existence of personal preference, combined with the positive feedback loop created by a person’s recommendations “makes stars inevitable.” In terms of real network modeling, the personal preference relates to the “fit get rich” mechanism, while the positive feedback loop is the “rich get richer” mechanism.
Future Research
It appears that an examination of the power-law properties of weblogs within the same topical category has not been performed. It is possible that, as with the more general websites such as company or university home pages studied by Pennock et al. (2002), weblogs examined within their topical network do not exhibit the strong power-law properties that are found in the network as a whole.
In general, there appears to be a lack of research specific to the clustering property of the blogosphere. For research on the blogosphere as a whole, the use of scale-free network modeling has proven to be valuable. However, for understanding clusters, I suggest that we reexamine some of the social network theory used in the study of smaller-scale groups. In particular, I would like to reexamine Granovetter’s “Strength of Weak Ties” as it pertains to clustering in the blogosphere. While it has been noted that over 70% of the blogosphere consists of personal journals that have few outbound links, there is still a significant number of weblogs that are much more interactive. There is evidence that clusters exist within the weblog network, forming topical-based communities (Herring et al., 2005). According to Granovetter’s theory of weak ties, it is the bridges between these clusters that allow for the spread of new and innovative ideas. In the physical social network studied by Granovetter, links consisted of friendships and acquaintances between people. These two link types were classified as strong and weak respectively. What made Granovetter’s theory so interesting was his argument that most bridges to other social networks are weak ties, making weak ties more useful than strong ties in certain situations. In the weblog network, links cannot be classified as strong or weak in the same way as personal relationships. Links in the weblog world can be one-way and may or may not be reciprocated. It may be the case that a person receiving a link does not know of its existence. It can also be argued that the degree of personal commitment associated with linking to a person’s weblog is much less than that of becoming close friends with that same person. Thus links between weblogs, and the bridges formed between clusters, need not follow the cognitive balance argument used by Granovetter.
It would therefore appear that bloggers can link more freely and create more bridges between clusters. Formal studies are needed in order to determine both whether linking occurs more freely and whether bridges between topical-based clusters are a common occurrence.
Hypothesis 1: Webloggers will link to people that do not fit the demographic profile of their normal friends.
Hypothesis 2: Topically-based clusters diverge from the power-law properties found in the blogosphere as a whole.
Hypothesis 3: Topically-based clusters of weblogs exist and are based on both one-way and reciprocal linking.
Hypothesis 4: Topically-based clusters exhibit the small world phenomenon, meaning that the clusters can be connected in a small number of steps (a suggested threshold of three).
While hypothesis 1 would not fit cleanly into a single study along with hypotheses 2 through 4, it would be an interesting result that could have implications for the number of bridges found between clusters. In order to conduct studies to test these hypotheses, one would select a core number of weblogs chosen at random from a sample taken from a large number of different weblog repositories. The genre of personal journals can be eliminated from the sample, as they would not be part of any type of cluster. In a method similar to that of Herring et al. (2005), the networks of weblogs surrounding the initial core would be collected. Ideally, for hypothesis 1, the authors of the weblogs could be contacted to determine the demographics of the authors and the demographics of their friends. Hypotheses 2 through 4 would not require contacting the weblog authors. Instead, detailed data analysis would be conducted on the networks to determine the existence of clusters and bridges.
Conclusion
The increased research and understanding of real networks along with the popularity of weblogs has created opportunities for future study. In particular, research on clustering within the weblog network could lead to better understanding of the power-law properties within the network as well as the existence and impact of bridges between clusters.
Appendix 1
Figure 1: Königsberg Bridges (Barabasi, 2003)

Figure 2: Bavelas and Leavitt Group Patterns

Figure 3: Power Law Curve (http://www.kottke.org/03/02/weblogs-and-power-laws)
Figure 4: Power Law Curves (Barabasi, 2003)
References
Adar, E., Zhang, L., Adamic, L. A., and Lukose, R. M. (2004). Implicit Structure and the Dynamics of Blogspace. Workshop on the Weblogging Ecosystem, In: 13th International World Wide Web Conference.
Albert, R., and Barabasi, A. (2002). Statistical mechanics of complex networks. In: Reviews of Modern Physics, 74, 47.
Amaral, L.A.N., Scala, A., Barthelemy, M. and Stanley, H.E. (2000). Classes of small-world networks. In: Proceedings of the National Academy of Sciences of the U.S.A., 97, 11149-11152.
Barabasi, A-L, and Albert, R. (1999). Emergence of Scaling in Random Networks. In: Science, 286, 509-512.
Barabasi, A-L (2003). Linked: How Everything Is Connected to Everything Else and What It Means. Plume, New York.
Bavelas, A. (1950). Communication patterns in task-oriented groups. In: Journal of the Acoustical Society of America, 22, 271-282.
Bianconi, G. and Barabasi, A-L (2000). Competition and multiscaling in evolving networks, cond-mat/0011029.
Carl, C.R. (2003). Bloggers and their Blogs: A Depiction of the Users and Usage of Weblogs on the World Wide Web. Georgetown University, Washington D.C.
Davis, J. A. (1970). Clustering and Hierarchy in Interpersonal Relations: Testing Two Graph Theoretical Models on 742 Sociomatrices. In: American Sociological Review, 35, 843-851.
Granovetter, M. S. (1973). The strength of weak ties. In: American Journal of Sociology, 78(6), 1360-1380.
Granovetter, M. S. (1982). The strength of weak ties: A network theory revisited. In: Marsden, P. V., and Lin, N. (Eds.), Social Structure and Network Analysis, 105-130. Beverly Hills: Sage.
Gruhl, D., Guha, R., Liben-Nowell, D., and Tomkins, A. (2004). Information Diffusion through Blogspace. In: Proceedings of the 13th International WWW Conference, 491-501.
Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
Herring, S. C., Scheidt, L. A., Bonus, S. and Wright, E. (2004), Bridging the Gap: A Genre Analysis of Weblogs. In Proc. of the 37th Hawaii International Conference on System Sciences (HICSS'04), IEEE Press.
Herring, S. C., Kouper, I., Paolillo, J. C., Scheidt, L. A., Tyworth, M., Welsch, P., Wright, E., and Yu, N. (2005). Conversations in the blogosphere: An analysis "from the bottom up." Proceedings of the Thirty-Eighth Hawai'i International Conference on System Sciences (HICSS-38). Los Alamitos: IEEE Press.
Kottke, J. (2003). Weblogs and Power Laws. 02/09/2003. http://www.kottke.org/03/02/weblogs-and-power-laws [accessed 07/01/2005]
Kumar, R., Novak, J., Raghavan, P., and Tomkins, A. (2003). On the bursty evolution of blogspace. In: Proceedings of the 12th International WWW Conference, 568-576.
Newcomb, T.M. (1961). The acquaintance process. New York: Holt, Rinehart & Winston.
Pennock, D.M., Flake, G.W., Lawrence, S., Glover, E.J., and Giles, C.L. (2002). Winners don't take all: Characterizing the competition for links on the web. In: Proceedings of the National Academy of Sciences, 99, 5207-5211.
Shirky, C. (2003). Power Laws, Weblogs and Inequality. Networks, Economics and Culture Mailing List, 08/02/2003, also available on: http://shirky.com/writings/powerlaw_weblog.html [accessed 07/01/2005]
Watts, D.J., and Strogatz, S.H. (1998). Collective dynamics of ‘small-world’ networks. In: Nature, 393, 440-442.
Wikipedia. Graph Theory. http://en.wikipedia.org/wiki/Graph_theory. [accessed 07/01/2005]