Skip to content

Centrality Algorithms: Degree Centrality & Closeness Centrality & Betweenness Centrality

About 1257 wordsAbout 4 min

DistributedBig Data

2024-12-18

Centrality algorithms are used to understand the influence of specific nodes in a graph and their impact on the network, which can help us identify the most important nodes.

This article will introduce the following algorithms:

Degree Centrality

Used to measure the number of relationships a node has. The larger the value, the higher its centrality.

  • Input: G = (V, E).
  • Output: Each node and its degree centrality value.

Implementation Principle

CD(Ni)=Ndegreen1C'_D(N_i) = \frac{N_{degree}}{n - 1}

Where:

  • NdegreeN_{degree} represents the degree of the node.
  • nn represents the number of nodes.

This formula has been standardized.

Adaptation For Heterogeneous Graphs

  • This indicator calculation does not involve attributes, only the degree of the graph structure.
  • Only the degree under the same label is calculated.

Closeness Centrality

Used to discover nodes that can efficiently propagate information through subgraphs, The higher the value, the shorter the distance between it and other nodes. This algorithm can be used when you need to know which node has the fastest propagation speed.

  • Input: G = (V, E).
  • Output: Each node and its closeness centrality.

Implementation Principle

The indicator for measuring the centrality of a node is its average distance to other nodes. The closeness centrality algorithm calculates the sum of its distances to other nodes on the basis of calculating the shortest path between all node pairs, and then calculates the inverse of the result.

C(u)=1v=1n1d(u,v)C(u) = \frac{1}{\sum_{v=1}^{n-1}d(u,v)}

Where:

  • uu represents a node.
  • nn represents the number of nodes in the graph.
  • d(u,v)d(u,v) represents the shortest distance between another node vv and node uu.

It is more common to normalize the calculation result to represent the average length of the shortest path, rather than the sum of the shortest paths. The normalization formula is as follows:

Cnorm(u)=n1v=1n1d(u,v)C_{norm}(u) = \frac{n-1}{\sum_{v=1}^{n-1}d(u,v)}

Adaptation For Heterogeneous Graphs

  • Only calculate nodes with the same label.
  • Actually only calculate the close centrality in each connected subgraph.

Wasserman & Faust algorithm

This algorithm is a variant for non-connected graphs.

CWF(u)=n1N1(n1v=1n1d(u,v))C_{WF}(u) = \frac{n-1}{N-1}\left(\frac{n-1}{\sum_{v=1}^{n-1}d(u,v)} \right)

Where:

  • uu represents a node.
  • NN represents the total number of nodes.
  • nn represents the number of nodes in the same component as uu.
  • d(u,v)d(u, v) represents another node The shortest distance from vv to uu.

Betweenness Centrality

Used to detect the degree of influence of a node on the information flow or resources in the graph, usually used to find nodes that bridge one part of the graph with another.

  • Input: G = (V, E).
  • Output: Each node and its betweenness centrality value.

Implementation Principle

BC(v)=sutσst(v)σstBC(v) = \sum_{s \neq u \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}

Where:

  • vv represents a node.
  • σst\sigma_{st} represents the sum of the shortest paths between nodes ss and tt.
  • σst(v)\sigma_{st}(v) represents the number of shortest paths between ss and tt that pass through node vv.

In simple terms, betweenness centrality measures the node vv by the proportion of the shortest paths between point pairs that pass through node vv. Importance.

The following figure shows the steps to calculate the betweenness score.

Betweenness centrality calculation example
Betweenness centrality calculation example

The calculation process for node D is as follows:

Shortest path node pairs through DTotal number of shortest paths between node pairs ppPercentage of the number of shortest paths through D p(u)p\frac{p(u)}{p}
(A, E)11
(B, E)11
(C, E)11
(B, C)2 (B->A->C and B->D->C respectively)0.5

So according to the formula, the betweenness score of node D is: 1 + 1 + 1 + 0.3 = 3.5.

Algorithm Steps

Different from linear sequential programming calculations, the algorithm given here is based on Pregel's message propagation method for calculation. It is only for reference.

Step 1: Calculate the shortest path and the number of shortest paths between all pairs of points

  1. Initialization
    • For source node ss, path count σs=1\sigma_s = 1, path length ds=0d_s = 0.
    • For other nodes, path count σ=0\sigma = 0, path length d=d = \infty.
  2. Message propagation Assume that node uu sends a message to node vv.
    • Node uu message sends: (du+1,σu)(d_u +1, \sigma_u), that is, path length and number of paths.
    • Node vv message update:
    • If dnew<dcurrentd_{new} < d_{current}: update the shortest path length dv=dnewd_v = d_{new}, set the number of paths σv=σu\sigma_v=\sigma_u.
    • If dnew=dcurrentd_{new} = d_{current}: It means that another path of equal length has been found, so the number of paths is increased by σv+=σu\sigma_v += \sigma_u.

Step 2: Calculate the contribution value of each point

contribution=σsv×σvtσstcontribution = \frac{\sigma_{sv} \times \sigma_{vt}}{\sigma_{st}}

Suppose we have three nodes: source node ss, intermediate node vv and target node tt. What we need to calculate is: the proportion of the path passing through node vv to all shortest paths from ss to tt. The σsv×σvt\sigma_{sv} \times \sigma_{vt} in the formula is the number of shortest paths between ss and tt that pass through vv, which is the combination principle of the shortest path from ss to tt. The shortest path of tt is considered as the combination of the path from ss to vv and the path from vv to tt.

betweenness_centrality.py
def calculate_betweenness_centrality(source, all_nodes):
    BC_s = 0

    for t in all_nodes:
        if t == source:
            continue

        for v in all_nodes:
            if v == source or v == t:
                continue
            sigma_sv = get_shortest_path_count(source, v)
            sigma_vt = get_shortest_path_count(v, t)
            sigma_st = get_shortest_path_count(source, t)

            if sigma_st == 0:
                continue

            contribution = (sigma_sv * sigma_vt) / sigma_st

            BC_s += contribution

    return BC_s

Adaptation For Heterogeneous Graphs

  • This indicator calculation does not involve attributes, only the degree of the graph structure.
  • Only calculate the degree under the same label.

Summary

This article introduces three different centrality algorithms, each of which identifies important nodes in the graph from different perspectives.

  • Degree centrality algorithm

    The importance of nodes is evaluated by calculating the in-and-out degree of each node. Nodes with high degree centrality mean that they are connected to more nodes.

  • Closeness centrality

    The importance of nodes is measured by calculating the average distance to other nodes. Nodes with high closeness centrality mean a more central position in the graph, and the path from it to all other nodes is shorter, which will be faster if messages are propagated.

  • Betweenness centrality

    The importance of nodes is measured by calculating the probability that the shortest path between other nodes passes through itself. It is usually used to detect the degree of influence of nodes on information flow or resources in the graph. Nodes with high betweenness centrality mean that they are important bridges between two groups of nodes in the graph.




Changelog

Last Updated: View All Changelog
  • fix(docs): text typo

    On 1/22/25
  • improve(docs): delete extra whitespace and blank lines

    On 1/21/25
  • improve(docs): add summary

    On 1/9/25
  • fix(docs): fix text typo

    On 12/30/24
  • fix(docs): optimize articles

    On 12/27/24
  • feat(docs): add new english articles

    On 12/18/24

Keep It Simple