Centrality algorithms are used to understand the influence of specific nodes in a graph and their impact on the network, which can help us identify the most important nodes.
This article will introduce the following algorithms:
Used to discover nodes that can efficiently propagate information through subgraphs, The higher the value, the shorter the distance between it and other nodes. This algorithm can be used when you need to know which node has the fastest propagation speed.
The indicator for measuring the centrality of a node is its average distance to other nodes. The closeness centrality algorithm calculates the sum of its distances to other nodes on the basis of calculating the shortest path between all node pairs, and then calculates the inverse of the result.
C(u)=∑v=1n−1d(u,v)1
Where:
u represents a node.
n represents the number of nodes in the graph.
d(u,v) represents the shortest distance between another node v and node u.
It is more common to normalize the calculation result to represent the average length of the shortest path, rather than the sum of the shortest paths. The normalization formula is as follows:
Used to detect the degree of influence of a node on the information flow or resources in the graph, usually used to find nodes that bridge one part of the graph with another.
Input: G = (V, E).
Output: Each node and its betweenness centrality value.
σst represents the sum of the shortest paths between nodes s and t.
σst(v) represents the number of shortest paths between s and t that pass through node v.
In simple terms, betweenness centrality measures the node v by the proportion of the shortest paths between point pairs that pass through node v. Importance.
The following figure shows the steps to calculate the betweenness score.
The calculation process for node D is as follows:
Shortest path node pairs through D
Total number of shortest paths between node pairs p
Percentage of the number of shortest paths through D pp(u)
(A, E)
1
1
(B, E)
1
1
(C, E)
1
1
(B, C)
2 (B->A->C and B->D->C respectively)
0.5
So according to the formula, the betweenness score of node D is: 1 + 1 + 1 + 0.3 = 3.5.
Different from linear sequential programming calculations, the algorithm given here is based on Pregel's message propagation method for calculation. It is only for reference.
Suppose we have three nodes: source node s, intermediate node v and target node t. What we need to calculate is: the proportion of the path passing through node v to all shortest paths from s to t. The σsv×σvt in the formula is the number of shortest paths between s and t that pass through v, which is the combination principle of the shortest path from s to t. The shortest path of t is considered as the combination of the path from s to v and the path from v to t.
This article introduces three different centrality algorithms, each of which identifies important nodes in the graph from different perspectives.
Degree centrality algorithm
The importance of nodes is evaluated by calculating the in-and-out degree of each node. Nodes with high degree centrality mean that they are connected to more nodes.
Closeness centrality
The importance of nodes is measured by calculating the average distance to other nodes. Nodes with high closeness centrality mean a more central position in the graph, and the path from it to all other nodes is shorter, which will be faster if messages are propagated.
Betweenness centrality
The importance of nodes is measured by calculating the probability that the shortest path between other nodes passes through itself. It is usually used to detect the degree of influence of nodes on information flow or resources in the graph. Nodes with high betweenness centrality mean that they are important bridges between two groups of nodes in the graph.