deniz.n.rezai
deniz.n.rezai
خواندن ۴ دقیقه·۵ سال پیش

Hubs And Authorities

Hyperlink Induced Topic Search (HITS) Algorithm is a Link Analysis Algorithm that rates webpages, developed by Jon Kleinberg. This algorithm is used to the web link-structures to discover and rank the webpages relevant for a particular search.
HITS uses hubs and authorities to define a recursive relationship between webpages. Before understanding the HITS Algorithm, we first need to know about Hubs and Authorities.

  • Given a query to a Search Engine, the set of highly relevant web pages are called Roots. They are potential Authorities.
  • Pages which are not very relevant but point to pages in the Root are called Hubs. Thus, an Authority is a page that many hubs link to whereas a Hub is a page that links to many authorities.

for example, we want to calculate hubs and authorities score for this adjacency matrix by using hits algorithm for k=3.

Matrix A is:

And adjacency matrix with nodes is:

Then, we design a graph with nodes:

After that, showing ranks by using hubs and authorities:

Now, we calculate AT matrix:

In this step, we assume that the initial hub weight vector, u, is 1. So, we calculate

v = AT * u

Next, hub weight vector updated. u = A * v

Now, we create the graph again for k=1:

Then, calculate new authority from k = 1:

v1= 12 + 12 + 22 + 42 = 22

v1 = 1/√22 + 1/√22 + 2/√22 + 4/√22

v1 = 0.213 + 0.213 + 0.426 + 0.853

similarly calculate new hub from k = 1:

u1= 72 + 62 + 52 + 42 = 126

u1= 7/√126 + 6/√126 + 5/√126 + 4/√126

u1 = 0.623 + 0.535 + 0.445 + 0.356

The diagram for k = 2, will be:

Then, calculate new authority from k = 2:

v2= 0.2132 + 0.2132 + 0.4262 + 0.8532 = 0.999

v2 = 0.213/√0.999 + 0.213/√0.999 + 0.426/√0.999 + 0.853/√0.999

v2 = 0.213 + 0.213 + 0.426 + 0.853

similarly calculate new hub from k = 2:

u2= 0.6232 + 0.5352 + 0.4452 + 0.3562 = 0.999

u2= 0.623/√0.999 + 0.535/√0.999 + 0.445/√0.999 + 0.356/√0.999

u2 = 0.623 + 0.535 + 0.445 + 0.356

The diagram for k = 3, will be:

So, k = 2 ≈ k = 3

Hub and authority scores have come to a consistent value.

And when we reached to this statement, it is FINISHED!!!



The library for hits that you can use in python is Networkx. And the library code is:

#!/usr/bin/env python
# Copyright (C) 2008 by
# Aric Hagberg <hagberg@lanl.gov>
# Dan Schult <dschult@colgate.edu>
# Pieter Swart <swart@lanl.gov>
# All rights reserved.
# BSD license.
# NetworkX:http://networkx.lanl.gov/
__author__ = &quot&quot&quotAric Hagberg (hagberg@lanl.gov)&quot&quot&quot
import networkx
from networkx.exception import NetworkXError
__all__ = ['hits','hits_numpy','hits_scipy','authority_matrix','hub_matrix']
def hits(G,max_iter=100,tol=1.0e-8,nstart=None):
&quot&quot&quotReturn HITS hubs and authorities values for nodes.
Parameters
-----------
G : graph
A networkx graph
max_iter : interger, optional
Maximum number of iterations in power method.
tol : float, optional
Error tolerance used to check convergence in power method iteration.
nstart : dictionary, optional
Starting value of each node for power method iteration.
Returns
-------
(hubs,authorities) : two-tuple of dictionaries
Two dictionaries keyed by node containing the hub and authority
values.
Examples
--------
>>> G=nx.path_graph(4)
>>> h,a=nx.hits(G)
Notes
-----
The eigenvector calculation is done by the power iteration method
and has no guarantee of convergence. The iteration will stop
after max_iter iterations or an error tolerance of
number_of_nodes(G)*tol has been reached.
The HITS algorithm was designed for directed graphs but this
algorithm does not check if the input graph is directed and will
execute on undirected graphs.
For an overview see:
A. Langville and C. Meyer, &quotA survey of eigenvector methods of web
information retrieval.&quot http://citeseer.ist.psu.edu/713792.html
&quot&quot&quot
import networkx
if type(G) == networkx.MultiGraph or type(G) == networkx.MultiDiGraph:
raise Exception(&quothits() not defined for graphs with multiedges.&quot)
# if not G.weighted:
# raise Exception(&quothits(): input graph must be weighted&quot)
# choose fixed starting vector if not given
if nstart is None:
h=dict.fromkeys(G,1.0/G.number_of_nodes())
else:
h=nstart
# normalize starting vector
s=1.0/sum(h.values())
for k in x: h[k]*=s
nnodes=G.number_of_nodes()
# power iteration: make up to max_iter iterations
for i in range(max_iter):
hlast=h
h=dict.fromkeys(hlast.keys(),0)
a=dict.fromkeys(hlast.keys(),0)
# this &quotmatrix multiply&quot looks odd because it is
# doing a left multiply a^T=hlast^T*G
for n in h:
for nbr in G[n]:
a[nbr]+=hlast[n]*G[n][nbr].get('weight',1)
# now multiply h=Ga
for n in h:
for nbr in G[n]:
h[n]+=a[nbr]*G[n][nbr].get('weight',1)
# normalize vector
s=1.0/sum(h.values())
for n in h: h[n]*=s
# normalize vector
s=1.0/sum(a.values())
for n in a: a[n]*=s
# check convergence, l1 norm
err=sum([abs(h[n]-hlast[n]) for n in h])
if err < tol:
return h,a
raise NetworkXError(&quotHITS: power iteration failed to converge in %d iterations.&quot%(i+1))
def authority_matrix(G,nodelist=None):
&quot&quot&quotReturn the HITS authority matrix.&quot&quot&quot
import networkx
M=networkx.to_numpy_matrix(G,nodelist=nodelist)
return M.T*M
def hub_matrix(G,nodelist=None):
&quot&quot&quotReturn the HITS hub matrix.&quot&quot&quot
import networkx
M=networkx.to_numpy_matrix(G,nodelist=nodelist)
return M*M.T
def hits_numpy(G,max_iter=100,tol=1.0e-6,nodelist=None):
&quot&quot&quotReturn a NumPy array of the hubs and authorities.&quot&quot&quot
import numpy
import networkx
M=networkx.to_numpy_matrix(G,nodelist=nodelist)
(n,m)=M.shape # should be square
A=M.T*M # authority matrix
x=numpy.ones((n,1))/n
# power iteration on authority matrix
for i in range(max_iter):
xlast=x
x=numpy.dot(A,x)
x=x/x.sum()
# check convergence, l1 norm
err=numpy.abs(x-xlast).sum()
if err < n*tol:
a=numpy.asarray(x).flatten()
# h=M*a
h=numpy.asarray(numpy.dot(M,a)).flatten()
h/=h.sum()
return h,a
raise NetworkXError(&quotpage_rank: power iteration failed to converge in %d iterations.&quot%(i+1))
def hits_scipy(G,max_iter=100,tol=1.0e-6,nodelist=None):
&quot&quot&quotReturn a SciPy sparse array of the hubs and authorities.&quot&quot&quot
import numpy
import scipy.sparse
import networkx
M=networkx.to_scipy_sparse_matrix(G,nodelist=nodelist)
(n,m)=M.shape # should be square
A=M.T*M # authority matrix
x=scipy.ones((n,1))/n # initial guess
# power iteration on authority matrix
for i in range(max_iter):
xlast=x
x=A*x
x=x/x.sum()
# check convergence, l1 norm
err=scipy.absolute(x-xlast).sum()
if err < n*tol:
a=numpy.asarray(x).flatten()
# h=M*a
h=numpy.asarray(M*a).flatten()
h/=h.sum()
return h,a
raise NetworkXError(&quotpage_rank: power iteration failed to converge in %d iterations.&quot%(i+1))


شاید از این پست‌ها خوشتان بیاید