## Topic Specific PageRank simulation in F#

The standard PageRank can be simulated by a random walker (or many of them) surfing the graph of web pages. The random walker is following a random link on the current page with equal probability for each of these links. The PageRank of each page is then proportional to the visits of the walker to the specific page. To avoid dead ends and other issues, the random walker is allowed to teleport to any node in the graph with a small teleport probability at each step.

In Topic Specific PageRank (also know as Personalised Page Rank), the random walker is biased by teleporting to a set of topic-specific relevant nodes. The topic is determined by the context of the search query.

The goal of this post is not to explain PageRank in more detail, but to get this simple model and code it in F# to get actual numbers for a small network of pages. So, given a network of pages, and a subset of relevant pages, how do we compute the PageRank of all pages in the graph?

I start with a couple of helper functions, one that gives an element of a list in random with equal probability, and the second one that uses a function to assign weights/probabilities to each item.

open System

let randomChoice (rnd : Random) (items : 'a list) =
items.[rnd.Next(List.length items)]

let randomWeightedChoice (rnd : Random) weightSelector (items : 'a list) =

let weights = items |> List.map weightSelector
let weightSum = weights |> List.sum

let rec firstWithAccumulationGreaterThan value items =
let rec loop value items =
match items with
| [] -> raise (ArgumentException("The list of items should never be empty."))
| (x,w)::tl ->
if w > value
then x
else loop (value - w) tl
loop value items

let random = rnd.NextDouble() * weightSum

firstWithAccumulationGreaterThan random (List.zip items weights)


I am going to use the first one when I choose the link to follow on a page. The second one will be used when I choose what page to teleport to, as I will attach a weight to each page in the teleport set. I define two record types, one for a link and one for a teleport node with its attached weight. I identify each node with an integer Id.

    type Link = { From : int; To : int}

type TeleportNode = { Id : int; Weight : float }


Next I define a record type that contains the state of the random walker, i.e. the current node and the number of visits for each page visited up to now.

type WalkerState =
{ CurrentNode : int;
// NodeId, Number of times visited
Visits : Map<int, float> }
member this.visitsTo nodeId =
match Map.tryFind nodeId this.Visits with
| None -> 0.0
| Some(v) -> v
{ this with Visits = this.Visits |> Map.add nodeId (this.visitsTo nodeId + 1.0)}
member this.visit nodeId =
{ this with CurrentNode = nodeId }.addVisitTo nodeId
member this.getPageRankEstimate () =
let totalVisits =
this.Visits |> Map.toSeq |> Seq.map (fun (k, v) -> v) |> Seq.sum |> float
this.Visits |> Map.map (fun k v -> v / totalVisits)


I have defined a few functions on the WalkerState. As records are immutable, operations that change the state of the random walker, like visiting a new page, return a new WalkerState object.

Now it is time to define the algorithm that walks the graph. I provide a list of links, the set of teleport nodes, the teleport probability (between 0 and 1), the number of steps to take before reporting the result, and finally I also pass in/inject the random number generator as I might want to have more control on it (like control its seed).

let walkGraphWithTeleportations (rnd : Random) (links : Link list) (teleports : TeleportNode list)
(teleportProbability : float) (steps : int) =

// Discover all the nodes from the list of links.
let allNodes = Seq.append (links |> Seq.map (fun x -> x.From))
(links |> Seq.map (fun x -> x.To))
|> Set.ofSeq
|> List.ofSeq
let nodeCount = allNodes |> List.length |> float

let teleportNodes = teleports |> List.map (fun x -> x.Id)
// Store the weights in a map so that you can look them up efficiently.
let teleportWeights = teleports |> List.map (fun x -> (x.Id, x.Weight)) |> Map.ofList

// A Map of all linked pages from a given page.
|> Seq.map (fun (from, tos) ->
(from, tos
|> Seq.map (fun x -> x.To)
|> Seq.toList))
|> Map.ofSeq

let randomTeleportNode rnd =
randomWeightedChoice rnd (fun i -> teleportWeights.[i]) teleportNodes

let takeSingleStep (rnd : Random) (walker : WalkerState) =
let neighbours =
| None -> []
| Some(ns) -> ns
let nextNode =
if (List.isEmpty neighbours) || rnd.NextDouble() < teleportProbability
then
randomTeleportNode rnd
else
randomChoice rnd neighbours
walker.visit nextNode

let rec loop (times : int) (rnd : Random) (walker : WalkerState) =
if times <= 0 then walker
else
let newWalker = takeSingleStep rnd walker
loop (times - 1) rnd newWalker

let startingNode = randomTeleportNode rnd
let initialWalkerState = { CurrentNode = startingNode; Visits = Map.empty }

let finalWalkerState = loop steps rnd initialWalkerState

allNodes
|> List.fold (fun acc node ->
// Add 0.0 PageRank for pages not visited but they are part of the graph.
if acc |> Map.containsKey node then acc else acc |> Map.add node 0.0)
(finalWalkerState.getPageRankEstimate ())


The stopping condition here is just reaching the specified number of steps, and it should be good enough assuming I use a number big enough to reach the stable solution.

Note that, to do this properly, I would perform the random walker steps in iterations. At each iteration I would measure how different the solution is. When this difference drops below some value epsilon I would stop. For measuring the difference I would need some measure of distance between two solutions, which could simply be the sum over nodes of the square of the difference in PageRank (between the values in each solution).

As an example I are going to use the following graph

The links in these graph are:

let links =
[ { From = 1; To = 3 }
{ From = 1; To = 4 }
{ From = 2; To = 5 }
{ From = 2; To = 6 }
{ From = 3; To = 7 }
{ From = 4; To = 7 }
{ From = 4; To = 8 }
{ From = 5; To = 9 }
{ From = 6; To = 9 }
{ From = 6; To = 10 }
{ From = 8; To = 9 }
{ From = 8; To = 11 }
{ From = 9; To = 8 }
{ From = 9; To = 11 }
{ From = 11; To = 7 }
{ From = 11; To = 10 } ]


First I use a single page {1} as the set of teleport nodes

let teleportNodes = [ { Id = 1; Weight = 1.0 } ]


and I run the simulation for a million steps with teleport probability 0.25

let rnd = new Random(7919)
let teleportProbability = 0.25
let steps = 1000000
let result = walkGraphWithTeleportations rnd links teleportNodes teleportProbability steps

result
|> Map.toSeq
|> Seq.iter (fun (node, pr) -> printfn "%d -> %f" node pr)


The output I get is

1 -> 0.392301
2 -> 0.000000
3 -> 0.146919
4 -> 0.147492
5 -> 0.000000
6 -> 0.000000
7 -> 0.178075
8 -> 0.064714
9 -> 0.024405
10 -> 0.012689
11 -> 0.033405


And depicting the PageRank with colours I get

where PageRank from 0 to 1 corresponds to colours from dark blue (for cold) to dark red (for hot). Pages that are closest to the page 1 become relevant and get higher PageRank than those further away, no matter how linked the latter are.

For comparison, if I add all the nodes to the teleport set I are going to get the standard PageRank algorithm. Running the same code with the modification

let teleportNodes =
[1 .. 11]
|> List.map (fun i -> { Id = i; Weight = 1.0 })


I get the output

1 -> 0.041582
2 -> 0.041389
3 -> 0.057081
4 -> 0.057222
5 -> 0.057457
6 -> 0.056575
7 -> 0.160136
8 -> 0.118842
9 -> 0.150491
10 -> 0.116490
11 -> 0.142735


which looks like this

where most linked pages get the highest PageRank.

Next I set the teleport nodes to {2}

let teleportNodes = [ { Id = 2; Weight = 1.0 } ]


and the output I get is

1 -> 0.000000
2 -> 0.332265
3 -> 0.000000
4 -> 0.000000
5 -> 0.124476
6 -> 0.124816
7 -> 0.031238
8 -> 0.060944
9 -> 0.163396
10 -> 0.078754
11 -> 0.084111


which corresponds to the graph

Now, I use {1,8} as the teleport set and assign weight to 8 three times bigger than the one of 8. The weight in the two previous examples was irrelevant because I had a single node, but this time it makes a difference.

let teleportNodes = [ { Id = 1; Weight = 1.0 }
{ Id = 8; Weight = 3.0 } ]


The output we get is

1 -> 0.095392
2 -> 0.000000
3 -> 0.035732
4 -> 0.035732
5 -> 0.000000
6 -> 0.000000
7 -> 0.107830
8 -> 0.347664
9 -> 0.130148
10 -> 0.067629
11 -> 0.179873


and I see that pages next to node 8 get higher PageRank than those next to 1.

Finally, I swap node 1 with 2 in the teleport set

let teleportNodes = [ { Id = 2; Weight = 1.0 }
{ Id = 8; Weight = 3.0 } ]


and the output I get is

1 -> 0.000000
2 -> 0.091306
3 -> 0.000000
4 -> 0.000000
5 -> 0.034185
6 -> 0.034412
7 -> 0.070448
8 -> 0.335009
9 -> 0.163838
10 -> 0.083237
11 -> 0.187565


which looks like this