Pairwise distance

Computes the Jaccard proximity over a matrix.

distanceclosure.distance.pairwise_proximity(M, metric='jaccard', *args, **kwargs)

Calculates the pairwise proximity coefficient between rows of a matrix. Four types of Jaccard proximity are available, depending on your data.

Parameters
  • M (matrix) – Adjacency matrix

  • metric (str) –

    Jaccard proximity metric. Allowed values:

    • jaccard_binary, jb: binary item-wise comparison.

    • jaccard (scipy.spatial.distance.jaccard): numeric item-wise comparison.

    • jaccard_set, js: set comparison.

    • weighted_jaccard, wj: weighted item-wise comparison.

    Note: a custom function can also be passed as the metric.

  • min_support (int, optional) – The minimum support passed to the metric function.

  • verbose (bool) – Print every line as it computes.

Returns

M – The matrix of proximities

Return type

matrix

Examples

There are four ways to compute the proximity. Here are some examples:

>>> import numpy as np
>>> from distanceclosure.distance import pairwise_proximity
>>> # Numeric Matrix (not necessarily a network)
>>> N = np.array([
...     [2,3,4,2],
...     [2,3,4,2],
...     [2,3,3,2],
...     [2,1,3,4]])
>>> # Binary Adjacency Matrix
>>> B = np.array([
...     [1,1,1,1],
...     [1,1,1,0],
...     [1,1,0,0],
...     [1,0,0,0]])
>>> # Weighted Adjacency Matrix
>>> W = np.array([
...     [4,3,2,1],
...     [3,2,1,0],
...     [2,1,0,0],
...     [1,0,0,0]])

Numeric Jaccard: the default and most commonly used version. It uses the implementation in scipy.spatial.distance.

>>> pairwise_proximity(N, metric='jaccard')
    [[ 1.  , 1.  , 0.75, 0.25],
     [ 1.  , 1.  , 0.75, 0.25],
     [ 0.75, 0.75, 1.  , 0.5 ],
     [ 0.25, 0.25, 0.5 , 1.  ]]
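
The numbers above can be checked by hand. The following is a minimal sketch in plain Python, assuming the proximity is one minus the Jaccard dissimilarity as classically defined for numeric vectors: the share of positions whose values differ, counted over positions where at least one value is non-zero. The function name `numeric_jaccard_proximity` is hypothetical and not part of distanceclosure, and the value returned for two all-zero rows is an assumption.

```python
def numeric_jaccard_proximity(u, v):
    """One minus the share of differing positions, ignoring positions where both values are zero."""
    # Keep only positions where at least one of the two values is non-zero
    considered = [(a, b) for a, b in zip(u, v) if a != 0 or b != 0]
    if not considered:
        return 1.0  # assumption: two all-zero rows count as identical
    differing = sum(1 for a, b in considered if a != b)
    return 1.0 - differing / len(considered)


# Same numeric matrix as in the example, written as plain lists
N = [[2, 3, 4, 2],
     [2, 3, 4, 2],
     [2, 3, 3, 2],
     [2, 1, 3, 4]]

proximities = [[numeric_jaccard_proximity(u, v) for v in N] for u in N]
for row in proximities:
    print(row)
```

For instance, the first and third rows differ in one position out of four, which gives 1 - 1/4 = 0.75.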

Binary Jaccard: binary item-wise comparison, suited to binary (0/1) adjacency matrices.

>>> pairwise_proximity(B, metric='jaccard_binary')
    [[ 1.  , 0.75, 0.5 , 0.25],
     [ 0.75, 1.  , 0.66, 0.33],
     [ 0.5 , 0.66, 1.  , 0.5 ],
     [ 0.25, 0.33, 0.5 , 1.  ]]
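
As an illustration of the binary comparison, here is a minimal sketch in plain Python, assuming the usual definition: the number of positions where both rows are non-zero, divided by the number of positions where at least one row is non-zero. The function name `binary_jaccard_proximity` is hypothetical, not the library's internal helper, and the all-zero case is an assumption.

```python
def binary_jaccard_proximity(u, v):
    """Positions set in both rows divided by positions set in at least one row."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # assumption: two all-zero rows count as identical
    return both / either if either else 1.0


# Same binary adjacency matrix as in the example, written as plain lists
B = [[1, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]

binary_proximities = [[binary_jaccard_proximity(u, v) for v in B] for u in B]
for row in binary_proximities:
    print([round(x, 2) for x in row])
```

The second and third rows share two set positions out of three set in either row, so their proximity is 2/3, which the output above shows truncated to 0.66.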

Set Jaccard: it treats the values in each vector as a set of objects, therefore their order is not taken into account. Note that zeroes are treated as a set item.

>>> pairwise_proximity(W, metric='jaccard_set')
    [[ 1.  , 0.6 , 0.4 , 0.2 ],
     [ 0.6 , 1.  , 0.75, 0.5 ],
     [ 0.4 , 0.75, 1.  , 0.67],
     [ 0.2 , 0.5 , 0.67, 1.  ]]
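
To make the set interpretation concrete, here is a minimal sketch in plain Python, assuming each row is converted to a set of its values (zeros included) and the proximity is the size of the intersection divided by the size of the union. The function name `set_jaccard_proximity` is hypothetical. The rows used are those of the weighted matrix `W` from the setup, for which this sketch yields the values 0.6, 0.4, 0.2, 0.75, 0.5 and 0.67.

```python
def set_jaccard_proximity(u, v):
    """Size of the intersection over size of the union of the two rows' value sets."""
    su, sv = set(u), set(v)  # order and repetition are discarded; 0 is kept as an item
    return len(su & sv) / len(su | sv)


# Rows of the weighted matrix W, written as plain lists
W = [[4, 3, 2, 1],
     [3, 2, 1, 0],
     [2, 1, 0, 0],
     [1, 0, 0, 0]]

set_proximities = [[set_jaccard_proximity(u, v) for v in W] for u in W]
for row in set_proximities:
    print([round(x, 2) for x in row])
```

For the first two rows, the sets {1, 2, 3, 4} and {0, 1, 2, 3} share three items out of five distinct items overall, giving 3/5 = 0.6.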

Weighted Jaccard: the version for weighted graphs.

>>> pairwise_proximity(W, metric='weighted_jaccard')
    [[ 1. ,  0.6,  0.3,  0.1],
     [ 0.6,  1. ,  0. ,  0. ],
     [ 0.3,  0. ,  1. ,  0. ],
     [ 0.1,  0. ,  0. ,  1. ]]
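
For orientation, here is a minimal sketch in plain Python of the common textbook definition of weighted Jaccard: the sum of element-wise minima divided by the sum of element-wise maxima. This is an assumption about the general technique, not a copy of the library's code, and the function name `weighted_jaccard_proximity` is hypothetical. It reproduces the first row of the output above (0.6, 0.3, 0.1). It does not reproduce the zeros in the remaining rows (for the second and third rows it would give 0.5), so the library may apply additional rules that are not modelled here.

```python
def weighted_jaccard_proximity(u, v):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    smaller = sum(min(a, b) for a, b in zip(u, v))
    larger = sum(max(a, b) for a, b in zip(u, v))
    # assumption: two all-zero rows count as identical
    return smaller / larger if larger else 1.0


# Rows of the weighted adjacency matrix, written as plain lists
W = [[4, 3, 2, 1],
     [3, 2, 1, 0],
     [2, 1, 0, 0],
     [1, 0, 0, 0]]

# Proximity of the first row to every row, itself included
first_row = [weighted_jaccard_proximity(W[0], v) for v in W]
print([round(x, 2) for x in first_row])
```

For the first two rows the minima sum to 3 + 2 + 1 + 0 = 6 and the maxima to 4 + 3 + 2 + 1 = 10, giving 6/10 = 0.6.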