Frame KMeans¶

As of 0.0.4, the API has been loosened to introduce more features.

Creation¶

Import the class with from frmodel.base.D2.kmeans2D import KMeans2D.

KMeans2D has the signature:

def __init__(self,
             frame: Frame2D,
             model: KMeans,
             fit_to: Optional[List[Frame2D.CHN]] = None,
             frame_1dmask: np.ndarray = None,
             scaler=None):

The frame argument is the Frame2D to run K-Means on.

KMeans Modelling¶

The API lets you supply your own sklearn KMeans model object, so most parameters, such as the number of clusters, are set outside frmodel before fitting.

Fit Indexes¶

The channel indexes to fit K-Means on.

For example:

frame_xy = f.get_chns(xy=True, hsv=True)
frame_xy.kmeans(fit_indexes=[1,2,3], ...)

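This uses the 2nd, 3rd and 4th channels for K-Means, that is, the Y-axis, Hue and Saturation.

Note that the class signature under Module Info names this argument fit_to and takes a list of channel constants.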
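To make the index selection concrete, here is a minimal sketch that uses a plain NumPy array as a stand-in for the frame data. The array shape and the channel order (X, Y, Hue, Saturation, Value) are assumptions for illustration and do not come from frmodel itself.

```python
import numpy as np

# Hypothetical stand-in for frame data: 4 x 5 pixels with 5 channels,
# assumed to be ordered X, Y, Hue, Saturation, Value.
height, width, channels = 4, 5, 5
data = np.arange(height * width * channels, dtype=float)
data = data.reshape(height, width, channels)

fit_indexes = [1, 2, 3]  # 2nd, 3rd and 4th channels: Y, Hue, Saturation

# Pick the channels along the last axis, then flatten the pixels into rows.
# This gives the (n_samples, n_features) layout that sklearn's KMeans.fit expects.
selected = data[..., fit_indexes].reshape(-1, len(fit_indexes))

print(selected.shape)  # (20, 3)
```

Each row of selected is one pixel, and each column is one of the chosen channels.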

Frame 1-D Mask¶

This mask removes certain data points from the Frame2D, which is useful if you don’t want to run K-Means on all of the data.

The mask must be:
- an np.ndarray of boolean or truthy values.
- 1-dimensional, so call flatten() before passing it as an argument.
- the same size as the Frame2D passed as argument.
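A mask that meets these requirements could be built as in the sketch below. The sample array and the 0.5 threshold are hypothetical. This page does not state whether a True entry marks a pixel to keep or to drop, so check the result against frame_masked() before relying on it.

```python
import numpy as np

# Hypothetical 3 x 4 single-channel image standing in for one Frame2D channel.
channel = np.array([[0.1, 0.9, 0.8, 0.2],
                    [0.7, 0.3, 0.6, 0.0],
                    [0.5, 0.4, 1.0, 0.95]])

# The comparison yields a 2-D boolean array: True where the pixel exceeds 0.5.
mask_2d = channel > 0.5

# Flatten to 1-D so that there is exactly one boolean per pixel.
frame_1dmask = mask_2d.flatten()

print(frame_1dmask.ndim)  # 1
print(frame_1dmask.size == channel.size)  # True
```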

Scaler¶

Scales the data before running K-Means. It must be a callable.

If None, no scaling is done!
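Any function that takes an array and returns the scaled array should fit this role. As a rough sketch, the hypothetical column_minmax below rescales each column to the range 0 to 1. It assumes the scaler receives a 2-D array of shape (n_samples, n_features), which this page does not confirm.

```python
import numpy as np

def column_minmax(data: np.ndarray) -> np.ndarray:
    """Hypothetical scaler: rescale each column to the range [0, 1]."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    # Constant columns have a span of 0; use 1 there to avoid dividing by zero.
    span = np.where(span == 0, 1, span)
    return (data - lo) / span

samples = np.array([[0.0, 10.0],
                    [5.0, 20.0],
                    [10.0, 40.0]])
scaled = column_minmax(samples)
print(scaled)
```

Such a function would be passed by name, for example scaler=column_minmax, without calling it.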

Figure Plotting¶

Plotting is done with Frame2D Plotting.

Score¶

Custom Scoring¶

To test out how well the clustering works, we can mimic supervised learning.

We can prepare another image (the Score Image) that shows the expected grouping of clusters by filling an image of the same dimensions with a different gray-scale value for each cluster.

We then pair the labelled KMeans and Score gray-scale labels to find out the maximum score attainable.

For example:

[ORIGINAL]
    KMEANS      SCORE    COUNT
[1] LABEL A <-> LABEL A  1000
[2] LABEL A <-> LABEL B  500
[3] LABEL B <-> LABEL A  2000
[4] LABEL B <-> LABEL B  4000

If we wanted the highest score attainable, we look at the top values:

[SORTED BY COUNT]
    KMEANS      SCORE    COUNT
[4] LABEL B <-> LABEL B  4000
[3] LABEL B <-> LABEL A  2000
[1] LABEL A <-> LABEL A  1000
[2] LABEL A <-> LABEL B  500

If we pick [4] here, we can no longer pick [3], because KMEANS B is already connected to SCORE B. We need to find another connection.

The only other connection available is A <-> A:

    KMEANS      SCORE    COUNT
[4] LABEL B <-> LABEL B  4000  [Accept]
[3] LABEL B <-> LABEL A  2000  [Visited KMEANS Label]
[1] LABEL A <-> LABEL A  1000  [Accept]
[2] LABEL A <-> LABEL B  500   [Loop Ended]

Hence, the following pseudo-code is used:

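visited_kmeans, visited_score, counts = set(), set(), []

# array holds (kmeans_label, score_label, count) rows, sorted by count, descending
for kmeans_label, score_label, count in array:
    # Skip the pair if either label has already been matched
    if kmeans_label in visited_kmeans or score_label in visited_score:
        continue
    visited_kmeans.add(kmeans_label)
    visited_score.add(score_label)
    counts.append(count)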
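The walkthrough above can be sketched end to end in plain Python. The function name custom_score and the choice to normalise by the total number of points are assumptions made for this sketch, not frmodel's confirmed implementation.

```python
from collections import Counter

def custom_score(kmeans_labels, score_labels):
    """Greedy one-to-one matching of KMeans labels to Score labels.

    Returns the fraction of points covered by the accepted pairs.
    Normalising by the total is an assumption made for this sketch.
    """
    # Count how often each (KMeans label, Score label) pair occurs.
    pair_counts = Counter(zip(kmeans_labels, score_labels))

    visited_kmeans, visited_score = set(), set()
    accepted = 0
    # Walk the pairs from the largest count to the smallest.
    for (kmeans_label, score_label), count in pair_counts.most_common():
        if kmeans_label in visited_kmeans or score_label in visited_score:
            continue
        visited_kmeans.add(kmeans_label)
        visited_score.add(score_label)
        accepted += count

    return accepted / len(kmeans_labels)

# Rebuild the worked example: pairs A-A x1000, A-B x500, B-A x2000, B-B x4000.
kmeans_labels = ["A"] * 1500 + ["B"] * 6000
score_labels = ["A"] * 1000 + ["B"] * 500 + ["A"] * 2000 + ["B"] * 4000

print(custom_score(kmeans_labels, score_labels))
```

For the worked example, the accepted pairs are [4] and [1], so 5000 of the 7500 points are covered. Note that a greedy pass like this is not guaranteed to find the best one-to-one assignment in every case.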

Score File¶

A Score File is any image with a deliberately small, discrete set of gray-scale values that mark clusters.

Any area which has the same gray-scale value is deemed to be one cluster.

Note that anti-aliasing can introduce unnecessary interpolated gray-scale values.
Save the file in a lossless format such as PNG to avoid compression artifacts.
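One way to check a Score File for stray values is to count its distinct gray levels. The sketch below uses a small hypothetical array in place of a loaded image. The values and the expected cluster count are illustrative only.

```python
import numpy as np

# Hypothetical 4 x 4 gray-scale score image with two intended clusters
# (0 and 255) and one stray interpolated value (128) on the boundary.
score_image = np.array([[0, 0, 255, 255],
                        [0, 0, 255, 255],
                        [0, 128, 255, 255],
                        [0, 0, 255, 255]], dtype=np.uint8)

# Each distinct gray value is treated as one cluster.
values, counts = np.unique(score_image, return_counts=True)
for value, count in zip(values, counts):
    print(f"gray {value}: {count} pixels")

# More distinct values than intended clusters hints at anti-aliasing
# or lossy-compression artifacts in the file.
expected_clusters = 2
has_stray_values = len(values) > expected_clusters
print(has_stray_values)
```

A gray value that covers only a handful of pixels is usually a sign of interpolation along a cluster boundary.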

Example¶

from sklearn.cluster import KMeans
from sklearn.preprocessing import minmax_scale

from frmodel.base.D2 import Frame2D
from frmodel.base.D2.kmeans2D import KMeans2D

f = Frame2D.from_image("path/to/file.png")

# Keep only the three index channels used for clustering
C = f.CHN
frame_xy = f.get_chns(self_=False,
                      chns=[C.MEX_G, C.EX_GR, C.NDI])

# The scaler is passed as a function object, without brackets
km = KMeans2D(frame_xy,
              KMeans(n_clusters=3, verbose=False),
              fit_to=[C.MEX_G, C.EX_GR, C.NDI],
              scaler=minmax_scale)

# Convert the clustering into a Frame2D, then score it
kmf = km.as_frame()
score = kmf.score(f)

# Every score should be close to 1
for key in ('Custom', 'Homogeneity', 'Completeness', 'V Measure'):
    assert abs(score[key] - 1) < 1e-7

Here, we grab MEX_G, EX_GR and NDI to use for K-Means, then fit on them through fit_to. If fit_to is not specified, all channels are used.
We also pre-scale them with minmax_scale. Note that it is passed as a function, without brackets.
We can convert the clustering to a frame to view its clusters or to score it against something else.
Because we scored it against itself, it should ideally reach a perfect score.

Custom is the custom scoring algorithm mentioned above.

Module Info¶

class frmodel.base.D2.kmeans2D.KMeans2D(frame: frmodel.base.D2.frame2D.Frame2D, model: sklearn.cluster._kmeans.KMeans, fit_to: Optional[List[frmodel.base.D2.frame2D.Frame2D.CHN]] = None, frame_1dmask: Optional[numpy.ndarray] = None, scaler=None)¶

Bases: object

A KMeans2D object, kept separate from Frame2D as proposed, to avoid cluttering it.

__init__(frame: frmodel.base.D2.frame2D.Frame2D, model: sklearn.cluster._kmeans.KMeans, fit_to: Optional[List[frmodel.base.D2.frame2D.Frame2D.CHN]] = None, frame_1dmask: Optional[numpy.ndarray] = None, scaler=None)¶

Creates a KMeans Object from current data

Parameters

model – KMeans Model
fit_to – The indexes to .fit() to, must be a list of the Channel Consts. If None, use all channels
scaler – The scaler to use, must be a callable(np.ndarray)
frame_1dmask – The 1D mask to exclude certain points. Must be in 1 dimension

Returns

KMeans2D Instance

as_frame() → frmodel.base.D2.frame2D.Frame2D ¶

Converts current model into Frame2D based on labels. Places label at the end of channel dimension

Returns: Frame2D

frame_masked(with_xy: bool = True) → numpy.ndarray¶

Returns the masked frame

Parameters: with_xy – Whether to append XY with it. This can help in determining the pixel locations of these data.
Returns: np.ndarray, a flattened Frame2D