Frame KMeans¶
As of 0.0.4, the API has been loosened to introduce more features.
Creation¶
Grab this class from frmodel.base.D2.kmeans2D import KMeans2D.
KMeans2D would have the signature of
def __init__(self,
frame: Frame2D,
model: KMeans,
fit_indexes,
frame_1dmask: np.ndarray = None,
scaler=None):
The frame argument is for a Frame2D.
KMeans Modelling¶
The API allows you to set up your own KMeans model object, hence most
parameters such as clusters can be set-up outside frmodel pre-fitting.
Fit Indexes¶
The indexes to fit the kmeans to.
For example
frame_xy = f.get_chns(xy=True, hsv=True)
frame_xy.kmeans(fit_indexes=[1,2,3], ...)
Will use the 2nd, 3rd, 4th channels to perform kmeans on.
That is, the Y-Axis, Hue and Saturation.
Frame 1-D Mask¶
This mask is to remove certain data points from the Frame2D. This is useful if you don’t want to run K-Means on all
data.
The mask must be:
- an np.ndarray of boolean or truthy values.
- in 1-Dimension, hence, call flatten() before passing it as an arugment.
- the same size as the Frame2D passed as argument.
Figure Plotting¶
Plotting is done with Frame2D Plotting.
Score¶
Custom Scoring¶
To test out how well the clustering works, we can mimic supervised learning.
We can have another image (Score Image) that shows the expected grouping of clusters, by simply filling another image with same dimensions with different gray-scales.
We then pair the labelled KMeans and Score gray-scale labels to find out the maximum score attainable.
For example:
[ORIGINAL]
KMEANS SCORE COUNT
[1] LABEL A <-> LABEL A 1000
[2] LABEL A <-> LABEL B 500
[3] LABEL B <-> LABEL A 2000
[4] LABEL B <-> LABEL B 4000
If we wanted the highest score attainable, we look at the top values:
[SORTED BY COUNT]
KMEANS SCORE COUNT
[4] LABEL B <-> LABEL B 4000
[3] LABEL B <-> LABEL A 2000
[1] LABEL A <-> LABEL A 1000
[2] LABEL A <-> LABEL B 500
If we picked [4] here, we cannot pick [3] to attain a maximum score,
this is because KMEANS B is connected to SCORE B already, we need to find another.
The only other connection available is A <-> A:
. KMEANS SCORE COUNT
[4] LABEL B <-> LABEL B 4000 [Accept]
[3] LABEL B <-> LABEL A 2000 [Visited KMEANS Label]
[1] LABEL A <-> LABEL A 1000 [Accept]
[2] LABEL A <-> LABEL B 500 [Loop Ended]
Hence, the follow pseudo-code is used:
for kmeans_label, score_label, count in array:
if kmeans_label or score_label in visited:
continue
else:
visited.append(kmeans_label)
visited.append(score_label)
counts.append(count)
Score File¶
A Score File is any image with a deliberate discrete amount of gray-scale values that mark clusters.
Any area which has the same gray-scale value is deemed to be one cluster.
Note that anti-aliasing can cause multiple unnecessary grayscale interpolated values.
Note to save it in a lossless format like
pngto avoid artifacts.
Example¶
from sklearn.cluster import KMeans
from sklearn.preprocessing import minmax_scale
from frmodel.base import CONSTS
from frmodel.base.D2 import Frame2D
from frmodel.base.D2.kmeans2D import KMeans2D
from tests.base.D2.test_d2 import TestD2
f = Frame2D.from_image("path/to/file.png")
C = f.CHN
frame_xy = f.get_chns(self_=False,
chns=[C.MEX_G, C.EX_GR, C.NDI])
km = KMeans2D(frame_xy,
KMeans(n_clusters=3, verbose=False),
fit_to=[C.MEX_G, C.EX_GR, C.NDI],
scaler=minmax_scale)
kmf = km.as_frame()
score = kmf.score(f)
self.assertAlmostEqual(score['Custom'], 1)
self.assertAlmostEqual(score['Homogeneity'], 1)
self.assertAlmostEqual(score['Completeness'], 1)
self.assertAlmostEqual(score['V Measure'], 1)
Here, we grab MEX_G, EX_GR, NDI to use for KMeans
Then fit using them in
fit_to. By default if we don’t specify these channels, all will be used anyways.We also pre-scale them with
minmax-scale. Note it’s passed as a function without bracketsWe can convert the clustering as a frame to view its clusters or
scoreit against something else.Note that because we scored it against itself, it should, ideally, converge to a perfect score.
Custom is the custom scoring algorithm mentioned above.
Module Info¶
-
class
frmodel.base.D2.kmeans2D.KMeans2D(frame: frmodel.base.D2.frame2D.Frame2D, model: sklearn.cluster._kmeans.KMeans, fit_to: Optional[List[frmodel.base.D2.frame2D.Frame2D.CHN]] = None, frame_1dmask: Optional[numpy.ndarray] = None, scaler=None)¶ Bases:
objectA KMeans2D object separate from Frame2D as proposed, to avoid cluttering
-
__init__(frame: frmodel.base.D2.frame2D.Frame2D, model: sklearn.cluster._kmeans.KMeans, fit_to: Optional[List[frmodel.base.D2.frame2D.Frame2D.CHN]] = None, frame_1dmask: Optional[numpy.ndarray] = None, scaler=None)¶ Creates a KMeans Object from current data
- Parameters
model – KMeans Model
fit_to – The indexes to .fit() to, must be a list of the Channel Consts. If None, use all channels
scaler – The scaler to use, must be a callable(np.ndarray)
frame_1dmask – The 2D mask to exclude certain points. Must be in 2 Dimensions
- Returns
KMeans2D Instance
-
as_frame() → frmodel.base.D2.frame2D.Frame2D¶ Converts current model into Frame2D based on labels. Places label at the end of channel dimension
- Returns
Frame2D
-
frame_masked(with_xy: bool = True) → numpy.ndarray¶ Returns the masked frame
- Parameters
with_xy – Whether to append XY with it. This can help in determining the pixel locations of these data.
- Returns
np.ndarray, a flattened Frame2D
-