[ML] Semi-Supervised Learning (label_propagation)

Machine learning is broadly divided into two categories:

1. Supervised learning

2. Unsupervised learning

Between these two sits a technique called semi-supervised learning, which can be seen as an extension of supervised learning.

It is used when only part of the data comes with labels (an answer key) and the rest has none.

For example, consider the following figure.

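Looking only at the labeled data in the upper figure, the dotted line splits it into two classes. But because there are so few points and the boundary is so simple, the classifier may not work properly on real data.

If we then add the unlabeled data, the distribution looks like the lower figure, and semi-supervised learning produces a boundary like the dotted line there, one that follows the shape of the data.

Training on labeled and unlabeled data together in this way is what semi-supervised learning means.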
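The situation in the figures can be reproduced with a small sketch. Everything here is my own assumption for illustration: the two-moons toy data from make_moons, keeping only 3 labels per class, and the knn kernel with n_neighbors=7. It compares a classifier trained on the labeled points alone with LabelSpreading trained on all points.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading
from sklearn.svm import SVC

# Toy data (an assumption for illustration): two interleaving half-moons.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# Keep only 3 labels per class; scikit-learn marks unlabeled samples with -1.
rng = np.random.RandomState(0)
y_partial = np.full_like(y_true, -1)
for cls in (0, 1):
    keep = rng.choice(np.where(y_true == cls)[0], size=3, replace=False)
    y_partial[keep] = cls
labeled = y_partial != -1

# Supervised baseline: sees only the 6 labeled points.
svc = SVC(kernel="rbf").fit(X[labeled], y_partial[labeled])

# Semi-supervised: sees all 200 points, 194 of them without labels.
ls = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)

# Compare both models on the points whose labels were hidden.
acc_svc = (svc.predict(X[~labeled]) == y_true[~labeled]).mean()
acc_ls = (ls.transduction_[~labeled] == y_true[~labeled]).mean()
print(f"supervised only : {acc_svc:.3f}")
print(f"label spreading : {acc_ls:.3f}")
```

The exact numbers depend on which six points happen to be labeled, so it is worth trying a few different seeds before drawing conclusions.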

I also often work with data that has no labels, and I think this technique could make labeling it much easier.
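As a rough sketch of that labeling idea: after fitting, LabelSpreading exposes the inferred labels in transduction_ and per-class scores in label_distributions_. The 70% hidden fraction and the 0.9 confidence threshold below are arbitrary values I picked for illustration, not recommendations.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Simulate a partly labeled dataset: hide about 70% of the labels.
rng = np.random.RandomState(0)
y_partial = np.copy(y)
y_partial[rng.rand(len(y)) < 0.7] = -1  # -1 means "no label"
unlabeled = y_partial == -1

model = LabelSpreading().fit(X, y_partial)

# transduction_ holds one inferred label per training sample.
inferred = model.transduction_

# label_distributions_ holds per-class scores; take the max as a rough confidence.
confidence = model.label_distributions_.max(axis=1)

# Fill in only the confident pseudo-labels; 0.9 is an arbitrary threshold.
accept = unlabeled & (confidence >= 0.9)
y_filled = np.copy(y_partial)
y_filled[accept] = inferred[accept]

print("unlabeled before:", unlabeled.sum())
print("still unlabeled :", (y_filled == -1).sum())
```

Samples that stay at -1 are the ones the model was unsure about, so they are good candidates for labeling by hand.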

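The Python code is as follows.

It uses the LabelSpreading class from scikit-learn's sklearn.semi_supervised module (older releases exposed it through the label_propagation submodule).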

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)

iris = datasets.load_iris()

# use only the first two features so the result can be plotted in 2D
X = iris.data[:, :2]
y = iris.target

# step size in the mesh
h = .02

# hide roughly 30% / 50% of the labels; -1 marks a sample as unlabeled
y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1
y_50 = np.copy(y)
y_50[rng.rand(len(y)) < 0.5] = -1
# fit LabelSpreading on each label set, plus an RBF SVC on the full labels
# for comparison. Each model is paired with the labels it was trained on.
ls30 = (LabelSpreading().fit(X, y_30), y_30)
ls50 = (LabelSpreading().fit(X, y_50), y_50)
ls100 = (LabelSpreading().fit(X, y), y)
rbf_svc = (svm.SVC(kernel='rbf', gamma=.5).fit(X, y), y)

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# title for the plots
titles = ['Label Spreading 30% data',
          'Label Spreading 50% data',
          'Label Spreading 100% data',
          'SVC with rbf kernel']

color_map = {-1: (1, 1, 1), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}

for i, (clf, y_train) in enumerate((ls30, ls50, ls100, rbf_svc)):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    plt.subplot(2, 2, i + 1)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    plt.axis('off')

    # Plot also the training points
    colors = [color_map[label] for label in y_train]
    plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black')

    plt.title(titles[i])

plt.suptitle("Unlabeled points are colored white", y=0.1)
plt.show()
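The plots only show the decision regions qualitatively. As a hedged follow-up, the sketch below scores LabelSpreading on exactly the samples whose labels were hidden. The names hidden_fraction and scores are my own, and the setup mirrors the code above (first two iris features, default LabelSpreading parameters).

```python
import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

rng = np.random.RandomState(0)
scores = {}
for hidden_fraction in (0.3, 0.5):
    y_partial = np.copy(y)
    hidden = rng.rand(len(y)) < hidden_fraction
    y_partial[hidden] = -1  # -1 marks a sample as unlabeled
    model = LabelSpreading().fit(X, y_partial)
    # Score only the samples whose labels the model never saw.
    scores[hidden_fraction] = (model.transduction_[hidden] == y[hidden]).mean()

for fraction, acc in scores.items():
    print(f"hidden {fraction:.0%}: accuracy on hidden labels = {acc:.3f}")
```

Because the true labels are known here, this gives a quick sanity check before trusting the inferred labels on data that really is unlabeled.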
