Machine Learning in Action ch.7.3 AdaBoost

A decision stump is a simple decision tree. It is a tree with only a single split, which is why it is called a stump.

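We write two functions to build a decision stump from a dataset.

First, stumpClassify tests whether each value of a feature is less than or greater than the threshold being tested, and labels the samples accordingly.

Second, the more involved buildStump loops over the weighted dataset and finds the stump that yields the lowest weighted error.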
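The idea of "lowest error on a weighted dataset" can be illustrated with a minimal sketch. The toy feature values, the labels, the threshold of 1.5, and the helper name weighted_error are all assumptions made for this illustration and are not taken from the book's listing. The sketch applies one split to one feature and sums the weights of the misclassified samples.

```python
import numpy as np

# Hypothetical toy data: one feature value per sample and its +1/-1 label.
feature = np.array([1.0, 2.0, 1.3, 1.0, 2.0])
labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])


def weighted_error(predictions, labels, weights):
    """Sum the weights of the samples whose prediction differs from the label."""
    return float(np.sum(weights[predictions != labels]))


# A single split: samples at or below the (assumed) threshold are predicted -1.
threshold = 1.5
predictions = np.where(feature <= threshold, -1.0, 1.0)

uniform = np.full(5, 0.2)                      # every sample counts equally
skewed = np.array([0.6, 0.1, 0.1, 0.1, 0.1])   # sample 0 carries most of the weight

print(weighted_error(predictions, labels, uniform))
print(weighted_error(predictions, labels, skewed))
```

The split misclassifies only sample 0, so the same stump looks much worse once that sample is given a larger weight. This is what lets the weight vector steer which stump is chosen.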

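from numpy import *  # the listing uses ones, shape, mat, zeros and inf unqualified

def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    # Classify every sample by comparing one feature against a threshold.
    retArray = ones((shape(dataMatrix)[0], 1))  # start with every sample labeled +1
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray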

def buildStump(dataArr, classLabels, D):
    # D is an (m, 1) column matrix of sample weights.
    dataMatrix = mat(dataArr)
    labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0
    bestStump = {}
    bestClasEst = mat(zeros((m, 1)))
    minError = inf  # init error sum to +infinity
    for i in range(n):  # loop over all dimensions (features)
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):  # loop over all thresholds in the current dimension
            for inequal in ['lt', 'gt']:  # go over less than and greater than
                threshVal = rangeMin + float(j) * stepSize
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = mat(ones((m, 1)))
                errArr[predictedVals == labelMat] = 0  # 0 where the prediction is correct
                weightedError = D.T * errArr  # total error weighted by D
                # print("split: dim %d, thresh %.2f, thresh inequal: %s, the weighted error is %.3f" % (i, threshVal, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClasEst

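For the screenshot, I typed in later some code that should have come first, but if you enter the code in the order shown in the figure above, you get the same result as above.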
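To make the kind of result the stump search returns concrete, here is a minimal sketch of the same search written with plain NumPy arrays instead of numpy.matrix. The names stump_classify and build_stump, the num_steps parameter, and the toy data are assumptions for this illustration. It is a rewrite under those assumptions, not the book's listing.

```python
import numpy as np


def stump_classify(X, dim, thresh, ineq):
    """Predict -1 on one side of the threshold and +1 on the other."""
    preds = np.ones(X.shape[0])
    if ineq == 'lt':
        preds[X[:, dim] <= thresh] = -1.0
    else:
        preds[X[:, dim] > thresh] = -1.0
    return preds


def build_stump(X, y, D, num_steps=10):
    """Try every feature, threshold and direction; keep the lowest weighted error."""
    X, y, D = np.asarray(X, float), np.asarray(y, float), np.asarray(D, float)
    best_stump, best_preds, min_error = {}, None, np.inf
    for dim in range(X.shape[1]):
        lo, hi = X[:, dim].min(), X[:, dim].max()
        step = (hi - lo) / num_steps
        for j in range(-1, num_steps + 1):
            thresh = lo + j * step
            for ineq in ('lt', 'gt'):
                preds = stump_classify(X, dim, thresh, ineq)
                error = float(np.sum(D[preds != y]))  # total weight of the mistakes
                if error < min_error:
                    min_error, best_preds = error, preds
                    best_stump = {'dim': dim, 'thresh': thresh, 'ineq': ineq}
    return best_stump, min_error, best_preds


# Hypothetical toy data: five samples, two features, labels in {+1, -1}.
X = [[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 1.0, -1.0, -1.0, 1.0]
D = np.full(5, 0.2)  # uniform sample weights that sum to 1

stump, error, preds = build_stump(X, y, D)
print(stump, error, preds)
```

With uniform weights on this toy data, the search settles on a split on feature 0 with a weighted error of 0.2. No single split can do better here, because samples 0 and 3 share the same value on feature 0 but have different labels, and samples 3 and 4 do the same on feature 1.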

This decision stump generator is a simplified version of a decision tree. We call it a weak learner, meaning a weak classification algorithm. In the next section, we will write the AdaBoost code that makes use of multiple weak learners.
