[Multivariate Analysis] Kaggle Mushrooms Data Classification


sodayeong 2021. 9. 29. 23:37


setwd('/Users/dayeong/Desktop/21-2/전공/다변량 분석')

# Packages
library(dplyr)
library(ggplot2)
library(caret)
library(rpart)
library(rpart.plot)
library(randomForest)

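# Load the data set
# stringsAsFactors = TRUE: randomForest needs factor predictors, and R >= 4.0 reads strings as character by default
mushrooms <- read.csv('mushrooms.csv', stringsAsFactors = TRUE)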

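# Chi-square test of each predictor (columns 2 to 23) against class;
# print only the tests that are significant at the 5% level
for (i in 2:23) {
  chi <- chisq.test(table(mushrooms$class, mushrooms[,i]))
  if (chi$p.value < 0.05) {
    print(chi)
  }
}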
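
# Optional sketch (not part of the original post): with a data set of this
# size many of the tests above are likely to be significant, so collecting the
# results in one sorted table can be easier to read than printing each test.
# `chisq_summary` is a hypothetical helper name introduced for illustration.
chisq_summary <- function(data, target) {
  predictors <- setdiff(names(data), target)
  res <- lapply(predictors, function(v) {
    tab <- table(data[[target]], data[[v]])
    # Skip single-level predictors: no association with the target can be tested
    if (ncol(tab) < 2) return(NULL)
    ct <- suppressWarnings(chisq.test(tab))
    data.frame(variable = v,
               statistic = unname(ct$statistic),
               df = unname(ct$parameter),
               p.value = ct$p.value)
  })
  res <- do.call(rbind, res)
  # Smallest p-values first
  res[order(res$p.value), ]
}
# Usage: chisq_summary(mushrooms, 'class')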

# Check the target variable
ggplot(data=mushrooms, aes(x=class, fill=class)) + 
  geom_bar()+
  labs(title='Mushroom Class Count',subtitle = 'Edible vs Poisonous')

# veil.type has a single level ('p', partial) in every row, so it carries no information and is removed.
mushrooms <- mushrooms[,-17]
mushrooms$class <- factor(mushrooms$class, levels=c('p', 'e'))
summary(mushrooms)

# Train / Test set Split
idx <- sample(1:nrow(mushrooms), nrow(mushrooms)*0.7)
train <- mushrooms[idx, ]
test <- mushrooms[-idx,]
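
# Optional sketch (not part of the original post): sample() above is unseeded,
# so the split changes on every run. Assuming a reproducible, class-stratified
# split is wanted, caret::createDataPartition could be used instead. The seed
# 123 and the helper name `stratified_split` are arbitrary, illustrative choices.
stratified_split <- function(data, target, p = 0.7, seed = 123) {
  set.seed(seed)
  # list = FALSE returns row indices sampled within each class level
  idx <- caret::createDataPartition(data[[target]], p = p, list = FALSE)
  list(train = data[idx, ], test = data[-idx, ])
}
# Usage: parts <- stratified_split(mushrooms, 'class')
#        train <- parts$train; test <- parts$test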

# RandomForest
rf_model <- randomForest(class~., data=train)
pred <- predict(rf_model, newdata=test)
confusionMatrix(pred, test$class)

# Decision Tree

idx <- sample(1:nrow(mushrooms), nrow(mushrooms)*0.7)
train <- mushrooms[idx, ]
test <- mushrooms[-idx,]

tree <- rpart(class~.,data=train)
summary(tree)
pred <- predict(tree, newdata=test, type='class')
confusionMatrix(pred, test$class)

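# Variable importance
importance(rf_model)
varImpPlot(rf_model)
# importance() and varImpPlot() only support randomForest objects;
# for the rpart tree, read the importance from the fitted object and plot it directly
tree$variable.importance
barplot(tree$variable.importance, las = 2, main = 'Decision Tree Variable Importance')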

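# Plot actual classes against the decision tree predictions
# (test$class is left as the true label so the two axes differ)
test$pred <- pred

ggplot(data=test, aes(class, pred)) + 
  geom_jitter(width = 0.2, height = 0.1, size=2) +
  labs(x='Actual class', y='Predicted class')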

