
[ML] Classifying Sepsis Patients Using MIMIC-II Data

AI

sodayeong 2022. 9. 17. 00:17

👩🏻‍💻 Data Introduction

What is the MIMIC-II database?

  • Vital sign data from patients in intensive care units (ICUs), collected between 2001 and 2008
  • The patient chart data file (chartevents) contains all charted data for each patient
    • Other tables in the database can be reached through the primary key in this data
  • The vital sign data include Heart Rate, Respiratory Rate, Blood Pressure, Body Temperature, and similar measurements
  • Earlier studies also relied mainly on this chart data and extracted vital sign codes from it, so I judged it to be important for analysis and deep learning prediction

👨🏻‍⚕️ Feature Selection for Sepsis Classification

Figure 1. Vital sign data needed for sepsis diagnosis

  • There is no single specific diagnostic test for sepsis
  • Factors commonly regarded as causes of sepsis were selected as features

 

🗂 Preprocessing

Figure 2. MIMIC-II database relationship diagram

  • 🔑 All tables are linked through subject_id (the primary key)

 

Figure 3. Data tables that can be joined on SUBJECT_ID

Figure 4. CHARTEVENTS with only the required data extracted

 

  • Only the features needed to classify sepsis patients were extracted from the chartevents table
    • The desired value can be looked up by itemid
    • A single symptom or measurement can be recorded under several different itemid codes
    • When the same item has N itemid codes, the code with the most records is used (in most cases the records are concentrated in one code)
    • Using a single, well-populated code keeps each patient's chart consistent when it is sorted chronologically.
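The itemid selection rule above can be sketched in base R as follows. This is only an illustration on toy data: the column names (charttime, value), the itemid values, and the item_map lookup table are assumptions made for the example, not the actual MIMIC-II codes used in this post.

```r
# Toy stand-in for chartevents; column names and itemid values are illustrative only.
chartevents <- data.frame(
  subject_id = c(1, 1, 1, 2, 2, 2, 2),
  itemid     = c(101, 101, 102, 101, 101, 201, 202),
  charttime  = as.POSIXct(c(
    "2005-01-01 08:00", "2005-01-01 07:00", "2005-01-01 07:30",
    "2006-03-02 10:00", "2006-03-02 09:00", "2006-03-02 09:30",
    "2006-03-02 09:45"
  ), tz = "UTC"),
  value      = c(88, 92, 90, 110, 105, 37.2, 37.5)
)

# Hypothetical lookup from itemid to the measurement it encodes.
item_map <- data.frame(
  itemid  = c(101, 102, 201, 202),
  measure = c("heart_rate", "heart_rate", "temperature", "temperature")
)

# Count rows per itemid and attach the measurement name.
counts <- as.data.frame(table(itemid = chartevents$itemid),
                        stringsAsFactors = FALSE)
counts$itemid <- as.numeric(counts$itemid)
counts <- merge(counts, item_map, by = "itemid")

# For each measurement, keep the itemid with the most rows (ties: first one).
best <- do.call(rbind, lapply(split(counts, counts$measure),
                              function(d) d[which.max(d$Freq), ]))

# Keep only the chosen codes and sort each patient's chart chronologically.
selected <- chartevents[chartevents$itemid %in% best$itemid, ]
selected <- selected[order(selected$subject_id, selected$charttime), ]
```

Counting first and filtering second means the choice of code is driven by the data rather than hard-coded, which makes it easy to re-check if the extraction changes.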

 

 

Figure 5. Creating the sepsis class label from DESCRIPTION in the ICD9 table

  • DESCRIPTION holds the diagnosis a specialist physician recorded for the patient
  • The class is set by whether the description contains 'SEPTICEMIA', the medical term under which sepsis is classified
icd9$Class <- ifelse(str_detect(icd9$description, 'SEPTICEMIA'), 1, 0)  # str_detect() is from stringr
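As a hedged variation on the labelling line above, the sketch below uses base R's grepl() on a toy icd9 table. The rows are made up for illustration. Two choices here are my assumptions rather than part of the original code: matching case-insensitively, and collapsing to one label per admission (adm_class is a hypothetical name).

```r
# Toy stand-in for the icd9 table; rows are illustrative only.
icd9 <- data.frame(
  subject_id  = c(1, 1, 2, 3),
  hadm_id     = c(10, 10, 20, 30),
  description = c("SEPTICEMIA NOS", "HYPERTENSION NOS",
                  "Septicemia due to E. coli", NA),
  stringsAsFactors = FALSE
)

# grepl() returns FALSE for NA input, so a missing description becomes 0
# instead of NA (str_detect() would propagate the NA).
icd9$Class <- as.integer(grepl("SEPTICEMIA", icd9$description,
                               ignore.case = TRUE))

# Optional: one label per admission, 1 if any diagnosis row matched.
adm_class <- aggregate(Class ~ subject_id + hadm_id, data = icd9, FUN = max)
```

Whether a missing description should count as "no sepsis" or be dropped is a modelling decision; the sketch only makes the behaviour explicit.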

 

 

 

Figure 6. Patient information sorted by admission time

  • To sort by admission time, the ICD9 and ADMISSION tables are joined on two keys
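library(dplyr)  # provides %>% and select()

# Join diagnoses to admissions on both keys, then keep the columns needed downstream
data_1 <- merge(icd9, admissions, by = c('subject_id', 'hadm_id'))
data_2 <- data_1 %>% select('subject_id', 'hadm_id', 'code', 'Class', 'admit_dt')
time_1 <- data_2 %>% select('subject_id', 'admit_dt')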
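The code above shows the join but not the sort itself. A minimal sketch of both steps in base R, on toy tables, might look like the following. The rows and dates are invented; only the column names come from the code above.

```r
# Toy stand-ins for the icd9 and admissions tables; values are illustrative.
icd9 <- data.frame(
  subject_id = c(2, 1, 1),
  hadm_id    = c(20, 11, 10),
  code       = c("038.9", "401.9", "038.9"),
  Class      = c(1, 0, 1)
)
admissions <- data.frame(
  subject_id = c(1, 1, 2),
  hadm_id    = c(10, 11, 20),
  admit_dt   = as.Date(c("2004-02-10", "2003-05-01", "2002-11-23"))
)

# Join on both keys so a diagnosis is matched to the right admission.
data_1 <- merge(icd9, admissions, by = c("subject_id", "hadm_id"))

# merge() orders by the key columns, not by date, so sort explicitly.
data_2 <- data_1[order(data_1$subject_id, data_1$admit_dt),
                 c("subject_id", "hadm_id", "code", "Class", "admit_dt")]
```

In this toy data admission 11 happened before admission 10, so sorting by hadm_id alone would put the visits in the wrong order.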

 

 

 

 

Figure 7. Final database

  • The final database is sorted in order of care for each patient
  • This gives the final vital sign dataset needed for sepsis diagnosis
  • Outliers were removed so that each feature falls within its normal range of values
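One way to express the outlier removal step is a range filter per feature. The sketch below is an assumption about how that could be done: the feature names and especially the limits are hypothetical placeholders, not the thresholds used for the results in this post.

```r
# Toy vital sign table; values are illustrative only.
vitals <- data.frame(
  subject_id  = c(1, 1, 2, 2, 3),
  heart_rate  = c(82, 400, 95, 0, 120),
  temperature = c(36.8, 37.1, 45.0, 38.2, 39.0)
)

# Hypothetical plausibility limits (lower, upper) per feature.
# These are NOT the thresholds used in the post.
limits <- list(
  heart_rate  = c(20, 250),
  temperature = c(30, 43)
)

# A row is kept only if every listed feature is present and within its limits.
keep <- rep(TRUE, nrow(vitals))
for (feature in names(limits)) {
  lo <- limits[[feature]][1]
  hi <- limits[[feature]][2]
  x  <- vitals[[feature]]
  keep <- keep & !is.na(x) & x >= lo & x <= hi
}
clean <- vitals[keep, ]
```

Keeping the limits in one list makes the chosen ranges easy to review and change without touching the filtering logic.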

 

📌 Model Building

Figure 8. Final database split into two tables by sex
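A minimal sketch of the split by sex, assuming the final table has a column named sex coded as "F" and "M" (the column name, its coding, and the names final_db and result_M are assumptions; result_F is the name used in the modelling code):

```r
# Toy final table; the sex column name and its coding are assumptions.
final_db <- data.frame(
  subject_id = c(1, 2, 3, 4),
  sex        = c("F", "M", "F", "M"),
  Class      = c(1, 0, 0, 1)
)

# split() returns a named list with one data frame per level of sex.
by_sex   <- split(final_db, final_db$sex)
result_F <- by_sex[["F"]]
result_M <- by_sex[["M"]]

# Within each table sex is constant, so drop it before fitting Class ~ .
result_F$sex <- NULL
result_M$sex <- NULL
```

Dropping the constant column matters because a formula such as Class ~ . would otherwise pass a predictor with a single value to the model.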

 

ML

Figure 9. Accuracy comparison of the machine learning models (NN, DT)

  • The data were split into train and test sets at a 7:3 ratio, and NN and DT models were applied
    • Male ➡️ accuracy 0.7284365
    • Female ➡️ accuracy 0.7553052
    • Trained on the vital sign database sorted by each patient's order of care and classifying on the confirmed-sepsis Class, both tables reached accuracy above 70%, which is reasonable performance.
library(nnet)   # nnet()
library(rpart)  # rpart()

result_F$Class <- as.factor(result_F$Class)
result_F$hospital_expire_flg <- as.factor(result_F$hospital_expire_flg)

set.seed(1234)  # fixes nnet's random initial weights; the split below is not random
model_F <- result_F

# 7:3 split in row order; floor() keeps the indices whole so no row is dropped
n_train_F <- floor(nrow(model_F) * 0.7)
trainData_F <- model_F[1:n_train_F, ]
testData_F <- model_F[(n_train_F + 1):nrow(model_F), ]
nrow(trainData_F)
nrow(testData_F)
table(trainData_F$Class)

# NN model 
nn.restult_F <- nnet(Class~., data=trainData_F, size=2, rang=0.1, decay=5e-4,maxit=200)

# Decision Tree
dt.restult_F <- rpart(Class~., data=trainData_F)
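The post reports accuracy but does not show how it was computed. The sketch below is one plausible way, run on synthetic data so it is self-contained. The toy data, the accuracy() helper, and the names nn_fit, dt_fit, nn_pred, and dt_pred are all assumptions; only the nnet hyperparameters are copied from the code above.

```r
library(nnet)
library(rpart)

set.seed(1234)

# Synthetic stand-in for one of the per-sex tables (illustrative only).
n <- 200
toy <- data.frame(
  heart_rate  = rnorm(n, mean = 90, sd = 15),
  temperature = rnorm(n, mean = 37, sd = 0.8)
)
toy$Class <- factor(ifelse(toy$heart_rate + 10 * (toy$temperature - 37) > 95, 1, 0))

# 7:3 split in row order, as in the post.
cut   <- floor(n * 0.7)
train <- toy[seq_len(cut), ]
test  <- toy[(cut + 1):n, ]

nn_fit <- nnet(Class ~ ., data = train, size = 2, rang = 0.1,
               decay = 5e-4, maxit = 200, trace = FALSE)
dt_fit <- rpart(Class ~ ., data = train)

# type = "class" returns predicted labels for both model types.
nn_pred <- predict(nn_fit, newdata = test, type = "class")
dt_pred <- predict(dt_fit, newdata = test, type = "class")

# Accuracy = share of test rows whose predicted label equals the true label.
accuracy <- function(actual, predicted) {
  mean(as.character(actual) == as.character(predicted))
}
nn_acc <- accuracy(test$Class, nn_pred)
dt_acc <- accuracy(test$Class, dt_pred)

# A confusion matrix shows where the errors fall, which accuracy alone hides.
table(actual = test$Class, predicted = dt_pred)
```

If the classes are imbalanced, it is worth checking the confusion matrix alongside accuracy, since a model that always predicts the majority class can still score well.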

DL

Figure 10-1. Accuracy of the deep learning DNN model

Figure 10-2. Accuracy of the deep learning DNN model

  • As with the ML models, the data were split into train and test sets at a 7:3 ratio and a DNN deep learning model was applied
    • Female, epochs = 100, batch_size = 5 ➡️ accuracy 0.7335
    • Female, epochs = 200, batch_size = 3 ➡️ accuracy 0.7327
    • Male, epochs = 200, batch_size = 3 ➡️ accuracy 0.6944
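library(keras)

# Run 1: epochs = 100, batch_size = 5
model %>% fit(  
            dnn_train_x, 
            dnn_train_y,   
            epochs = 100,   
            batch_size = 5,  
            validation_split = 0.2)

# Run 2: epochs = 200, batch_size = 3
# Calling fit() again on the same model object continues from its current weights,
# so rebuild the model first if the two runs should be compared independently.
model %>% fit(  
            dnn_train_x, 
            dnn_train_y,   
            epochs = 200,   
            batch_size = 3,  
            validation_split = 0.2)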
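The post does not show how dnn_train_x and dnn_train_y were built. Keras expects a numeric matrix of features and a numeric label vector, so one plausible preparation step is sketched below in base R. The toy data, the standardisation step, and the names dnn_test_x and dnn_test_y are assumptions, not the author's pipeline.

```r
# Toy train/test tables (illustrative only).
train <- data.frame(
  heart_rate  = c(80, 95, 110, 70),
  temperature = c(36.5, 38.2, 39.0, 36.9),
  Class       = factor(c(0, 1, 1, 0))
)
test <- data.frame(
  heart_rate  = c(100, 75),
  temperature = c(38.5, 36.7),
  Class       = factor(c(1, 0), levels = c(0, 1))
)

feature_cols <- setdiff(names(train), "Class")

# Estimate the scaling on the training set only, then reuse it for the test set.
mu    <- sapply(train[feature_cols], mean)
sigma <- sapply(train[feature_cols], sd)

dnn_train_x <- scale(as.matrix(train[feature_cols]), center = mu, scale = sigma)
dnn_test_x  <- scale(as.matrix(test[feature_cols]),  center = mu, scale = sigma)

# Convert factor labels "0"/"1" to numeric 0/1 for a sigmoid output layer.
dnn_train_y <- as.numeric(as.character(train$Class))
dnn_test_y  <- as.numeric(as.character(test$Class))
```

Going through as.character() matters here: calling as.numeric() directly on a factor returns the internal level codes (1 and 2) rather than the labels.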

 

🔍 Conclusion

ML models

  • Male ➡️ accuracy 0.7284365
  • Female ➡️ accuracy 0.7553052

DL model

  • Female, epochs = 100, batch_size = 5 ➡️ accuracy 0.7335
  • Female, epochs = 200, batch_size = 3 ➡️ accuracy 0.7327
  • Male, epochs = 200, batch_size = 3 ➡️ accuracy 0.6944
  • The ML models performed better than the DL model
  • DNNs tend to gain from deeper hidden layers when more features are available, so extracting a wider range of features in future work is expected to improve accuracy.

 

CODE

https://www.notion.so/dayeong1021/code-976f47c41c42431bad95f7ddffd82646
