Summary
散布図にGLMなどの予測値を加える方法の備忘録。plot関数で図示したのち、abline関数やlines関数で上書きする。
Sponsored link
Gaussian
function “abline”
- First argumentに切片、Second argumentに傾きを入れる
- 正規分布を仮定したモデルの場合のみ有効
fit <- lm(Y ~ X, dat1)
plot(Y ~ X, dat1)
# estimated intercept and slope
fit$coefficients
## (Intercept) X ## 0.07065496 0.20422560
# first argument: intercept, second argument: slope
abline(fit$coefficients[1], fit$coefficients[2])
function “lines”
- 説明変数Xの最小値から最大値までの範囲の値を100等分(もしくはより多く)する変数xを作る
- 新しい変数xを基にしたdata.frameを作る
- predict関数で予測値をえる
- lines関数で書き加える
fit <- lm(Y ~ X, dat1)
# set new variable x
x <- seq(min(dat1$X), max(dat1$X), length = 100) # x ranging from min(X) to max(X) with 100 length
x
## [1] 0.0294348 0.1295672 0.2296996 0.3298320 0.4299644 0.5300968 0.6302292 ## [8] 0.7303616 0.8304940 0.9306264 1.0307588 1.1308912 1.2310236 1.3311560 ## [15] 1.4312884 1.5314208 1.6315532 1.7316856 1.8318180 1.9319504 2.0320828 ## [22] 2.1322152 2.2323476 2.3324800 2.4326124 2.5327448 2.6328772 2.7330096 ## [29] 2.8331420 2.9332744 3.0334068 3.1335392 3.2336716 3.3338040 3.4339364 ## [36] 3.5340688 3.6342012 3.7343336 3.8344660 3.9345984 4.0347308 4.1348632 ## [43] 4.2349956 4.3351279 4.4352603 4.5353927 4.6355251 4.7356575 4.8357899 ## [50] 4.9359223 5.0360547 5.1361871 5.2363195 5.3364519 5.4365843 5.5367167 ## [57] 5.6368491 5.7369815 5.8371139 5.9372463 6.0373787 6.1375111 6.2376435 ## [64] 6.3377759 6.4379083 6.5380407 6.6381731 6.7383055 6.8384379 6.9385703 ## [71] 7.0387027 7.1388351 7.2389675 7.3390999 7.4392323 7.5393647 7.6394971 ## [78] 7.7396295 7.8397619 7.9398943 8.0400267 8.1401591 8.2402915 8.3404239 ## [85] 8.4405563 8.5406887 8.6408211 8.7409535 8.8410859 8.9412183 9.0413507 ## [92] 9.1414831 9.2416155 9.3417479 9.4418803 9.5420127 9.6421451 9.7422775 ## [99] 9.8424099 9.9425423
# new data frame with x for prediction
# variable name should be as in the original model (in this case "X", not x)
dat_new <- data.frame(X = x)
# prediction
# intervel specifies "confidence" or "prediction" interval (only available for "lm" function)
# level specifies confindence level (default 95% CI)
pred <- predict(fit, dat_new, interval = "confidence", level = 0.95)
head(pred)
## fit lwr upr ## 1 0.07666630 0.04304971 0.1102829 ## 2 0.09711589 0.06400617 0.1302256 ## 3 0.11756549 0.08495956 0.1501714 ## 4 0.13801509 0.10590976 0.1701204 ## 5 0.15846469 0.12685659 0.1900728 ## 6 0.17891429 0.14779991 0.2100287
# plot
plot(Y ~ X, dat1)
lines(pred[,1] ~ x, lty = 1) # solid line, median prediction
lines(pred[,2] ~ x, lty = 2) # broken line, lower prediction
lines(pred[,3] ~ x, lty = 2) # broken line, upper prediction
Poisson
function “lines”
- 説明変数Xの最小値から最大値までの範囲の値を100等分(もしくはより多く)する変数xを作る
- 新しい変数xを基にしたdata.frameを作る
- predict関数で予測値をえる(予測値はリンクスケールlogであることに注意)
- lines関数で書き加える
fit <- glm(Y ~ X, dat2, family = poisson)
# set new variable x
x <- seq(min(dat1$X), max(dat1$X), length = 100) # x ranging from min(X) to max(X) with 100
# new data frame with x for prediction
# variable name should be as in the original model (in this case "X", not x)
dat_new <- data.frame(X = x)
# prediction
# "type" specifies the scale of predicted values; in this case, link scale
pred <- predict.glm(fit, dat_new, type = "link", se.fit = T)
head(pred$fit); head(pred$se.fit)
## 1 2 3 4 5 6 ## 0.2719658 0.2899878 0.3080098 0.3260318 0.3440539 0.3620759
## 1 2 3 4 5 6 ## 0.1398443 0.1380268 0.1362131 0.1344033 0.1325975 0.1307960
#median prediction (Note: "exp" was applied to predicted values because link function is "log")
pred_mat <- matrix(NA, nrow = 100, ncol = 3)
pred_mat[,1] <- exp(pred$fit) # median
pred_mat[,2] <- exp(pred$fit - pred$se.fit*1.96) #lower 95%CI
pred_mat[,3] <- exp(pred$fit + pred$se.fit*1.96) #upper 95%CI
head(pred_mat)
## [,1] [,2] [,3] ## [1,] 1.312542 0.9978733 1.726438 ## [2,] 1.336411 1.0196457 1.751584 ## [3,] 1.360714 1.0418856 1.777108 ## [4,] 1.385459 1.0646024 1.803019 ## [5,] 1.410655 1.0878059 1.829321 ## [6,] 1.436308 1.1115058 1.856023
# plot
plot(Y ~ X, dat2)
lines(pred_mat[,1] ~ x, lty = 1) # solid line, median prediction
lines(pred_mat[,2] ~ x, lty = 2) # broken line, lower prediction
lines(pred_mat[,3] ~ x, lty = 2) # broken line, upper prediction
Binomial
function “lines”
- 説明変数Xの最小値から最大値までの範囲の値を100等分(もしくはより多く)する変数xを作る
- 新しい変数xを基にしたdata.frameを作る
- predict関数で予測値をえる(予測値はリンクスケールlogitであることに注意)
- lines関数で書き加える
fit <- glm(cbind(Y, N-Y) ~ X, dat3, family = binomial)
# set new variable x
x <- seq(min(dat1$X), max(dat1$X), length = 100) # x ranging from min(X) to max(X) with 100
# new data frame with x for prediction
# variable name should be as in the original model (in this case "X", not x)
dat_new <- data.frame(X = x)
# prediction
# "type" specifies the scale of predicted values; in this case, link scale
pred <- predict.glm(fit, dat_new, type = "link", se.fit = T)
head(pred$fit); head(pred$se.fit)
## 1 2 3 4 5 6 ## -1.700849 -1.657750 -1.614650 -1.571551 -1.528451 -1.485352
## 1 2 3 4 5 6 ## 0.2383530 0.2343346 0.2303395 0.2263692 0.2224248 0.2185077
#median prediction (Note: "ilogit" was applied to predicted values because link function is "logit")
ilogit <- function(x) exp(x)/(1 + exp(x))
pred_mat <- matrix(NA, nrow = 100, ncol = 3)
pred_mat[,1] <- ilogit(pred$fit) # median
pred_mat[,2] <- ilogit(pred$fit - pred$se.fit*1.96) #lower 95%CI
pred_mat[,3] <- ilogit(pred$fit + pred$se.fit*1.96) #upper 95%CI
head(pred_mat)
## [,1] [,2] [,3] ## [1,] 0.1543544 0.1026592 0.2255384 ## [2,] 0.1600643 0.1074511 0.2317503 ## [3,] 0.1659440 0.1124341 0.2380891 ## [4,] 0.1719954 0.1176127 0.2445549 ## [5,] 0.1782204 0.1229913 0.2511481 ## [6,] 0.1846205 0.1285740 0.2578685
# plot
plot(I(Y/N) ~ X, dat3)
lines(pred_mat[,1] ~ x, lty = 1) # solid line, median prediction
lines(pred_mat[,2] ~ x, lty = 2) # broken line, lower prediction
lines(pred_mat[,3] ~ x, lty = 2) # broken line, upper prediction
Related Posts
Sponsored link