
How to extract only the rows that meet a condition from an R data frame

by makhimh 2019. 11. 28.



We'll use CO2, one of R's built-in datasets. First, let's take a look at the data.


> head(CO2,5)
  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3


Let's check what type of data each column holds.


> str(CO2)
Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 84 obs. of  5 variables:
 $ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
 $ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
 $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
 $ conc     : num  95 175 250 350 500 675 1000 95 175 250 ...
 $ uptake   : num  16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
 - attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
  .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
 - attr(*, "outer")=Class 'formula'  language ~Treatment * Type
  .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
 - attr(*, "labels")=List of 2
  ..$ x: chr "Ambient carbon dioxide concentration"
  ..$ y: chr "CO2 uptake rate"
 - attr(*, "units")=List of 2
  ..$ x: chr "(uL/L)"
  ..$ y: chr "(umol/m^2 s)"
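If the full str() output is more than you need, the same basic facts can be pulled out one at a time. This is only a minimal sketch using base R functions; the variable names n_rows, n_cols and type_levels are made up for illustration.

```r
# Number of observations (rows) and variables (columns)
n_rows <- nrow(CO2)
n_cols <- ncol(CO2)

# Column names, useful when writing filter conditions
names(CO2)

# The distinct values a factor column can take
type_levels <- levels(CO2$Type)
type_levels
```

Checking the levels first helps avoid typos in the string you compare against when filtering.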


There are 5 variables in total and 84 observations. From these, let's extract only the rows where Plant is Qn1.


my_data <- CO2
my_data_Qn1 <- my_data[my_data$Plant == "Qn1", ]


> my_data_Qn1
  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3
6   Qn1 Quebec nonchilled  675   39.2
7   Qn1 Quebec nonchilled 1000   39.7
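To see why this kind of indexing works, it may help to look at the logical vector that the comparison produces. The sketch below uses only base R; the names is_qn1, qn1_index and qn1_subset are made up for illustration. It also tries subset(), a base R function that should select the same rows here.

```r
my_data <- CO2

# The comparison returns one TRUE/FALSE value per row
is_qn1 <- my_data$Plant == "Qn1"
sum(is_qn1)   # number of rows that match

# Indexing with the logical vector keeps only the TRUE rows
qn1_index <- my_data[is_qn1, ]

# subset() evaluates the condition inside the data frame,
# so the column can be written without my_data$
qn1_subset <- subset(my_data, Plant == "Qn1")

# Both approaches should select the same rows
identical(rownames(qn1_index), rownames(qn1_subset))
```

The bracket form is more explicit and is usually the safer choice inside functions, while subset() is shorter for interactive work.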


This time, let's extract the rows where Type is Quebec and uptake is at least 40.


my_data_Q_ut40u <- my_data[(my_data$Type == "Quebec") & (my_data$uptake >= 40), ]


> my_data_Q_ut40u
   Plant   Type  Treatment conc uptake
11   Qn2 Quebec nonchilled  350   41.8
12   Qn2 Quebec nonchilled  500   40.6
13   Qn2 Quebec nonchilled  675   41.4
14   Qn2 Quebec nonchilled 1000   44.3
17   Qn3 Quebec nonchilled  250   40.3
18   Qn3 Quebec nonchilled  350   42.1
19   Qn3 Quebec nonchilled  500   42.9
20   Qn3 Quebec nonchilled  675   43.9
21   Qn3 Quebec nonchilled 1000   45.5
35   Qc2 Quebec    chilled 1000   42.4
42   Qc3 Quebec    chilled 1000   41.4
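The two conditions above are joined with &, which compares two logical vectors element by element. As a rough sketch of how the operators behave (the names keep, high_quebec and either are made up for illustration), the example below builds the combined condition separately and then swaps & for | to get an "or" filter.

```r
my_data <- CO2

# & keeps a row only when both conditions are TRUE for that row
keep <- my_data$Type == "Quebec" & my_data$uptake >= 40
high_quebec <- my_data[keep, ]
nrow(high_quebec)

# | keeps a row when at least one of the conditions is TRUE
either <- my_data[my_data$Type == "Quebec" | my_data$uptake >= 40, ]
nrow(either)
```

Note that the single & is the vectorized operator; && is meant for single TRUE/FALSE values, such as in an if statement, and is not suitable for row filtering. CO2 has no missing values, but if a condition contains NA, logical indexing returns rows filled with NA for those positions, and wrapping the condition in which() is one way to drop them.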



This time, let's extract only the Treatment column for the rows where Type is Quebec.


my_data_Q_Treat <- my_data[my_data$Type == "Quebec", "Treatment"]



> my_data_Q_Treat
 [1] nonchilled nonchilled nonchilled nonchilled nonchilled nonchilled nonchilled nonchilled nonchilled
[10] nonchilled nonchilled nonchilled nonchilled nonchilled nonchilled nonchilled nonchilled nonchilled
[19] nonchilled nonchilled nonchilled chilled    chilled    chilled    chilled    chilled    chilled  
[28] chilled    chilled    chilled    chilled    chilled    chilled    chilled    chilled    chilled  
[37] chilled    chilled    chilled    chilled    chilled    chilled  
Levels: nonchilled chilled
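The result above is a factor rather than a data frame, because selecting a single column with brackets drops the result down to a vector by default. If you would rather keep a data frame, the drop argument of the bracket operator can be set to FALSE. A minimal sketch, with treat_vec and treat_df as illustrative names:

```r
my_data <- CO2

# Selecting one column by name drops the result to a vector (here a factor)
treat_vec <- my_data[my_data$Type == "Quebec", "Treatment"]
is.factor(treat_vec)

# drop = FALSE keeps the result as a one-column data frame
treat_df <- my_data[my_data$Type == "Quebec", "Treatment", drop = FALSE]
is.data.frame(treat_df)
dim(treat_df)
```

A one-column data frame also prints one value per row, and it keeps the original row names and the factor levels.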


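Let's print it as a column vector so it is easier to read. as.matrix() turns a vector into a one-column matrix on its own, so the ncol argument is not needed. Note that the factor becomes a character matrix in the process.


# as.matrix() on a factor returns a one-column character matrix
my_data_Q_Treat <- as.matrix(my_data_Q_Treat)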


> my_data_Q_Treat
      [,1]       
 [1,] "nonchilled"
 [2,] "nonchilled"
 [3,] "nonchilled"
 [4,] "nonchilled"
 [5,] "nonchilled"
 [6,] "nonchilled"
 [7,] "nonchilled"
 [8,] "nonchilled"
 [9,] "nonchilled"
[10,] "nonchilled"
[11,] "nonchilled"
[12,] "nonchilled"
[13,] "nonchilled"
[14,] "nonchilled"
[15,] "nonchilled"
[16,] "nonchilled"
[17,] "nonchilled"
[18,] "nonchilled"
[19,] "nonchilled"
[20,] "nonchilled"
[21,] "nonchilled"
[22,] "chilled"  
[23,] "chilled"  
[24,] "chilled"  
[25,] "chilled"  
[26,] "chilled"  
[27,] "chilled"  
[28,] "chilled"  
[29,] "chilled"  
[30,] "chilled"  
[31,] "chilled"  
[32,] "chilled"  
[33,] "chilled"  
[34,] "chilled"  
[35,] "chilled"  
[36,] "chilled"  
[37,] "chilled"  
[38,] "chilled"  
[39,] "chilled"  
[40,] "chilled"  
[41,] "chilled"  
[42,] "chilled"  
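One thing worth checking after this conversion is the type of the result. The quoted values in the output suggest the factor has become plain character data, and the sketch below confirms that under the same filter as above (treat_vec and treat_mat are illustrative names).

```r
# Same selection as in the post: Treatment for the Quebec rows
treat_vec <- CO2[CO2$Type == "Quebec", "Treatment"]

# Convert the factor to a one-column matrix
treat_mat <- as.matrix(treat_vec)

dim(treat_mat)           # rows and columns of the matrix
is.character(treat_mat)  # the factor levels are gone, only strings remain

# Counting values per level is still easy on the original factor
table(treat_vec)
```

If you only want the column layout for display, that is fine. If you still need the levels later, keep the original factor around as well.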

