4.1 拆分与连接数据文件要掌握的命令
为了折分或合并数据文件,要掌握下面的命令。
image.png
image.png
4.2 案例:拆分与连接数据
4.2.1 横向拆分数据
要完成的任务(任务 4.1)将 mydata 数据分成三个数据文件,分别为学生基本信息文件 student.dta, 经济学成绩文件 economy.dta 和数学成绩文件math.dta。
use male, clear //打开记录男生信息的数据文件 male append using female //将记录女生信息的 female 文件追加到当前数据集中 save mydata1, replace
4.3.2 横向合并数据
---------将学生基本信息和学习成绩合并成新数据 mydata2---------
. use mydata4,clear . keep id economy . save "economy.dta",replace . use economy,clear . sort id . merge id using student (note: you are using old merge syntax; see [D] merge for new syntax) . tab _merge _merge | Freq. Percent Cum. ------------+----------------------------------- 3 | 5 100.00 100.00 ------------+----------------------------------- Total | 5 100.00 . drop _merge
4.4 数据重整
命令 | 命令解释 | 用法示例 |
reshape | 将数据重整 | reshape long inc,i(id) j(yr) |
stack | 将多列数据转换成一列数据 | stack a b c d, into(e f) |
xpose | 数据转置 | xpose, clear |
. input id str10 name math2003 math2004 economy2003 economy2004 id name math2003 math2004 econo~2003 econo~2004 1. 1 John 40 13 68 55 2. . 2 Chris 80 64 52 87 3. . 3 Jack 90 55 76 25 4. . 4 Huang 43 60 90 4 5. . 5 Tom 70 68 96 42 6. . 6 Han 53 10 85 89 7. . 7 Phillip 85 61 36 52 8. . 8 Jin 95 6 65 84 9. 'browse' cannot be read as a number 9'browse' cannot be read as a number 9'browse' cannot be read as a number 9des 'des' cannot be read as a number 9. end . list +----------------------------------------------------------+ | id name math2003 math2004 eco~2003 eco~2004 | |----------------------------------------------------------| 1. | 1 John 40 13 68 55 | 2. | 2 Chris 80 64 52 87 | 3. | 3 Jack 90 55 76 25 | 4. | 4 Huang 43 60 90 4 | 5. | 5 Tom 70 68 96 42 | |----------------------------------------------------------| 6. | 6 Han 53 10 85 89 | 7. | 7 Phillip 85 61 36 52 | 8. | 8 Jin 95 6 65 84 | +----------------------------------------------------------+ . *---------将学习成绩数据集 mywide 变换形式--------- . reshape long math economy, i(id name) j(year) (note: j = 2003 2004) Data wide -> long ----------------------------------------------------------------------------- Number of obs. 8 -> 16 Number of variables 6 -> 5 j variable (2 values) -> year xij variables: math2003 math2004 -> math economy2003 economy2004 -> economy ----------------------------------------------------------------------------- . list +--------------------------------------+ | id name year math economy | |--------------------------------------| 1. | 1 John 2003 40 68 | 2. | 1 John 2004 13 55 | 3. | 2 Chris 2003 80 52 | 4. | 2 Chris 2004 64 87 | 5. | 3 Jack 2003 90 76 | |--------------------------------------| 6. | 3 Jack 2004 55 25 | 7. | 4 Huang 2003 43 90 | 8. | 4 Huang 2004 60 4 | 9. | 5 Tom 2003 70 96 | 10. | 5 Tom 2004 68 42 | |--------------------------------------| 11. | 6 Han 2003 53 85 | 12. | 6 Han 2004 10 89 | 13. | 7 Phillip 2003 85 36 | 14. | 7 Phillip 2004 61 52 | 15. | 8 Jin 2003 95 65 | |--------------------------------------| 16. | 8 Jin 2004 6 84 | +--------------------------------------+ . save mylong, replace (note: file mylong.dta not found) file mylong.dta saved . reshape wide (note: j = 2003 2004) Data long -> wide ----------------------------------------------------------------------------- Number of obs. 16 -> 8 Number of variables 5 -> 6 j variable (2 values) year -> (dropped) xij variables: math -> math2003 math2004 economy -> economy2003 economy2004 ----------------------------------------------------------------------------- . list +----------------------------------------------------------+ | id name math2003 eco~2003 math2004 eco~2004 | |----------------------------------------------------------| 1. | 1 John 40 68 13 55 | 2. | 2 Chris 80 52 64 87 | 3. | 3 Jack 90 76 55 25 | 4. | 4 Huang 43 90 60 4 | 5. | 5 Tom 70 96 68 42 | |----------------------------------------------------------| 6. | 6 Han 53 85 10 89 | 7. | 7 Phillip 85 36 61 52 | 8. | 8 Jin 95 65 6 84 | +----------------------------------------------------------+
4.4.3 案例:多列数据转为少数几列
input v1-v6 v1 v2 v3 v4 v5 v6 1. 21 87 23 97 58 3 2. . 56 46 9 21 79 17 3. . 26 42 59 26 9 29 4. . 94 89 35 16 81 62 5. . 27 5 80 76 26 89 6. . 11 67 10 37 45 48 7. . 40 71 57 37 13 22 8. . 72 69 10 96 22 93 9. stack var1-var6, into(x) clear 'stack' cannot be read as a number 9. end . stack v1-v6,into(x) clear . list +-------------+ | _stack x | |-------------| 1. | 1 21 | 2. | 1 56 | 3. | 1 26 | 4. | 1 94 | 5. | 1 27 | |-------------| 6. | 1 11 | 7. | 1 40 | 8. | 1 72 | 9. | 2 87 | 10. | 2 46 | |-------------| 11. | 2 42 | 12. | 2 89 | 13. | 2 5 | 14. | 2 67 | 15. | 2 71 | |-------------| 16. | 2 69 | 17. | 3 23 | 18. | 3 9 | 19. | 3 59 | 20. | 3 35 | |-------------| 21. | 3 80 | 22. | 3 10 | 23. | 3 57 | 24. | 3 10 | 25. | 4 97 | |-------------| 26. | 4 21 | 27. | 4 26 | 28. | 4 16 | 29. | 4 76 | 30. | 4 37 | |-------------| 31. | 4 37 | 32. | 4 96 | 33. | 5 58 | 34. | 5 79 | 35. | 5 9 | |-------------| 36. | 5 81 | 37. | 5 26 | 38. | 5 45 | 39. | 5 13 | 40. | 5 22 | |-------------| 41. | 6 3 | 42. | 6 17 | 43. | 6 29 | 44. | 6 62 | 45. | 6 89 | |-------------| 46. | 6 48 | 47. | 6 22 | 48. | 6 93 | +-------------+
4.5 案例:数据转置
xpose
. clear . input id math id math 1. 1 68 2. . 2 52 3. . 3 76 4. . 4 90 5. . 5 96 6. . 6 85 7. . 7 36 8. . 8 65 9. end . xpose, clear . list +---------------------------------------+ | v1 v2 v3 v4 v5 v6 v7 v8 | |---------------------------------------| 1. | 1 2 3 4 5 6 7 8 | 2. | 68 52 76 90 96 85 36 65 | +---------------------------------------+
4.6 字符运算
. input str15 x x 1. . "10*123" 2. . "543*21" 3. . "12*422" 4. . "43532*32134" 5. . "4349*1" 6. . end . . gen a=strpos(x,"*") . . gen b=substr(x,1,a-1) . . gen c=substr(x,a+1,.) . dis a 3 . dis b 10 . dis c 123