How to import data into Hive
[ Source: Web | Published: 2017-03-03 08:59:59 ]

  1. Importing via an external table

  Create an external table in Hive, specifying an HDFS path at creation time. Copying data into that HDFS path then simultaneously completes the insertion into the external table.

  For example:

  Edit the file test.txt:

  $ cat test.txt

  1 hello

  2 world

  3 test

  4 case

  Fields are separated by '\t' (a tab character).
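The sample file above can be created from the shell; using printf guarantees real tab characters between the fields, which an editor may otherwise silently replace with spaces:

```shell
# Create test.txt with tab-separated fields; \t expands to a literal tab.
printf '1\thello\n2\tworld\n3\ttest\n4\tcase\n' > test.txt
cat test.txt
```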

  Start Hive:

  $ hive

  Create the external table:

  hive> CREATE EXTERNAL TABLE MYTEST(num INT, name STRING)

  > COMMENT 'this is a test'

  > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

  > STORED AS TEXTFILE

  > LOCATION '/data/test';

  OK

  Time taken: 0.714 seconds

  hive> show tables;

  OK

  mytest

  partition_test

  partition_test_input

  test

  Time taken: 0.07 seconds

  hive> desc mytest ;

  OK

  num int

  name string

  Time taken: 0.121 seconds

  Copy the data to HDFS:

  $ hadoop fs -put test.txt /data/test

  View the table data in Hive:

  hive> select * from mytest;

  OK

  1 hello

  2 world

  3 test

  4 case

  Time taken: 0.375 seconds

  hive> select num from mytest;

  Total MapReduce jobs = 1

  Launching Job 1 out of 1

  ......

  Total MapReduce CPU Time Spent: 510 msec

  OK

  1

  2

  3

  4

  Time taken: 27.157 seconds

  This approach is commonly used when historical data already exists on HDFS and we need to run Hive operations over it. It avoids the overhead of copying the data.
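A sketch of why no copy is needed: the external table only records metadata pointing at /data/test, so dropping the table later leaves the HDFS files untouched and they can be re-attached at any time:

```sql
-- Dropping an external table removes only the metastore entry;
-- the files under /data/test stay on HDFS and can be re-attached
-- by recreating a table with the same LOCATION.
DROP TABLE mytest;
```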

  2. Importing from the local filesystem

  When the data is not on HDFS, it can be loaded into a Hive table directly from the local filesystem.

  The file /home/work/test.txt has the same content as above.

  Create the table:

  hive> CREATE TABLE MYTEST2(num INT, name STRING)

  > COMMENT 'this is a test2'

  > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

  > STORED AS TEXTFILE;

  OK

  Time taken: 0.077 seconds

  Load the data into the table:

  hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE MYTEST2;

  Copying data from file:/home/work/test.txt

  Copying file: file:/home/work/test.txt

  Loading data to table default.mytest2

  OK

  Time taken: 0.24 seconds

  View the data:

  hive> select * from MYTEST2;

  OK

  1 hello

  2 world

  3 test

  4 case

  Time taken: 0.11 seconds

  The local path imported this way can be a single file, a folder, or a wildcard pattern. Note that a folder must not contain subdirectories; likewise, a wildcard can only match files, not directories.
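For illustration, the folder and wildcard forms might look like this (the paths here are hypothetical):

```sql
-- Load every file in a flat folder (no subdirectories allowed):
LOAD DATA LOCAL INPATH '/home/work/input/' INTO TABLE MYTEST2;
-- Load files matched by a wildcard (matches files only):
LOAD DATA LOCAL INPATH '/home/work/input/part-*' INTO TABLE MYTEST2;
```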

  3. Importing from HDFS

  The test.txt file above has already been uploaded to /data/test,

  so the following commands load the data directly into a Hive table:

  hive> CREATE TABLE MYTEST3(num INT, name STRING)

  > COMMENT "this is a test3"

  > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

  > STORED AS TEXTFILE;

  OK

  Time taken: 4.735 seconds

  hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE MYTEST3;

  Loading data to table default.mytest3

  OK

  Time taken: 0.337 seconds

  hive> select * from MYTEST3 ;

  OK

  1 hello

  2 world

  3 test

  4 case

  Time taken: 0.227 seconds
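One detail worth noting that the transcript does not show: unlike a LOCAL load, LOAD DATA INPATH moves the HDFS file into the table's location rather than copying it, so /data/test/test.txt no longer exists at its original path afterwards. A quick check, as a sketch:

```sql
-- After the load, the source file has been moved out of /data/test
-- (dfs commands can be run from inside the Hive CLI):
dfs -ls /data/test;
```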

  4. Importing data from another table:

  hive> CREATE EXTERNAL TABLE MYTEST4(num INT) ;

  OK

  Time taken: 0.091 seconds

  hive> FROM MYTEST3 test3

  > INSERT OVERWRITE TABLE MYTEST4

  > select test3.num where name='world';

  Total MapReduce jobs = 2

  Launching Job 1 out of 2

  Number of reduce tasks is set to 0 since there's no reduce operator

  Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002

  Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002

  Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

  2012-07-23 18:59:02,365 Stage-1 map = 0%, reduce = 0%

  2012-07-23 18:59:08,417 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:09,435 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:10,445 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:11,455 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:12,470 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:13,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:14,508 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.62 sec

  MapReduce Total cumulative CPU time: 620 msec

  Ended Job = job_201207230024_0002

  Ended Job = -174856900, job is filtered out (removed at runtime).

  Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000

  Loading data to table default.mytest4

  Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4

  Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]

  1 Rows loaded to mytest4

  MapReduce Jobs Launched:

  Job 0: Map: 1 Accumulative CPU: 0.62 sec HDFS Read: 242 HDFS Write: 2 SUCESS

  Total MapReduce CPU Time Spent: 620 msec

  OK

  Time taken: 30.663 seconds

  hive> select * from mytest4;

  OK

  2

  Time taken: 0.103 seconds
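The FROM ... INSERT form above is Hive's multi-insert syntax, which lets one scan of MYTEST3 feed several inserts. For a single target table, the same result can be written in the more familiar order (a sketch reusing the tables from this article):

```sql
INSERT OVERWRITE TABLE MYTEST4
SELECT num FROM MYTEST3 WHERE name = 'world';
```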
