I may be wrong, but all(?) the examples I've seen for Apache Hadoop take as input a file stored on the local file system (e.g. org.apache.hadoop.examples.Grep).
Is there a way to load and save data on the Hadoop Distributed File System (HDFS)? For example, I put a tab-delimited file named 'stored.xls' on HDFS using hadoop-0.19.1/bin/hadoop dfs -put ~/local.xls stored.xls. How should I configure the JobConf to read it?
Thanks.
-
JobConf conf = new JobConf(getConf(), ...);
...
FileInputFormat.setInputPaths(conf, new Path("stored.xls"));
...
JobClient.runJob(conf);
...
setInputPaths will do it.
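For context, here is a minimal sketch of a complete driver built around setInputPaths, using the old mapred API that Hadoop 0.19 ships. The class name, job name, and "output" path are hypothetical placeholders; with no mapper or reducer set, the job just runs the default identity mapper/reducer:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: reads 'stored.xls' from the user's HDFS home
// directory and passes it through the default identity mapper/reducer.
public class XlsDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), XlsDriver.class);
        conf.setJobName("read-stored-xls");
        // A relative path is resolved against the default filesystem,
        // so "stored.xls" becomes /user/<you>/stored.xls on HDFS
        // once fs.default.name points at the NameNode.
        FileInputFormat.setInputPaths(conf, new Path("stored.xls"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new XlsDriver(), args));
    }
}

You would package this in a jar and launch it with bin/hadoop jar yourjob.jar XlsDriver.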
Pierre : Thanks, but it throws an exception saying that "file:/home/me/workspace/HADOOP/stored.xls" (a local path) doesn't exist. The file on HDFS is at '/user/me/stored.xls'. I also tried new Path("/user/me/stored.xls"), and that doesn't work either.
yogman : First off, it's strange that Hadoop complained about "file:" rather than "hdfs:". It might be that your hadoop-site.xml is misconfigured. Second, if it still doesn't work, mkdir input and put stored.xls in the "input" dir (all with the bin/hadoop fs command), then use new Path("input") instead of new Path("stored.xls").
yogman : Revealing your command line to run the job wouldn't hurt.
-
Pierre, the default configuration for Hadoop runs in local (standalone) mode, not distributed mode, so you likely just need to fix your hadoop-site.xml. The "file:" scheme in the error shows that your default filesystem is still the local one, when it should be hdfs://youraddress:yourport. Check your setting for fs.default.name, and see the setup guide on Michael Noll's blog for more details.
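A quick way to confirm which filesystem a job will actually use is to print the effective default; a minimal sketch, where the hostname and port are placeholders you would replace with your NameNode's:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CheckDefaultFs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Prints "file:///" while hadoop-site.xml still points at the
        // local FS, and "hdfs://host:port" once fs.default.name is set.
        System.out.println(conf.get("fs.default.name"));
        System.out.println(FileSystem.get(conf).getUri());
        // For a quick test you can also override it programmatically
        // (hostname and port below are placeholder values):
        conf.set("fs.default.name", "hdfs://namenode:9000");
    }
}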
-
FileInputFormat.setInputPaths(conf, new Path("hdfs://hostname:port/user/me/stored.xls"));
This will do.
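A fully qualified hdfs:// URI like this bypasses fs.default.name entirely, so it works even when the site configuration still points at the local filesystem; the trade-off is that the NameNode's hostname and port end up hard-coded in the job.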