Hi Kevin, thanks for the suggestion. I think I found the problem: my code is a chained map/reduce job, and in the previous iteration one of the .lzo_deflate output files is 40 times larger than the other files. That is because of one special "key" value that occurs significantly more often than the other keys. I used a self-defined partitioner:

    public int getPartition(TextPair key, Text value, int numPartitions) {
        // partition on the first field only, so every record sharing
        // the same first field goes to the same reducer
        return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

so maybe all the records with that special key end up in the same partition.

-- Shi Yu
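P.S. In case it helps anyone hitting the same skew: one common workaround is to "salt" the dominant key in the mapper so its records spread over several reducers, then merge the partial results afterwards. Below is a minimal sketch under assumptions of my own (plain Text keys standing in for my TextPair, and the made-up names HOT_KEY and NUM_SALTS), not a drop-in fix:

    import java.io.IOException;
    import java.util.Random;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch only: HOT_KEY and NUM_SALTS are hypothetical. Appending a
    // small random salt to the one key that dominates the data makes the
    // hash partitioner scatter its records across up to NUM_SALTS
    // reducers instead of funneling them all into one. A follow-up step
    // must strip the salt and combine the partial results per key.
    public class SaltingMapper extends Mapper<Text, Text, Text, Text> {
        private static final String HOT_KEY = "the-dominant-key"; // assumption
        private static final int NUM_SALTS = 8;                   // assumption
        private final Random random = new Random();
        private final Text outKey = new Text();

        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            if (HOT_KEY.equals(key.toString())) {
                // e.g. "the-dominant-key#3"; each salt value hashes to a
                // (likely) different partition
                outKey.set(key.toString() + "#" + random.nextInt(NUM_SALTS));
            } else {
                outKey.set(key);
            }
            context.write(outKey, value);
        }
    }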