Impact of changing block size in Hadoop HDFS

  • esantoshu

    Hi,

    Can anybody let me know the impact of changing the block size in Hadoop HDFS? Say I change the block size from 64 MB to 128 MB; what effect does that have?

    Also, please let me know what the default block size is.

    Thanks,

    Santosh

  • Sven Groot

    There are a number of things this impacts. Most obviously, a file will consist of fewer blocks if the block size is larger. This can allow a client to read or write more data without interacting with the NameNode, and it also reduces the amount of metadata the NameNode must keep, which lowers NameNode load (an important consideration for extremely large file systems).
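
    To put rough numbers on that, here is a minimal back-of-the-envelope sketch (the 1 TB file size is illustrative) showing how the block count, and hence the per-file metadata the NameNode tracks, changes with block size:

        // Sketch: block count for a hypothetical 1 TB file
        // at 64 MB versus 128 MB blocks.
        public class BlockCount {
            public static void main(String[] args) {
                long fileSize = 1024L * 1024 * 1024 * 1024; // 1 TB
                for (long blockSize : new long[] { 64L << 20, 128L << 20 }) {
                    // Ceiling division: a partial block still counts as a block.
                    long blocks = (fileSize + blockSize - 1) / blockSize;
                    System.out.println((blockSize >> 20) + " MB blocks -> "
                            + blocks + " blocks");
                }
            }
        }

    Doubling the block size from 64 MB to 128 MB halves the block count (16384 versus 8192 here), so the NameNode tracks half as many block entries for that file.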

    With fewer blocks, the file may be stored on fewer nodes in total; this can reduce total throughput for parallel access and make it more difficult for the MapReduce scheduler to schedule data-local tasks.

    When using such a file as input for MapReduce (and not constraining the maximum split size to be smaller than the block size), a larger block size reduces the number of tasks, which can decrease overhead. But fewer, longer tasks also mean you may not achieve maximum parallelism (if there are fewer tasks than your cluster can run simultaneously), the chance of stragglers increases, and if a task fails, more work has to be redone. Increasing the amount of data processed per task can also cause additional read/write operations (for example, if a map task goes from having only one spill to having multiple spills that need a merge at the end).
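
    For reference, the number of map tasks follows from the split size, which is derived from the block size clamped between the configured minimum and maximum split sizes; a minimal sketch of that calculation (mirroring the clamp formula in FileInputFormat) looks like this:

        // Sketch: the split size is the block size, clamped between the
        // configured minimum and maximum split sizes.
        long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

    With the default minimum and maximum, the split size equals the block size, so going from 64 MB to 128 MB blocks roughly halves the number of map tasks for the same input.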

    The best choice usually depends on the input data. If you want to maximize throughput for a very large input file, using very large blocks (128 MB or even 256 MB) is best. For smaller files, a smaller block size is better. Note that you can have files with different block sizes on the same file system by changing the dfs.block.size parameter when the file is written, e.g. when uploading using the command-line tools: "bin/hadoop fs -Ddfs.block.size=33554432 -put localpath dfspath" (the -D generic option has to come before the -put arguments).
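
    If you are writing the file programmatically, the same per-file override is available from the Java API; here is a minimal sketch (the path is hypothetical) using the FileSystem.create overload that takes an explicit block size:

        // Sketch: create a file with a 32 MB block size (33554432 bytes,
        // matching the command-line example above) via the Java API.
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class CustomBlockSize {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);
                FSDataOutputStream out = fs.create(
                        new Path("/user/santosh/example.dat"), // hypothetical path
                        true,        // overwrite if the file exists
                        4096,        // I/O buffer size in bytes
                        (short) 3,   // replication factor
                        32L << 20);  // block size: 32 MB
                out.writeBytes("hello hdfs");
                out.close();
            }
        }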

    The default block size, if none is specified in hdfs-site.xml, is 64 MB.
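
    If you want a different cluster-wide default instead, it goes in hdfs-site.xml; a minimal sketch (134217728 bytes is 128 MB):

        <property>
          <name>dfs.block.size</name>
          <value>134217728</value>
        </property>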
