Java tip: How to read files quickly | Nadeau Software

:

Technologies: Java 5+

Java has several classes for reading files, with and without buffering, random access, thread safety, and memory mapping. Some of these are much faster than the others. This article benchmarks 13 ways to read bytes from a file and shows which ways are the fastest.

A quick review of file reading classes

Let's quickly run through several ways to open and read a file of bytes in Java. To give us a way to compare them, let's compute a file checksum.

FileInputStream with byte reads

FileInputStream f = new FileInputStream( name );
int b;
long checkSum = 0L;
while ( (b=f.read()) != -1 )
    checkSum += b;

FileInputStream opens a file by name or File object. It's read() method reads byte after byte from the file.

FileInputStream uses synchronization to make it thread-safe.

FileInputStream with byte array reads

FileInputStream f = new FileInputStream( name );
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ( (nRead=f.read( barray, 0, SIZE )) != -1 )
    for ( int i=0; i<nRead; i++ )
        checkSum += barray[i];

FileInputStream does an I/O operation on every read and it synchronizes on all method calls to make it thread-safe. To reduce this overhead, read multiple bytes at once into a buffer array of bytes

BufferedInputStream with byte reads

BufferedInputStream f = new BufferedInputStream(
new FileInputStream( name ) ); int b; long checkSum = 0L; while ( (b=f.read()) != -1 ) checkSum += b;

BufferedInputStream handles FileInputStream buffering for you. It wraps an input stream, creates a large internal byte array (usually 8 kilobytes), and fills it in large reads from the stream. It's read() method gets the next byte from the buffer.

BufferedInputStream uses synchronization to be thread-safe.

BufferedInputStream with byte array reads

BufferedInputStream f = new BufferedInputStream(
new FileInputStream( name ) ); byte[] barray = new byte[SIZE]; long checkSum = 0L; int nRead; while ( (nRead=f.read( barray, 0, SIZE )) != -1 ) for ( int i=0; i<nRead; i++ ) checkSum += barray[i];

BufferedInputStream synchronizes on all method calls to make it thread-safe. To reduce synchronization and method call overhead, make fewer read() calls by reading multiple bytes at a time.

RandomAccessFile with byte reads

RandomAccessFile f = new RandomAccessFile( name );
int b;
long checkSum = 0L;
while ( (b=f.read()) != -1 )
    checkSum += b;

RandomAccessFile opens a file by name or File object. It can read, write, or read and write at your choice of position within the file. Its read() method reads the next byte from the current file position.

RandomAccessFile is thread-safe.

RandomAccessFile with byte array reads

RandomAccessFile f = new RandomAccessFile( name );
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ( (nRead=f.read( barray, 0, SIZE )) != -1 )
    for ( int i=0; i<nRead; i++ )
        checkSum += barray[i];

Like a FileInputStream, RandomAccessFile issues an I/O operation on every read and synchronizes on all method calls to make it thread-safe. To reduce this overhead, make fewer method calls by reading into an array of bytes.

FileChannel with ByteBuffer and byte gets

FileInputStream f = new FileInputStream( name );
FileChannel ch = f.getChannel( );
ByteBuffer bb = ByteBuffer.allocate( SIZE );
long checkSum = 0L;
int nRead;
while ( (nRead=ch.read( bb )) != -1 )
{
    if ( nRead == 0 )
        continue;
    bb.position( 0 );
    bb.limit( nRead );
    while ( bb.hasRemaining() )
        checkSum += bb.get( );
    bb.clear( );
}

FileInputStream and RandomAccessFile both can return a FileChannel for low-level I/O. FileChannel's read() method fills a ByteBuffer created using the allocate() method on ByteBuffer. ByteBuffer's get() method gets the next byte from the buffer.

FileChannel and ByteBuffer are not thread-safe.

FileChannel with ByteBuffer and byte array gets

FileInputStream f = new FileInputStream( name );
FileChannel ch = f.getChannel( );
ByteBuffer bb = ByteBuffer.allocate( BIGSIZE );
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ( (nRead=ch.read( bb )) != -1 )
{
    if ( nRead == 0 )
        continue;
    bb.position( 0 );
    bb.limit( nRead );
    while( bb.hasRemaining( ) )
    {
        nGet = Math.min( bb.remaining( ), SIZE );
        bb.get( barray, 0, nGet );
        for ( int i=0; i<nGet; i++ )
            checkSum += barray[i];
    }
    bb.clear( );
}

To reduce method call overhead, get an array of bytes at a time. The array and ByteBuffer can be different sizes.

FileChannel with array ByteBuffer and byte array access

FileInputStream f = new FileInputStream( name );
FileChannel ch = f.getChannel( );
byte[] barray = new byte[SIZE];
ByteBuffer bb = ByteBuffer.wrap( barray );
long checkSum = 0L;
int nRead;
while ( (nRead=ch.read( bb )) != -1 )
{
    for ( int i=0; i<nRead; i++ )
        checkSum += barray[i];
    bb.clear( );
}

A ByteBuffer created with the allocate() method above uses hidden internal storage. Instead, call wrap() to wrap a ByteBuffer around your own byte array. This lets you access your array directly after each read, reducing method call overhead and data copying.

FileChannel with direct ByteBuffer and byte gets

FileInputStream f = new FileInputStream( name );
FileChannel ch = f.getChannel( );
ByteBuffer bb = ByteBuffer.allocateDirect( SIZE );
long checkSum = 0L;
int nRead;
while ( (nRead=ch.read( bb )) != -1 )
{
    bb.position( 0 );
    bb.limit( nRead );
    while ( bb.hasRemaining() )
        checkSum += bb.get( );
    bb.clear( );
}

A ByteBuffer created with the allocateDirect() method may directly use storage deeper in the JVM or OS. This can reduce copying of data upward into your application's array, saving some overhead.

FileChannel with direct ByteBuffer and byte array gets

FileInputStream f = new FileInputStream( name );
FileChannel ch = f.getChannel( );
ByteBuffer bb = ByteBuffer.allocateDirect( BIGSIZE );
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ( (nRead=ch.read( bb )) != -1 )
{
    if ( nRead == 0 )
        continue;
    bb.position( 0 );
    bb.limit( nRead );
    while( bb.hasRemaining( ) )
    {
        nGet = Math.min( bb.remaining( ), SIZE );
        bb.get( barray, 0, nGet );
        for ( int i=0; i<nGet; i++ )
            checkSum += barray[i];
    }
    bb.clear( );
}

And of course, you can get arrays of bytes to save on method call overhead. The buffer and array sizes can differ.

FileChannel with MappedByteBuffer and byte gets

FileInputStream f = new FileInputStream( name );
FileChannel ch = f.getChannel( );
MappedByteBuffer mb = ch.map( ch.MapMode.READ_ONLY,
    0L, ch.size( ) );
long checkSum = 0L;
while ( mb.hasRemaining( ) )
    checkSum += mb.get( );

FileChannel's map method returns a MappedByteBuffer that memory maps part or all of the file into the address space of the application. This gives more direct access to the file without an intermediate buffer. Call the get() method on MappedByteBuffer to get the next byte.

FileChannel with MappedByteBuffer and byte array gets

FileInputStream f = new FileInputStream( name );
FileChannel ch = f.getChannel( );
MappedByteBuffer mb = ch.map( ch.MapMode.READ_ONLY,
    0L, ch.size( ) );
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nGet;
while( mb.hasRemaining( ) )
{
    nGet = Math.min( mb.remaining( ), SIZE );
    mb.get( barray, 0, nGet );
    for ( int i=0; i<nGet; i++ )
        checkSum += barray[i];
}

And getting arrays of bytes saves method call overhead.

FileReader and BufferedReader

What about FileReader and BufferedReader? Both of these classes read characters, not bytes, so they are not included in these comparisons.

To handle characters, FileReader reads bytes from an internal FileInputStream and decodes them as UTF-8 multi-byte characters. BufferedReader does the same thing by wrapping a Reader and adding an internal buffer. Character decoding takes time, giving worse performance than simpler byte-reading classes.

Benchmarks

All of these approaches were benchmarked reading a 100 Mbyte file from a local disk. To make sure benchmarks included the cost of issuing I/O calls and their OS overhead, I measured wall clock time for the task, not CPU or user time. The results plotted below are from an unloaded Mac with JVM 5, but similar results were found on a Windows PC and JVM 6. The JVM was run with the following options:

java -server -XX:CompileThreshold=2 -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -Xmx1000m Test

The -server option uses larger memory defaults and a parallel garbage collector to improve performance. The -XX:CompileThreshold=2 option forced the JVM to compile methods after just two calls, insuring that compilation occurred early, and not in the middle of the benchmarks. The -XX:+AggressiveOpts option enables fancier optimizations, and -XX:+UseFastAccessorMethods inlines some get() methods. The -Xmx1000m option set a 1Gbyte memory limit so that the largest tests could run easily. Each of these options were tested independently and they all made a difference.

There are three variables to test: the read approach, the internal buffer size (for BufferedInputStream and ByteBuffer), and the read array size. Buffer and array sizes were varied separately and together from 1 byte to 128 Mbytes in size.

Full plot

Color key for the plots

Java read times for different approaches and different buffer sizes

The cliff on the left is from very slow byte-by-byte reads. The peak is near 600 seconds, or 10 minutes! Over a 100 Mbyte file, that's a dismal 10 Mbytes per minute. For comparison, hard disks themselves can transfer data at better than 40 Mbytes per second.

This is what we expect to see. Issuing an I/O request is very expensive. To avoid this expense, make fewer I/O requests by reading an array of bytes at a time. Even a small array of just 64 bytes dramatically improves performance.

To see more, we need to zoom in to the bottom few seconds of the plot.

Zoomed plot

Zoomed in to the lower 10 seconds of the Java read time plot

The curves drops down and level out as we increase the array sizes. By about 8Kbytes the curves flatten. Increasing array sizes further doesn't gain much. This is why BufferedInputStream's default buffer size is 8Kbytes.

Performance levels out into about three groups:

  • At 5 seconds is BufferedInputStream and getting one byte at a time from an internal buffer. Each byte requires a method call, and each method call requires a thread synchronization lock, hurting read performance.
  • At just under 1 second is getting one byte at a time from a FileChannel and a large ByteBuffer. FileChannel doesn't use synchronization locks, but a method call per byte still hurts performance.
  • At about 1/2 second are all of the rest. All of these read arrays of bytes to minimize I/O operations, method calls, and thread synchronization, giving them better performance.

A few approaches stand out at the left side of the plot with curves that drop faster than the others:

  • BufferedInputStream with an internal 8Kbyte buffer and FileChannel with an internal 8Kbyte ByteBuffer do better simply because there is that 8Kbyte buffer behind the reads. The left-shifted curves are really a plotting artifact. They should, perhaps, be plotted based upon the internal read buffer size, not the application's read array size.
  • FileChannel with a MappedByteBuffer does well because there is effectively an internal buffer involved in memory mapping. That buffer is a multiple of the OS page size, which is typically 4Kbytes.

To be sure we've seen everything, let's zoom in even more.

Really zoomed plot

Zoomed in to the lower 1 second of the Java read time plot

Zoomed in this far, the lines vary by about 200ms in the "flat" area above 8Kbytes. Monitoring the JVM's compilation and garbage collection found that neither of these were occurring during the tests. So these variations are due to other overhead within the JVM.

If we have to pick a fastest approach, it would be one of three:

  • FileChannel with a MappedByteBuffer and array reads (the magenta line with round dots that hits a low at 2Kbytes to 4Kbytes).
  • FileChannel with a direct ByteBuffer and array reads (the dark blue line with triangle dots that hits a low at 128Kbytes).
  • FileChannel with a wrapped array ByteBuffer and direct array access (the light blue line with triangle dots that hits a low at 256Kbytes).

All three of these do better than the others by reducing the amount of data copying. They all enable the JVM to read new data into the application's own array without going through multiple intermediate buffers.

Comparison to C

C I/O performance plots added to the Java read time plot

How does Java I/O compare to C I/O? The upper thick line above shows C's buffered I/O (fopen() and fread()), the middle line shows memory mapping (mmap()), and the lower thick line shows unbuffered I/O (open() and read()). All of these C tests were compiled with gcc and the -fast option on a Mac.

C's buffered I/O uses an internal buffer filled by large reads from the file. Bytes from the buffer are copied to an application array on each fread() call. This is similar to FileChannel with a ByteBuffer.

C's unbuffered I/O reads bytes into the application's array without an internal buffer and an extra layer of byte copying. This gives better performance and is a similar approach to FileChannel with a wrapped array ByteBuffer.

C's memory mapping gives more direct access to the OS's paged-in chunks of the file. Memory mapping has more OS overhead, reducing its performance compared to unbuffered I/O. The approach is similar to FileChannel with a MappedByteBuffer.

C's performance is clearly better. But comparisons between C and Java are, perhaps, unfair. Java does array bounds checking, runtime type checking, thread synchronization, garbage collection, and so on. C does not. This lets C go faster, but with a higher risk of crashing if you've got a pointer error.

Conclusions

For the best Java read performance, there are four things to remember:

  • Minimize I/O operations by reading an array at a time, not a byte at a time. An 8Kbyte array is a good size.
  • Minimize method calls by getting data an array at a time, not a byte at a time. Use array indexing to get at bytes in the array.
  • Minimize thread synchronization locks if you don't need thread safety. Either make fewer method calls to a thread-safe class, or use a non-thread-safe class like FileChannel and MappedByteBuffer.
  • Minimize data copying between the JVM/OS, internal buffers, and application arrays. Use FileChannel with memory mapping, or a direct or wrapped array ByteBuffer.

Further reading