
Smack My Batch Up : Batch Processing In Drupal 8


In my home town of Congleton there is a cafe called Bear Grills, which hosts a food challenge called Bear Grill’s Hibernator Breakfast Challenge.

This beast of a meal is a 3.2kg fried breakfast with a two pint milkshake that must be consumed within an hour and defeats most people who attempt it.

Eating The Hibernator in one go is hard, but splitting it up into 300 little meals spread throughout a week sounds much easier, right?

That's just what batch processing does - it takes a large number of items and breaks them up into smaller chunks so they're easier to handle.

Batch Processing in Drupal

Batch processing is an important aspect of Drupal development that allows large amounts of data to be processed without putting undue stress on the server.

Rather than using a single page load to process lots of data, the batch API allows the data to be processed as lots of little page requests.

This means that you can easily run through 10,000 items without using up all of the server resources in a single page load.
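
To make the idea concrete, here is a minimal plain-PHP sketch of splitting a big list into small chunks. It is only an illustration: the chunk size of 50 is an arbitrary assumption, and the real batch API spreads the chunks over separate page requests rather than looping in one script.

```php
<?php
// Illustration only: this shows the "split into chunks" idea in one script,
// whereas the batch API would handle each chunk in its own page request.
$items = range(1, 10000);

// Hypothetical chunk size chosen for this example.
$chunk_size = 50;

$chunks = array_chunk($items, $chunk_size);

$processed = 0;
foreach ($chunks as $chunk) {
  // Each pass only ever touches a small slice of the full list.
  $processed += count($chunk);
}

printf("%d chunks, %d items processed\n", count($chunks), $processed);
```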

Plus, because the batch API uses the queue system, you can pick up the batch from where it left off if there was an error in the processing. Previously, you would have to use long page load techniques involving set_time_limit(), which also meant that if the browser (or computer) crashed for any reason, things would need to start from the beginning.

Some uses of the batch API in Drupal might be to:

  • Process large numbers of pages to update a field or similar.
  • Run an action on a number of users.
  • Process a file upload from a user that might contain hundreds of items of content.
  • Interact with a third-party API and generate content.
  • Load content and push it to a third party API.

When you initiate a batch run you are presented with a page containing a progress bar that fills up as the batch runs. This creates a much nicer experience for the user than running a lengthy page load operation. Whilst the batch progresses, the user is shown both the progress bar and any messages that are generated from the batch running.

If you are familiar with the batch API in Drupal 7 then you will be quite familiar with how it works in Drupal 8.

In fact, little has changed other than the need to create routes, page controllers and form classes that control your batch operations. To set up a batch operation you need to construct an array that contains a number of elements. This array is then passed into the batch_set() function, which generates the needed setup for the batch inside Drupal.

$batch = array(
  'title' => t('Processing Smack My Batch'),
  'operations' => [
    ['smackmybatch_batchfunction', []]
  ],
  'finished' => 'smackmybatch_batch_finished',
  'init_message' => t('Smack Batch is starting.'),
  'progress_message' => t('Processed @current out of @total.'),
  'error_message' => t('Smack My Batch has encountered an error.'),
  'file' => drupal_get_path('module', 'smackmybatch') . '/smackmybatch.batch.inc',
);
batch_set($batch);

Here is a breakdown of the array elements that are commonly passed to batch_set().

  • title (optional): This is the title of the batch and will be shown on the title of the page that runs the batch. This defaults to "Processing".
  • operations (required): Perhaps the most important parameter, this is an array of operations that your batch will process. Each operation is itself an array containing two components. The first component is the function to call and the second is an array of parameters to pass to that function. I will expand on this parameter later in this post.
  • finished (optional): The function to run when the batch processing has finished. This should follow the signature documented by callback_batch_finished(). If it is not defined then no finished callback is run.
  • init_message (optional): The message to show to the user when the batch is starting up. This defaults to "Initializing".
  • progress_message (optional): The message shown to the user whilst the batch is being processed. You'll notice above that we have passed in two placeholders to this string; these will be replaced with actual values as the batch progresses. The available placeholders are @current, @remaining, @total, @percentage, @estimate and @elapsed. This defaults to "Completed @current of @total".
  • error_message (optional): The error message to show if the batch encounters an error and can't complete the batch for some reason. This defaults to "An error has occurred".
  • file (optional): It's best practice to store the callback functions defined in this array in their own file. This keeps them separate from anything else that your module might be doing. The file parameter is used to define a file location that stores these functions if they are in a file separate from .module. As the path is relative to the base_path() this parameter should contain drupal_get_path() to get the full path to the file.
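
As a rough sketch of how the progress_message placeholders get filled in, the snippet below does the substitution in plain PHP. The example_progress_message() helper is hypothetical and written just for this post; Drupal does this internally, and this is not its actual implementation. Only four of the placeholders are covered here.

```php
<?php
// Hypothetical helper, not part of Drupal: fills in some of the
// progress_message placeholders for a given position in the batch.
function example_progress_message(string $template, int $current, int $total): string {
  // Guard against division by zero for an empty batch.
  $percentage = $total > 0 ? (int) floor($current / $total * 100) : 100;

  return strtr($template, [
    '@current' => (string) $current,
    '@remaining' => (string) ($total - $current),
    '@total' => (string) $total,
    '@percentage' => (string) $percentage,
  ]);
}

echo example_progress_message('Processed @current out of @total.', 5, 26), "\n";
```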

There are one or two other parameters that can be included in this array, but the above is enough to get up and running quickly.

How you have set up the batch page is important for the next step here. If you have set up an action in a controller then you'll need to call the batch_process() function, passing the destination page as the single parameter. If you are using a form submit handler to control the batch operation then you can leave this function call out as this is done automatically for you.

// In a controller, return the redirect response that batch_process() generates.
return batch_process('admin/content');

The 'operations' parameter is the most important part of your batch process, but there are two strategies involved in using this parameter. It's worth taking some time to go through these two strategies as they change the way you need to think about the batch process.

Pre-set calls list

This involves knowing in advance how many items you have to process and feeding them all into the operations parameter, so that they are processed one by one.

As an example, let's say we wanted to process a list of all the lowercase English letters. To do this we would generate the letters (in this case using the range() function) and then create the array of function callbacks, one for each letter. This is then fed into the operations array.

$letters = range('a', 'z');
 
$operations = [];
foreach ($letters as $letter) {
  $operations[] = ['smackmybatch_process_letter', [$letter]];
}
 
$batch = array(
  'title' => t('Processing Smack My Batch'),
  'operations' => $operations
);
batch_set($batch);

The resulting callback function can be quite simple here. We are only interested in the current item, which is passed to the function by the batch API. The last parameter in the function is always the current context of the batch. This allows us to keep track of the current item and display the progress to the user.

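function smackmybatch_process_letter($letter, &$context) {
  // The sandbox is cleared between operations, so there is no need to
  // track progress there when every item has its own operation.

  // Do something with the letter...

  // Message shown to the user alongside the progress bar.
  $context['message'] = $letter . ' processed.';
  // Results are collected and handed to the finished callback.
  $context['results'][] = $letter;
}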

This is sort of a silly example as the callback functions are processed so quickly that it's difficult to actually see the feedback.
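
To show how the context ends up as the last parameter, here is a small plain-PHP sketch that runs a pre-set list of operations. The example_run_operations() function is a simplified stand-in I've made up for illustration; it is an assumption about the general shape of the runner, not Drupal's actual code, and it runs everything in one go rather than over several requests.

```php
<?php
// Hypothetical operation callback with the same shape as the one above.
function example_process_letter(string $letter, array &$context): void {
  $context['message'] = $letter . ' processed.';
  $context['results'][] = strtoupper($letter);
}

// Simplified stand-in for the batch runner (illustration only).
function example_run_operations(array $operations): array {
  $context = ['results' => [], 'message' => ''];

  foreach ($operations as [$callback, $args]) {
    // Each operation starts with an empty sandbox.
    $context['sandbox'] = [];
    // Append the context by reference so the callback can modify it.
    $args[] = &$context;
    call_user_func_array($callback, $args);
  }

  return $context['results'];
}

$operations = [];
foreach (range('a', 'c') as $letter) {
  $operations[] = ['example_process_letter', [$letter]];
}

print_r(example_run_operations($operations));
```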

Alterable calls list

This takes a slightly different route by setting only a single callback in the operations list. The first time this function is called it will figure out how many items it needs to process and will set up the batch context appropriately. This function can then be called multiple times until the limits are reached.

As an example, let's say we wanted to process a bunch of numbers. In this case we would create the operations array with a single item, which is the callback we are going to use. We send no parameters to the operation in this instance.

$operations = [];
 
$operations[] = ['smackmybatch_process_numbers', []];
 
$batch = array(
  'title' => t('Processing Smack My Batch'),
  'operations' => $operations
);
batch_set($batch);

The callback function would look something like the following. We use the context parameter to store some information about the current batch run and set this up the first time the function is called. This 'sandbox' element only persists whilst the current operation is running and is cleared before we are passed on to the finished callback. In the example below we are counting up to 100 in steps of 10. The 'finished' element of the context determines whether the operation should be called again or not. If the value is 1 (or is not changed at all) then the operation is treated as complete; any value below 1 is used as the fraction completed and the callback is called again.

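function smackmybatch_process_numbers(&$context) {
  if (empty($context['sandbox'])) {
    // First call: set up the state that is kept between calls.
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['current_number'] = 0;
    $context['sandbox']['max'] = 100;
  }

  // Take the next 10 numbers. range() is inclusive at both ends, so the
  // last number in this pass is start + 9 (and never beyond max - 1).
  $start = $context['sandbox']['progress'];
  $end = min($start + 10, $context['sandbox']['max']) - 1;
  $numbers = range($start, $end);

  foreach ($numbers as $number) {
    $context['results'][] = $number;
    $context['sandbox']['progress']++;
    $context['sandbox']['current_number'] = $number;
    $context['message'] = $number;
  }

  // Report the fraction completed whilst there is still work to do.
  // Leaving 'finished' untouched on the last pass ends the operation.
  if ($context['sandbox']['progress'] < $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}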

This approach is useful if you aren't sure beforehand how many items you are about to process, or maybe the number of items changes as the batch progresses. In this case you would adapt the above code to keep running whilst there are items left to process and stop once it has reached the end of the list. I commonly use this approach when processing CSV files of variable length.
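
Here is a minimal plain-PHP sketch of that pattern for a list of variable length, such as rows read from a CSV file. Both functions are hypothetical: example_process_rows() handles three rows per call (an arbitrary number), and example_run_until_finished() is a simplified stand-in for the batch runner that I'm assuming simply repeats the call until 'finished' reaches 1.

```php
<?php
// Hypothetical multi-pass operation: handles up to 3 rows per call.
function example_process_rows(array $rows, array &$context): void {
  if (empty($context['sandbox'])) {
    // First call: work out how much there is to do.
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($rows);
  }

  $slice = array_slice($rows, $context['sandbox']['progress'], 3);
  foreach ($slice as $row) {
    $context['results'][] = $row;
    $context['sandbox']['progress']++;
  }

  // Ask to be called again whilst rows remain.
  if ($context['sandbox']['progress'] < $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}

// Simplified stand-in for the batch runner (illustration only).
function example_run_until_finished(callable $callback, array $rows): array {
  $context = ['sandbox' => [], 'results' => []];
  $calls = 0;

  do {
    // 'finished' is reset to 1 before every call, as described above.
    $context['finished'] = 1;
    $callback($rows, $context);
    $calls++;
  } while ($context['finished'] < 1);

  return ['calls' => $calls, 'results' => $context['results']];
}

$rows = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];
$run = example_run_until_finished('example_process_rows', $rows);
printf("%d calls, %d rows\n", $run['calls'], count($run['results']));
```

Because the callback decides for itself when it is done, the same code copes with 7 rows or 7,000 without the calling code knowing the total up front.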

The finish callback

Once the batch has finished (either by running through all of the operation callbacks or by reaching a finish condition) the finish callback function is called. This function has the following parameters:

  • success: This is a boolean value that states if the batch completed successfully or failed.
  • results: The value of the results item from the context variable used in the batch processing.
  • operations: If the success parameter is false then this is a list of the operations that haven't completed yet. Useful if you want to do something with this information, like write the tasks yet to do to a log file.

A typical finish callback might look something like this. This prints a message showing how many items were processed and prints out all processed items on to the screen.

function smackmybatch_batch_finished($success, $results, $operations) {
  if ($success) {
    $message = \Drupal::translation()->formatPlural(count($results), 'One post processed.', '@count posts processed.');
  }
  else {
    $message = t('Finished with an error.');
  }
 
  drupal_set_message($message);
 
  // Providing data for the redirected page is done through $_SESSION.
  foreach ($results as $result) {
    drupal_set_message(t('Processed @title.', array('@title' => $result)));
  }
}

This isn't a particularly useful finish function as batch processes will normally contain thousands of items in the results list and as such you shouldn't print them all out.

This is just for illustration purposes.

A common approach is to print any error conditions that might have arisen during the processing so that they can be acted on afterwards.
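
As a sketch of that approach, the plain-PHP helper below builds a short summary plus one message per failure instead of listing every processed item. The example_summarise_results() function and the 'processed' and 'errors' keys are conventions I've invented for this example; the batch API doesn't define them, so your operation callbacks would need to fill them in.

```php
<?php
// Hypothetical helper: turns batch results into messages for the user.
// Assumes the operation callbacks stored successes under 'processed' and
// failures under 'errors' (keyed by item), which is just a convention
// made up for this sketch.
function example_summarise_results(bool $success, array $results): array {
  $messages = [];

  if (!$success) {
    $messages[] = 'Finished with an error.';
  }

  $processed = $results['processed'] ?? [];
  $errors = $results['errors'] ?? [];

  // One summary line rather than thousands of individual messages.
  $messages[] = sprintf('%d items processed, %d failed.', count($processed), count($errors));

  // Only the failures are listed individually.
  foreach ($errors as $item => $reason) {
    $messages[] = sprintf('Could not process %s: %s', $item, $reason);
  }

  return $messages;
}

$messages = example_summarise_results(TRUE, [
  'processed' => ['a', 'b', 'c'],
  'errors' => ['d' => 'API timed out'],
]);
echo implode("\n", $messages), "\n";
```

In a real finish callback each of these strings would be passed to drupal_set_message() or written to a log.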

So what's changed in Drupal 8 for the Batch API?

Lots of the internal mechanics have changed in Drupal 8, but the main processing of the batch run is done by the same few procedural functions from Drupal 7 with a few core Drupal 8 objects involved.

The actual batch processing is all controlled by the new BatchController controller, with a route of '/batch'.

So, when running a batch this page is called over and over again until the batch completes.

The BatchController runs the procedural batch functions and returns a minimal response, which makes sense, as this keeps the footprint of the batch processing very small. You don't want a lot of processing to be done around your batch operation; you just want the batch operation to run as quickly as it can.

Batch API in the real world

When developing complex operations or dealing with user-uploaded files I normally start off with a simple batch API implementation and then expand from there.

Using batches greatly reduces the impact of these complex operations on the server and allows you to adapt to problems without having to start from the beginning.

As a real-world example, I recently had to take a user-uploaded XML file and create a node from each component in that file, which usually meant 200 or more nodes.

As the generation of these nodes required a call to an API, the batch processing became extremely important, as processing all of this data in a single page load would take several minutes. This also meant that if the API failed for some reason then I could simply adapt to the failure and continue the batch run, returning a list of items that weren't able to be processed at the end of the run.

If you are interested in finding out more about the batch API then a good starting point would be the batch API documentation on Drupal.org.
