File Downloads With Selenium - Mission Impossible? - codecentric AG Blog


When you start automating acceptance tests that include a web UI, you will probably hit a wall quite quickly: how do you verify a document that is offered for download against some criteria? If you have tried it, you know: downloading files automatically seems to be a mission impossible … or is it really?

A frequent weapon of choice for testing web UIs is Selenium. In the fight “Selenium vs. Download” there are actually two problems that need to be solved:

  1. File download: The download dialog is native in all browsers and cannot be controlled with JavaScript. That is bad for Selenium: since it cannot control the dialog, the dialog stays open and the test hangs.
  2. File transfer: Even if the first problem is solved, a second one remains: when the Selenium server is not running on the same machine as the test execution, the freshly downloaded file still has to be transferred between the two machines.

Approaches to solve the download problem

The problem of file downloads with Selenium can be tackled in various ways.

1. Window automation
The first approach smells like “brute force”: when searching the net for a solution, you quickly end up with suggestions to control the native window with window automation software such as AutoIt. That means you have to set up AutoIt so that it waits for any browser download dialog (the point at which Selenium gives up), takes control of the window, saves the file, and closes the window. After that, Selenium can continue as usual.

This might well work, but I found it to be technical overkill. And as it turned out, there was a much simpler solution to the problem.

2. Change the browser's default behaviour
The second possibility is to change the default behaviour of the browser. When clicking on a PDF, for example, the browser should not open a dialog and ask the user what to do with the file, but rather save it without comments or questions in a predefined directory. To accomplish that, you have to start a file download manually once, save the file to disk, and mark that choice as the default behaviour for this file type from now on.

Well, that could work. You “only” have to ensure that all developers, Hudson instances, etc. share the same browser profile. And depending on the number of different file types, that could mean some manual work.
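
To make the idea of a "predefined directory without questions" more concrete, here is a minimal Python sketch. It assumes Firefox and the Selenium Python bindings, neither of which this article prescribes; the function names and the download directory are hypothetical. The preference keys are Firefox settings, and `set_preference(name, value)` is the method Selenium's `FirefoxOptions` offers for setting them.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Return Firefox preferences that save the given MIME types without a dialog."""
    return {
        # 2 = use the custom directory given in browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        # Comma-separated MIME types that are saved without asking
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
        # Keep the built-in PDF viewer from opening PDFs in a tab
        "pdfjs.disabled": True,
    }


def apply_prefs(options, prefs):
    """Copy the preferences onto any object offering set_preference(name, value)."""
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


# Hypothetical directory; with Selenium installed, the options object
# would be selenium.webdriver.FirefoxOptions().
prefs = firefox_download_prefs("/tmp/downloads", ["application/pdf"])
```

Setting the preferences in code like this might spare you from sharing one hand-made profile, but every file type still has to be listed explicitly.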

3. Direct download
Taking a step back: why do we want to download the file with Selenium in the first place? Wouldn’t it be much cooler to download the file without Selenium, using wget instead? That would solve the second problem along the way, because wget runs on the test execution machine and the file ends up right there. It seems a good idea, since wget is available not only for Linux but also for Windows.

Problem solved? Not quite: what about files that are not freely accessible? What if I first need to create some state with Selenium in order to access a generated file? The solution seems fine for public files, but it does not cover all situations.

Conclusion: download problem
In conclusion, it is possible to download files, but it takes some work and may require additional tools. Still, the first step is to get any working solution in place.

What was the other problem again?

Approaches to solve the file transfer problem

Well, admittedly, that’s not a real problem. There’s FTP, and everybody can do it. Nearly everybody. For our favourite test automation tool, the Robot Framework, there’s no FTP library yet. So that would require some quick library hacking, but it shouldn’t be that difficult.
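
To give an impression of how small such a library could be: a Robot Framework library can be a plain Python module whose functions become keywords. The following is only a sketch using Python's standard `ftplib`; the function names, host, and credentials are hypothetical and not part of any existing library.

```python
from ftplib import FTP


def fetch_file(ftp, remote_name, local_path):
    """Download remote_name over an already connected FTP session."""
    with open(local_path, "wb") as out:
        # retrbinary streams the file in chunks and hands each chunk to out.write
        ftp.retrbinary(f"RETR {remote_name}", out.write)
    return local_path


def fetch_from_selenium_host(host, user, password, remote_name, local_path):
    """Connect to the machine running the browser, log in, and fetch one file."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return fetch_file(ftp, remote_name, local_path)
```

Splitting the transfer from the connection handling keeps the part that writes the file easy to try out without a real FTP server.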

Problem solved

Combined with approach 1 or 2 for the first problem, this would solve the problem completely:

  • Download files with Selenium and save them to a directory that can be reached with FTP
  • FTP the file to the test execution server
  • Execute the checks against the file

Phew … that looks like some work for a rather simple problem.

Taking another step back, I cannot get the wget solution out of my head. It looked simple, but it was not a complete solution to the problem. How can we make it complete? All we need to do is tell wget to continue where Selenium left off. Can we do that? We can!

Final solution

In the end, the solution is simple and elegant, and I have to ask myself why I did not think of it earlier, which is a sure sign of a simple solution.

How can you teach wget to continue where Selenium left off? How does a web server know who is requesting a page or document? By the current session! You can pass wget the session ID as a cookie in a header parameter, so that wget can access the same files as the browser in the current Selenium session. Implemented as a keyword, it’s just two lines:

Keyword "Download File"

Download File  [Arguments]  ${COOKIE}  ${URL}  ${FILENAME}
  ${COOKIE_VALUE} =  Call Selenium API  get_cookie_by_name  ${COOKIE}
  Run and Return RC  wget --cookies=on --header "Cookie: ${COOKIE}=${COOKIE_VALUE}" -O "${OUTPUT_DIR}${/}${FILENAME}" "${URL}"

First, a direct call to the Selenium API reads a certain cookie, and its value is stored in a variable. In the second step, a new process is started. The keyword “Run and Return RC” waits until the new process finishes, which is the case once wget has downloaded the file. For that to work, wget must be somewhere on your path, otherwise the download fails. With the header parameter “Cookie:”, wget continues in the same session as Selenium and gains access to the file. Voilà 🙂

The new keyword takes three parameters:

  1. COOKIE The name of the cookie that is read via Selenium and then passed to wget. Usually this is the cookie that identifies the current session.
  2. URL The link to the file that should be downloaded.
  3. FILENAME The new name for the file, which is placed in Robot's output directory.
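
For readers who do not use the Robot Framework, the same idea can be sketched in Python. This is an illustration under assumptions, not the keyword above: it assumes a Selenium WebDriver object, whose `get_cookie(name)` returns a dictionary or `None`, rather than the Selenium API call used in the keyword, and the function names are hypothetical. Only the wget options `--header` and `-O` are used.

```python
import subprocess


def build_wget_command(cookie_name, cookie_value, url, target_path):
    """Build a wget call that sends the browser's session cookie."""
    return [
        "wget",
        "--header", f"Cookie: {cookie_name}={cookie_value}",
        "-O", target_path,
        url,
    ]


def download_file(driver, cookie_name, url, target_path):
    """Read the session cookie from the browser and let wget fetch the file."""
    cookie = driver.get_cookie(cookie_name)
    if cookie is None:
        raise ValueError(f"no cookie named {cookie_name!r} in this session")
    command = build_wget_command(cookie_name, cookie["value"], url, target_path)
    # Passing a list avoids shell quoting issues with "?" and "&" in URLs
    return subprocess.run(command).returncode
```

Returning wget's exit code leaves it to the caller to decide whether a failed download should fail the test.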

Usage of the new keyword

Using the new keyword is nearly trivial. Still, here’s an example of how to download a PDF, which will be placed as “file.pdf” in the output directory (where the reports also go). The cookie containing the session ID is JSESSIONID.

Download File  JSESSIONID    http://<...>/web/pdf?id=4711  file.pdf

File Downloads With Selenium — Mission Possible!