Selenium Not Required
Consider this small example, where curl or Python’s urllib is sufficient. You want to download 1000 zip files from URIs that follow a set pattern:
examplesite.com/download/1.zip
examplesite.com/download/2.zip
examplesite.com/download/3.zip
and so on.
In this case you could write a Bash or Python script that keeps incrementing an integer, appends it to the base string, fetches the content at that URI and saves it to your desired location. Emulating a browser doesn’t make sense here: it would be much slower because of the GUI, and it might not work with different versions of Firefox. (I’ve faced this issue myself. Selenium often stops working after Firefox updates, so it’s a good idea to turn off auto-update if you use Selenium often.)
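Here is a minimal Python sketch of that loop, using only the standard library’s urllib.request. The https:// scheme, the helper names build_urls and download_all, and the destination folder are assumptions for illustration; they are not part of the original example.

```python
import os
import urllib.request

# Assumption: the files are served over HTTPS at this hypothetical pattern.
BASE_URL = "https://examplesite.com/download/{}.zip"


def build_urls(count, pattern=BASE_URL):
    """Return the URIs 1.zip .. <count>.zip by substituting an incrementing integer."""
    return [pattern.format(i) for i in range(1, count + 1)]


def download_all(count, dest_dir, pattern=BASE_URL):
    """Fetch each URI in turn and save its bytes under dest_dir with the same file name."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in build_urls(count, pattern):
        file_name = url.rsplit("/", 1)[-1]  # e.g. "1.zip"
        with urllib.request.urlopen(url) as response:
            data = response.read()
        with open(os.path.join(dest_dir, file_name), "wb") as out:
            out.write(data)
```

Calling download_all(1000, "downloads") would fetch all 1000 files one after another. The sketch has no retries or error handling, which a real script would probably want, but it shows why no browser is needed: each URI is just a formatted string.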
Selenium Becomes Necessary
You’ve probably used Selenium to automate tests for your web application or to scrape data from a few websites. It is also possible to use Selenium to download files automatically and save them to a location of your choice. Selenium can be handy in the following situations:
- The app uses JavaScript and HTML5 to create a CSV that can then be downloaded.
- The download link is generated randomly by the server: it changes from time to time, or there is no recognisable pattern across a set of files.
- Some websites try to block scripts (programmatic access) and allow access only through browsers (real humans), using various mechanisms to tell the two apart.
Pesky Confirmation Dialogs
You can press download buttons using Selenium. But what about the confirmation boxes that pop up asking whether you really want to download a particular file? As of this writing, Selenium has no way to interact with these dialogs, because they belong to the browser’s own interface rather than to the web page.
Alternative One
Firefox has various download-related options, such as the default location, an alert on completion and automatic downloading. You can change these globally in Firefox, though you might have to fiddle with some advanced settings (visit about:config in your browser to change them). That way you can tune Firefox so that it asks neither for confirmation nor for a location, avoiding those dialog boxes altogether. The problem with this approach is that your code won’t work on a friend’s system out of the box, and these changes will also affect your everyday browsing.
Enter Firefox Profiles
Firefox lets you change its configuration through profiles. We’ll use Selenium’s FirefoxProfile class to set up a temporary profile with the configuration we need. This makes the code more likely to work on other computers and leaves your default settings untouched.
After spending a fair amount of time on Stack Overflow and Google testing various suggested methods, I found that the following code works for me :)
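import os
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

data_directory_path = "downloads"  # hypothetical folder name, relative to the working directory
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)  # 2 = use the folder in browser.download.dir
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", os.path.join(os.getcwd(), data_directory_path))
# MIME types Firefox should save to disk without asking
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/csv,text/csv,application/pdf,application/excel")
fp.set_preference("browser.download.manager.showAlertOnComplete", False)
options = Options()
options.profile = fp  # newer Selenium releases take the profile via Options; firefox_profile= was removed
driver = webdriver.Firefox(options=options)
# Followed by code to download the stuff you desire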
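Once a download has been triggered, the script usually needs to know when the file has actually landed in data_directory_path before it moves on or quits the browser. One hedged way to do that is to poll the folder, assuming Firefox stores in-progress downloads with a .part suffix (its usual behaviour, but worth checking on your version). The helper wait_for_downloads and its parameters are hypothetical, not part of Selenium.

```python
import os
import time


def wait_for_downloads(download_dir, expected_count=1, timeout=60.0, poll_interval=0.5):
    """Poll download_dir until it holds at least expected_count finished files
    and no in-progress files remain; return the sorted finished file names.

    Assumption: Firefox writes in-progress downloads as '<name>.part'.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        in_progress = [n for n in names if n.endswith(".part")]
        finished = sorted(n for n in names if not n.endswith(".part"))
        if not in_progress and len(finished) >= expected_count:
            return finished
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"downloads in {download_dir!r} did not finish within {timeout}s"
            )
        time.sleep(poll_interval)
```

You would call it right after the click that starts the download, passing the same folder given to browser.download.dir. Polling is crude, but it avoids depending on any browser UI, which is exactly what the profile settings above are meant to suppress.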