multi_download {curl} | R Documentation
Advanced download interface
Description
Download multiple files concurrently, with support for resuming large files. This function is based on multi_run() and hence does not error if any of the individual requests fail; you should inspect the return value to find out which of the downloads were completed successfully.
Usage
multi_download(
urls,
destfiles = NULL,
resume = FALSE,
progress = TRUE,
multi_timeout = Inf,
multiplex = TRUE,
...
)
Arguments
urls: vector with URLs to download. Alternatively it may also be a list of handle objects that have the url option already set.

destfiles: vector (of equal length as urls) with paths of output files, or NULL to use the basename of each url.

resume: if the file already exists, resume the download. Note that this may change server responses, see details.

progress: print download progress information.

multi_timeout: in seconds, passed to multi_run.

multiplex: passed to new_pool.

...: extra handle options, passed to each request via new_handle.
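The ... argument accepts any handle option supported by new_handle. A minimal sketch, assuming illustrative URLs and option values (useragent, low_speed_limit and low_speed_time are standard curl handle options, see curl::curl_options()):

library(curl)
urls <- c("https://httpbin.org/get", "https://httpbin.org/ip")
res <- multi_download(
  urls,
  destfiles = file.path(tempdir(), c("get.json", "ip.json")),
  useragent = "my-downloader/1.0",  # sent with every request
  low_speed_limit = 1000,           # give up if transfer drops below 1000 bytes/sec...
  low_speed_time = 30               # ...for 30 consecutive seconds
)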
Details
Upon completion of all requests, this function returns a data frame with results. The success column indicates if a request was successfully completed (regardless of the HTTP status code). If it failed, e.g. due to a networking issue, the error message is in the error column. A success value of NA indicates that the request was still in progress when the function was interrupted or reached the elapsed multi_timeout; in that case the download can perhaps be resumed if the server supports it.
It is also important to inspect the status_code column to see if any of the requests were successful but had a non-success HTTP code; in that case the downloaded file probably contains an error page instead of the requested content.
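As a sketch of how to act on these columns (assuming urls is a character vector of download URLs; the retry policy is illustrative, not part of the API):

res <- multi_download(urls, resume = TRUE)

failed      <- subset(res, success %in% FALSE)  # e.g. network errors
interrupted <- subset(res, is.na(success))      # cut off mid-transfer
bad_status  <- subset(res, success %in% TRUE & !status_code %in% c(200, 206))

# One possible retry: resume whatever was interrupted
if (nrow(interrupted) > 0)
  multi_download(interrupted$url, interrupted$destfile, resume = TRUE)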
Note that when you set resume = TRUE you should expect HTTP-206 or HTTP-416 responses. The latter could indicate that the file was already complete, hence there was no content left for the server to send. If you try to resume a file download but the server does not support this, success is FALSE and the file will not be touched. In fact, if we request a download to be resumed and the server responds with HTTP 200 instead of HTTP 206, libcurl will error and not download anything, because this probably means the server did not respect our range request and is sending us the full file.
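A minimal resume sketch (the URL is a placeholder; multi_download is restartable simply by calling it again on the same destfile with resume = TRUE):

library(curl)
url  <- "https://example.com/big-file.bin"   # placeholder URL
dest <- file.path(tempdir(), "big-file.bin")

res <- multi_download(url, dest, resume = TRUE)

# 206: partial content was appended; 416: the file was already complete.
# success == FALSE here can mean the server ignored the range request.
res[c("success", "status_code", "resumefrom")]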
About HTTP/2
Availability of HTTP/2 can increase the performance when making many parallel requests to a server, because HTTP/2 can multiplex many requests over a single TCP connection. Support for HTTP/2 depends on the version of libcurl that your system has, and on the TLS back-end that is in use; check curl_version. For clients or servers without HTTP/2, curl makes at most 6 connections per host over which it distributes the queued downloads.
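You can check at runtime whether your build supports HTTP/2; in recent versions of the curl package, curl_version() exposes this as a logical http2 field (a quick sketch):

library(curl)
ver <- curl_version()
ver$version      # libcurl version string
ver$ssl_version  # active TLS back-end
ver$http2        # TRUE if this build can multiplex over HTTP/2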
On Windows and MacOS you can switch the active TLS back-end by setting the environment variable CURL_SSL_BACKEND in your ~/.Renviron file. On Windows you can switch between SecureChannel (the default) and OpenSSL, where only the latter supports HTTP/2. On MacOS you can use either SecureTransport or LibreSSL; the default varies by MacOS version.
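For example, to select the OpenSSL back-end on Windows you could add a line like the following to ~/.Renviron and restart R (shown as a sketch; which back-ends are valid depends on how your libcurl was built):

# In ~/.Renviron (takes effect after restarting R):
CURL_SSL_BACKEND=openssl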
Value
The function returns a data frame with one row for each downloaded file and the following columns:
- success: if the HTTP request was successfully performed, regardless of the response status code. This is FALSE in case of a network error, or in case you tried to resume from a server that did not support this. A value of NA means the download was interrupted while in progress.
- status_code: the HTTP status code from the request. A successful download is usually 200 for full requests or 206 for resumed requests. Anything else could indicate that the downloaded file contains an error page instead of the requested content.
- resumefrom: the file size before the request, in case a download was resumed.
- url: final url (after redirects) of the request.
- destfile: downloaded file on disk.
- error: if success == FALSE this column contains an error message.
- type: the Content-Type response header value.
- modified: the Last-Modified response header value.
- time: total elapsed download time for this file in seconds.
- headers: vector with HTTP response headers for the request.
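To see the shape of this data frame, a quick sketch (httpbin.org is a stand-in server; note that a 404 still yields success = TRUE because the transfer itself completed):

library(curl)
res <- multi_download("https://httpbin.org/status/404", tempfile())
res[c("success", "status_code", "destfile", "error")]
# success is TRUE with status_code 404: the request worked, but the
# file on disk may hold an error page rather than the content you wanted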
Examples
## Not run:
# Example: some large files
urls <- sprintf(
"https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-%02d.parquet", 1:12)
res <- multi_download(urls, resume = TRUE) # You can interrupt (ESC) and resume
# Example: revdep checker
# Download all reverse dependencies for the 'curl' package from CRAN:
pkg <- 'curl'
mirror <- 'https://cloud.r-project.org'
db <- available.packages(repos = mirror)
packages <- c(pkg, tools::package_dependencies(pkg, db = db, reverse = TRUE)[[pkg]])
versions <- db[packages,'Version']
urls <- sprintf("%s/src/contrib/%s_%s.tar.gz", mirror, packages, versions)
res <- multi_download(urls)
all.equal(unname(tools::md5sum(res$destfile)), unname(db[packages, 'MD5sum']))
# And then you could use e.g.: tools:::check_packages_in_dir()
# Example: URL checker
pkg_url_checker <- function(dir){
db <- tools:::url_db_from_package_sources(dir)
res <- multi_download(db$URL, rep('/dev/null', nrow(db)), nobody=TRUE)
db$OK <- res$status_code == 200
db
}
# Use a local package source directory
pkg_url_checker(".")
## End(Not run)