This was one of the problems I faced in the Import module of Open Event, where I had to download media from certain links. When a URL pointed to a webpage rather than a binary file, I had to skip the download and keep the link as is.
To solve this, I inspected the headers of the URL. Headers usually contain a Content-Type field that tells us what type of data the URL points to. A naive way to do this is to make a normal GET request and read the Content-Type from the response. It works, but it is not optimal, because it downloads the whole file just to check one header.
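A minimal sketch of that naive check, assuming the requests library. The names is_webpage and is_downloadable_naive are hypothetical, and treating only text/html and application/xhtml+xml as "webpage" types is an assumption made for illustration.

```python
import requests

# Assumption: these media types count as "a webpage, keep the link".
WEBPAGE_TYPES = ("text/html", "application/xhtml+xml")


def is_webpage(content_type):
    """Return True if a Content-Type value describes an HTML page."""
    if not content_type:
        return False  # missing header: not treated as a webpage (an assumption)
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in WEBPAGE_TYPES


def is_downloadable_naive(url):
    """Naive check: requests.get fetches the whole body just to read one header."""
    response = requests.get(url, timeout=30)
    return not is_webpage(response.headers.get("content-type"))
```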
If the file is large, this wastes bandwidth for nothing. Looking through the requests documentation, I found a better approach: fetch only the headers of the URL (a HEAD request) before actually downloading it. This lets us skip files that weren't meant to be downloaded. To restrict downloads by file size, we can read the size from the Content-Length header, when the server sends it, and compare it against a limit. To get the filename, we can parse the URL.
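The header-only check can be sketched as follows, again assuming requests. should_download, is_downloadable and the 200 MiB MAX_BYTES cap are hypothetical choices; note that some servers handle HEAD poorly or omit Content-Length, in which case this sketch lets the download through.

```python
import requests

MAX_BYTES = 200 * 1024 * 1024  # illustrative 200 MiB cap (an assumption)


def should_download(headers, max_bytes=MAX_BYTES):
    """Decide from response headers alone whether to fetch the body."""
    media_type = (headers.get("content-type") or "").split(";", 1)[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return False  # a webpage: keep the link instead
    length = headers.get("content-length")
    # Content-Length arrives as a string and may be absent (e.g. chunked responses).
    if length is not None and length.strip().isdigit() and int(length) > max_bytes:
        return False
    return True


def is_downloadable(url, max_bytes=MAX_BYTES):
    """Send a HEAD request so only headers are transferred, not the body."""
    # requests.head does not follow redirects unless allow_redirects=True.
    response = requests.head(url, allow_redirects=True, timeout=30)
    return should_download(response.headers, max_bytes)
```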
This gives the correct filename in some cases. However, sometimes the filename is not present in the URL at all; in that case, the Content-Disposition header usually carries it. Once we have the filename, we store it in a variable and pass it as the first argument to open(). The second argument is more interesting: it specifies the mode in which the file is opened, and there are several options.
The most common modes are "r" (read), "w" (write, truncating any existing file), "a" (append), and their binary counterparts such as "rb" and "wb"; a downloaded binary file should be written with "wb". Opening the file in a with statement closes it automatically at the end of the block.
Eagle-eyed readers may have noticed a problem: we first received the whole file through the GET request and only then wrote it to disk. This means the entire file sits in RAM before it reaches the hard drive. For large files this wastes memory and slows the process down, and a file larger than the available memory can exhaust it and crash the program.
To illustrate this point, we can try downloading a large sample video file, such as those provided by file-examples.
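One common way to avoid holding the whole file in memory, described in the requests documentation, is to pass stream=True and write the body in chunks with iter_content(). The sketch below assumes requests and also picks the filename as described above: Content-Disposition first, then the last segment of the URL path. All function names, the 64 KiB chunk size, the 30-second timeout, the "download.bin" fallback and the temporary ".part" file are illustrative choices, not the original Open Event code. The regex handles only the plain filename= parameter, not the RFC 6266 filename*= form.

```python
import os
import re
from urllib.parse import unquote, urlparse

import requests

CHUNK_SIZE = 64 * 1024  # illustrative chunk size (64 KiB)

# Matches filename="..." or filename=token; the filename*= form is not handled.
_CD_FILENAME = re.compile(r'filename\s*=\s*(?:"([^"]*)"|([^;\s]+))', re.IGNORECASE)


def _safe_name(name):
    """Reduce a server-supplied name to a bare filename, or None if unusable."""
    if not name:
        return None
    # Keep only the last path component so a server cannot escape our folder.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    return None if name in ("", ".", "..") else name


def filename_from_content_disposition(cd):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not cd:
        return None
    match = _CD_FILENAME.search(cd)
    if not match:
        return None
    return _safe_name(match.group(1) if match.group(1) is not None else match.group(2))


def choose_filename(url, headers, default="download.bin"):
    """Prefer Content-Disposition, then the URL path, then a default name."""
    name = filename_from_content_disposition(headers.get("content-disposition"))
    if not name:
        name = _safe_name(unquote(urlparse(url).path))
    return name or default


def write_chunks(chunks, path):
    """Write byte chunks to path via a temporary .part file; return bytes written."""
    tmp_path = path + ".part"
    written = 0
    # "wb": write mode, binary, so the bytes are stored unchanged.
    # The with block closes the file automatically, even on error.
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    # Only a completely written file receives the final name.
    os.replace(tmp_path, path)
    return written


def download(url, folder=".", chunk_size=CHUNK_SIZE):
    """Stream url into folder without holding the whole body in memory."""
    # stream=True: the body is fetched lazily as iter_content() is consumed.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        # response.url is the final URL after any redirects.
        name = choose_filename(response.url, response.headers)
        path = os.path.join(folder, name)
        write_chunks(response.iter_content(chunk_size=chunk_size), path)
    return path
```

Only one chunk is held in memory at a time, so memory use stays roughly constant regardless of file size. Writing to a ".part" file first means an interrupted download never leaves a truncated file under the final name.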
To save the file into a particular folder instead of the current working directory, use the os module to build the target path, or supply your own absolute path in your operating system's path style; if the folder does not exist yet, create it first. Also worth noting: urllib.request.urlretrieve, sometimes suggested as an alternative to requests for this task, is a legacy function carried over from Python 2 and might be deprecated at some point.
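A small sketch of preparing a target folder with the standard os module. target_path is a hypothetical helper; os.makedirs with exist_ok=True creates missing folders without failing if they already exist, and os.path.join inserts the correct separator for the current operating system.

```python
import os


def target_path(folder, filename):
    """Create folder if needed and return the full path for filename inside it."""
    # expanduser lets callers pass paths such as "~/Downloads".
    folder = os.path.expanduser(folder)
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return os.path.join(folder, filename)
```

For example, target_path("~/Downloads/media", "clip.mp4") returns a path that can be passed straight to open(path, "wb").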