September 11, 2012

WebDL: ABC iView and SBS Downloader

Filed under: Technical, by James Bunton @ 7:16 pm

WebDL is a collection of Web TV downloader scripts with a consistent user interface. I’ve previously released these separately, but a while ago I refactored them to share common code and packaged them into a single utility. You can use it interactively, or run it from a cron job to download any shows matching a glob. Currently supported are ABC iView and SBS OnDemand. I’ll probably add more in the future.

Update 2015-05-24: Please see the Bitbucket project for up to date docs!
Update 2015-05-24: Fixed SBS and Channel 9. Livestreamer is now a required dependency.
Update 2014-07-22: Added notes on version to dependencies.
Update 2013-03-26: The latest version of autograbber.py now accepts a file with a list of patterns instead of taking them from the command line.
Update 2014-02-15: Please see https://bitbucket.org/delx/webdl for bug reports or to post patches.

Dependencies

  • Livestreamer
  • python (2.7, not python 3)
  • python-lxml
  • rtmpdump a1900c3e15
  • ffmpeg / libav

The versions listed above are the ones I have had success with. In particular note that rtmpdump always reports v2.4, even though many binaries with different bugs and features have been built under that version number. If something doesn’t work, try compiling a new ffmpeg/avconv or rtmpdump to see if it fixes the problem.

Interactive Usage

You can run WebDL interactively to browse categories and episode lists and download TV episodes.

$ ./grabber.py
 1) ABC iView
 2) SBS
 0) Back
Choose> 1
 1) ABC 4 Kids
 2) Arts & Culture
 3) Comedy
 4) Documentary
<snipped>
Choose> 4
 1) ABC Open Series 2012
 2) Art Of Germany
 3) Baby Beauty Queens
 4) Catalyst Series 13
<snipped>
Choose> 4
 1) Catalyst Series 13 Episode 15
 2) Catalyst Series 13 Episode 16
 0) Back
Choose> 1
RTMPDump v2.3
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting ...
INFO: Connected...
Starting download at: 0.000 kB

The text after each “Choose>” prompt is what you type. Note that you can go back on any screen by typing “0”. At the list of episodes you can download a single episode by typing one number, or multiple episodes by typing several numbers separated by spaces.

Cron Scripted Usage

I run autograbber.py daily from crontab, with an entry which looks something like this:

# m    h  dom mon dow   command
  0    1   *   *   *     ./autograbber.py /path/to/video-dir/ /path/to/patterns.txt

The patterns.txt file should contain shell-style globs, something like:

ABC iView/*/QI*/*
SBS/Programs/Documentary/*/*

The above will download all episodes of QI from ABC as well as every SBS documentary. Whenever an episode is downloaded it is recorded into downloaded_auto.txt. Even if you move the files somewhere else they will not be redownloaded.
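To make the pattern matching and the "already downloaded" check more concrete, here is a minimal sketch in Python. It is not the actual autograbber.py code: the helper names, the example episode paths and the assumption that each glob is matched one path component at a time are all hypothetical.

```python
from fnmatch import fnmatchcase


def matches(path, pattern):
    """Return True if every '/'-separated component of path matches the
    corresponding shell-style glob component of pattern (assumed behaviour)."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, g) for p, g in zip(path_parts, pattern_parts))


def select_new(paths, patterns, already_downloaded):
    """Pick the paths that match any pattern and have not been recorded yet."""
    selected = []
    for path in paths:
        if path in already_downloaded:
            continue  # recorded earlier, so skip it even if the file was moved
        if any(matches(path, pattern) for pattern in patterns):
            selected.append(path)
    return selected


# Hypothetical menu paths, for illustration only
EXAMPLE_PATHS = [
    "ABC iView/Comedy/QI Series 9/QI Series 9 Episode 1",
    "ABC iView/Comedy/Rake/Rake Episode 1",
    "SBS/Programs/Documentary/Two Laws/Two Laws",
]
EXAMPLE_PATTERNS = [
    "ABC iView/*/QI*/*",
    "SBS/Programs/Documentary/*/*",
]
```

Calling select_new(EXAMPLE_PATHS, EXAMPLE_PATTERNS, set()) would pick the QI episode and the SBS documentary but not the other ABC show. Passing a set of previously recorded paths as the third argument plays the role of downloaded_auto.txt.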

148 comments

Sam says:

Hi Coops, hope you solved this by now(!) but in case anyone else is having similar issues, I strongly recommend ditching macports for homebrew: http://brew.sh/
See also (re: ffmpeg on mavericks): http://sangatpedas.com/20140218/installing-ffmpeg-osx-mavericks/

Thanks delx for keeping this project up to date, it’s a godsend! :)

Mark says:

seriously nice work!

Ian says:

Hi I’ve enjoyed using webdl for ages, it has all worked really well until a few days ago:

$ ./grabber.py
Traceback (most recent call last):
  File "./grabber.py", line 3, in <module>
    from common import load_root_node, natural_sort
  File "/media/BigDrive/video/webdl/common.py", line 170
    <<<<<<< local
    ^
IndentationError: expected an indented block

steve says:

Hi
I had a problem with an iView file having slashes in the filename “…20/8/2014” which caused the open(filename) command to fail. I made the following changes to sanitize the filename string:

diff common_old.py common.py

18a19,20
> import string
> import unicodedata
317a320,327
>
> validFilenameChars = "-_.() %s%s" % (string.ascii_letters, string.digits)
>
> def removeDisallowedFilenameChars(filename):
>     cleanedFilename = unicodedata.normalize('NFKD', filename).encode('ASCII', 'ignore')
>     return ''.join(c for c in cleanedFilename if c in validFilenameChars)
>
>
327a338
> filename = removeDisallowedFilenameChars(filename)

Thanks for the code, and I hope you can use this.
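For readers on Python 3, the same whitelist idea could be sketched as follows. This is only an illustration of the approach in the patch above, not the fix that went into webdl; the function and constant names are hypothetical, and the extra decode step is needed because encode() returns bytes on Python 3.

```python
import string
import unicodedata

# Characters allowed to remain in a filename (same whitelist as the patch above)
VALID_FILENAME_CHARS = "-_.() " + string.ascii_letters + string.digits


def remove_disallowed_filename_chars(filename):
    """Strip path separators and other unsafe characters from a filename."""
    # Decompose accented characters, then drop anything that is not ASCII
    ascii_only = (
        unicodedata.normalize("NFKD", filename)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep only whitelisted characters, so "/" can never reach open()
    return "".join(c for c in ascii_only if c in VALID_FILENAME_CHARS)
```

With this sketch a title ending in “20/8/2014” simply loses its slashes, so the result is safe to pass to open().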

James Bunton says:

@steve
Thanks. I’ve pushed a fix to ensure iView HLS filenames are sanitised properly.

johnb says:

Hi delx,

Since the changeover from ffmpeg to avconv I regularly seem to get these conversion errors from ABC, any suggestions?

Press [q] to stop, [?] for help
[NULL @ 0x9d1dd80] non-existing SPS 0 referenced in buffering period
[NULL @ 0x9d1dd80] non-existing SPS 32 referenced in buffering period
[mp4 @ 0x9d1f060] malformated aac bitstream, use -absf aac_adtstoasc
av_interleaved_write_frame(): Operation not permitted
avconv exited with error code: 1
Press return to continue…

James Bunton says:

@johnb, try using a newer version. I’m using v10.2 successfully.

boge says:

Hi, are these scripts still working with iView? My first attempt failed handling a .ts file. Upgrading avconv to latest build didn’t help.

Downloading: Shaun Micallefs MAD AS HELL Series 4 Ep 1.ts
Converting Shaun Micallefs MAD AS HELL Series 4 Ep 1.ts to mp4
avconv version 10.1, Copyright (c) 2000-2014 the Libav developers
built on May 10 2014 18:32:12 with gcc 4.8.2 (Gentoo 4.8.2 p1.3r1, pie-0.5.8r1)
Shaun Micallefs MAD AS HELL Series 4 Ep 1.ts: Invalid data found when processing input
avconv exited with error code: 1

Graham says:

Channel 10 busted?

Choose> 4
………Traceback (most recent call last):
  File "grabber.py", line 55, in <module>
    main()
  File "grabber.py", line 35, in main
    for n in node.get_children():
  File "/webdl/common.py", line 41, in get_children
    self.fill_children()
  File "/webdl/brightcove.py", line 71, in fill_children
    items = page["items"]
KeyError: 'items'

Graham says:

Ignore previous post – may have been an issue at their server – all good today…

Keir Vaughan-Taylor says:

Just tried webdl – nice

I have noticed there are some programs on SBS that don’t show up.
For example “The man that saved the world”
http://www.sbs.com.au/ondemand/video/11862595532/the-man-who-saved-the-world
the ondemand video doesn’t show up in the grabber.py menu

It would be nice if the Python code included the ability to give the URL as an argument.
I have some Python skills, although it will take some time to understand how this code works. Perhaps someone could advise where in the code this would be appropriate, if at all.

Jez says:

@Keir: It’s there. Go to “SBS” -> “Programs” -> “SBS ONE” and you’ll find it with the other shows starting with “M”. Right now on mine it’s number 107.

It’s also under “Documentary”, but it’s not obvious how to get to it. You first go to “SBS” -> “Programs” -> “Documentary” and then you need to select “-Latest” to get the full list of shows.

This is all very confusing, of course, but webdl is constrained by how the website is structured.

Mike says:

I’m having a hard time getting this to work. I am able to get a .flv file from SBS but it failed in the conversion to .mp4 due to not finding avconv. How do I tell it where the libav files are? Does it matter if I use the 64-bit versions? The .flv file is not playable with VLC player so I don’t know if the file I got was valid (it was large enough), or would the missed conversion have added the appropriate headers to this file for proper playback? Thanks for any help.

??confused?? says:

need a tutorial or guide to use program

Ken says:

Looks like it simply isn’t working for ABC iView anymore.

I am getting this error for everything:

Invalid data found when processing input

Using latest ffmpeg. Tried earlier versions that used to work with no success.

Ken says:

OK – found the issue. Not what I was expecting. The resulting .ts files from ABC iView have 0d (\r) or return characters inserted periodically.

This is occurring because the files are being opened with “w” rather than explicitly as binary (“wb”). I suspect on certain platforms this may not cause a problem, but on Windows it results in \r’s being inserted.

So if you update it to be “wb” in the various spots in common.py, it should work.

Certainly works for me!

If I can get my head around bitbucket, will post a patch.
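As a rough sketch of the idea (not the actual common.py code; write_chunks and demo are hypothetical names), writing downloaded data in binary mode looks like this. In text mode on Windows, Python 2 translates each "\n" byte into "\r\n", which corrupts a .ts stream; Python 3 refuses to write bytes to a text-mode file at all.

```python
import os
import tempfile


def write_chunks(path, chunks):
    """Write an iterable of byte strings to path without newline translation."""
    # "wb" opens the file in binary mode, so bytes are stored exactly as given
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)


def demo():
    """Round-trip some bytes containing 0x0a and 0x0d through a temp file."""
    data = [b"\x47\x0a\x00", b"\x0d\x0a\xff"]
    fd, path = tempfile.mkstemp(suffix=".ts")
    os.close(fd)
    try:
        write_chunks(path, data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

The bytes read back by demo() should be identical to the bytes written, on any platform.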

Kelvin Proctor says:

Hi James,

Fantastic set of scripts, greatly appreciated.

I wasn’t sure if here or the bitbucket issues system was the right place to report an issue.

I’ve raised an issue I’m having with ABC4Kids in bitbucket. Details are here: https://bitbucket.org/delx/webdl/issue/15/issues-with-abc4kids

Regards,
Kelvin

Philip says:

I’ve been getting the following error when trying to download SBS programmes for a little over a week now.

Traceback (most recent call last):
  File "grabber.py", line 55, in <module>
    main()
  File "grabber.py", line 48, in main
    if not n.download():
  File "/home/pcls/Software/SBS downloader/webdl/sbs.py", line 54, in download
    return download_urllib(filename, video_url, referrer=SWF_URL)
  File "/home/pcls/Software/SBS downloader/webdl/common.py", line 257, in download_urllib
    src = _urlopen(url, referrer)
  File "/home/pcls/Software/SBS downloader/webdl/common.py", line 81, in _urlopen
    return urlopener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

In this particular case I was attempting to download Queen Victorias Children S1 Ep2 – A Domestic Tyrant.f4m

Are you able to fix this please? Many thanks.

george says:

hi, i appreciate all the effort but i am really unsure how to properly do all this, i was going to try and download this program before it expired. http://www.sbs.com.au/ondemand/video/253252163821/Two-Laws
but am really confused with all the scripts and stuff!
is there somewhere you could point me for a beginner?

Duncan says:

What versions of avconv or ffmpeg work on OSX with Channel 9? After I added livestreamer, Channel Nine started working, but recently it started acting up. It downloads a TS, but the MP4 runs for a few seconds and stops.

[NULL @ 0x7f9008a99000] Multiple RDBs per frame with CRC is not implemented. Update your Libav version to the newest one from Git. If the problem still occurs, it means that your file has a feature which has not been implemented.
aac_adtstoasc failed for stream 0, codec copy: Not yet implemented in Libav, patches welcome
av_interleaved_write_frame(): Cannot allocate memory
ERROR avconv exited with error code: 1
ERROR Failed to download! Gotham – 1×12 – What the Little Bird Told Him

$ avconv -v
avconv version 11, Copyright (c) 2000-2014 the Libav developers
built on Nov 16 2014 09:39:35 with Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)

Duncan says:

Occasionally a TS file is left behind, and it stops at the same point. Is there something else I need to change? I have downloaded and upgraded ffmpeg with no improvement, and trying avconv made no difference. I tried running avconv and ffmpeg with a lot of different options, and I also downloaded and compiled fresh copies of ffmpeg, avconv and libx264, all with no improvement.

Has 9Jumpin changed the way it presents the file for streaming?

James Bunton says:

@Duncan, it looks like new DRM :|

Did you create https://bitbucket.org/delx/webdl/issue/19 ?

Duncan says:

Yes that was me, are there any other downloaders for 9 ?
This is the only one I found, and I only had it working for a few days.

Siko says:

Amazing work – thanks very much.

tested on a mac os x 10.10.3
All the latest updates from Apple plus all other important libraries installed:

Livestreamer
python 2.7 or 3.2+
pycrypto — Livestreamer needs this for some videos
python-lxml
ffmpeg / libav-tools

ABC Iview menu works, Nine and Ten also works. SBS Won’t even give menu. See the output

SBS doesn’t work on the latest update, see the output message:
Choose> 0
1) ABC iView
2) Nine
3) SBS
4) Ten
0) Back
Choose> 3
Traceback (most recent call last):
  File "./grabber.py", line 61, in <module>
    main()
  File "./grabber.py", line 41, in main
    for n in node.get_children():
  File "/Users/apple/Downloads/SBS/webdl/common.py", line 50, in get_children
    self.fill_children()
  File "/Users/apple/Downloads/SBS/webdl/sbs.py", line 81, in fill_children
    menu = grab_json(VIDEO_MENU, 3600, skip_assignment=True)
  File "/Users/apple/Downloads/SBS/webdl/common.py", line 153, in grab_json
    doc = json.loads(text)
  File "/usr/local/Cellar/python3/3.4.3_2/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python3/3.4.3_2/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/decoder.py", line 346, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 31 - line 278 column 1 (char 30 - 32315)

Any fix? I’m no expert – please help ;-)

Ryan says:

Hi,

Any chance you or anyone else reading this could provide more detailed guidance on setting up the crontab.

I’ve attempted to do it myself, however I’m not sure that it’s working. I’ve created the crontab and also the patterns.txt but I’m getting the following results when running autograbber.py.

username$ cd '/Users/username/WebDL/' && '/opt/local/bin/python2.7' '/Users/username/WebDL/autograbber.py' && echo Exit status: $? && exit 1
Usage: /Users/username/WebDL/autograbber.py destdir patternfile

Is there a particular place that the patterns.txt and crontab should be stored?

Any help would be greatly appreciated

I should probably mention i am using OSX

James Bunton says:

@Ryan, it’s your choice where you want to store these files. From the information you’ve provided it looks like you have not managed to get this to run outside of cron. I’d suggest trying to run autograbber.py just from a regular shell before you try scheduling it to run automatically with cron.

Ryan says:

@James, thanks for getting back to me. It doesn’t seem to work just running the shell either.

When i run the shell script i get the following:

MacBook-Air:WebDL username$ . patterns.sh
-bash: 0: command not found

I’ve even tried to run the autograbber.py by itself and get:

cd '/Users/username/WebDL/' && '/opt/local/bin/python2.7' '/Users/username/WebDL/autograbber.py' && echo Exit status: $? && exit 1
Usage: /Users/username/WebDL/autograbber.py destdir patternfile

Here is my setup, maybe you can see something i’ve stuffed up:

Shell script: http://pastebin.com/aebcJ5rG

patterns.txt: http://pastebin.com/dwHKLnmG

I can zip my webDL folder if you like or are you able to send me your working files so i can amend them for my system

Thanks in advance

Ryan says:

@James, Ok so i’ve been doing bit more playing around and by inputting the following into the CLI:

./autograbber.py /Users/username/WebDL /Users/username/WebDL/patterns.txt

What i get back from mail is:

Traceback (most recent call last):
  File "/Users/username/WebDL/autograbber.py", line 3, in <module>
    from common import load_root_node
  File "/Users/username/WebDL/common.py", line 7, in <module>
    import lxml.etree
ImportError: No module named lxml.etree

James Bunton says:

@Ryan, lxml is a required dependency. You’ll need to install it as well as all the other dependencies listed in the README in order to make webdl work.

Ryan says:

@James Hmm I do have it and all the other dependencies installed, may have to do some playing around and re-install them all. Thanks again I’ll keep you posted

Ryan says:

@James I’ve setup the WEBDL on a windows system to give it a go. I’ve been away from a windows system for a very long time but think i have everything sorted for manual downloading.

Abc and Nine both download fine.

Ten will list shows but as soon as it starts to download it crashes

SBS will download shows but will crash when listing SBS1

Regardless of all that, any suggestions on how to setup the auto grabber on windows?

James Bunton says:

@Ryan, I’ve only ever tried to run it on Linux. If you raise an issue with your Python version and the output from webdl I may be able to help.

https://bitbucket.org/delx/webdl/

AndyW says:

I have just updated to delx-webdl-0d865ca1ccc7 (many thanks James I now have SBS downloady goodness again!). I had to make a few extra steps to get it working on my platform – just putting notes here in case it helps anyone out on Mac OS X 10.6.8.

Installed: ffmpeg, python27, py27-lxml, py27-pip via macports
then: livestreamer, pycrypto via pip-2.7

I had to hard wire #!/opt/local/bin/python2.7 in grabber.py autograbber.py and sbs.py (#!/usr/bin/env python was just using Apples v2.6 as I don’t have call to keep other versions in my env paths).

I also had to hard wire in the path for livestreamer in common.py:
"livestreamer" -> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/livestreamer"

Ryan says:

@James,

So I managed to get autograbber working like a treat on windows, however just have a few questions.

1. Even though an episode has been added to the .downloaded file particular ones still get downloaded over and over. Have you had the same issue and any suggestions on a fix or a way to comment out those particular episodes?

2. I’m trying to figure out where to put in an exception for ffmpeg to always overwrite existing files. This is mainly for the issue above with autograbber so it doesn’t get stuck waiting for user input.

3. Any progress with Plus7 and those 2 links I added to the issue on the BitBucket?

Thanks in advance

c0utta says:

Hi James,

This is a great plugin and I’ve been using it seamlessly on Debian for a couple of years.

In the last week I’ve been experiencing problems with Channel 10 with the following result:

1) ABC iView
2) Nine
3) SBS
4) Ten
0) Back
Choose> 4
Traceback (most recent call last):
  File "grabber.py", line 61, in <module>
    main()
  File "grabber.py", line 41, in main
    for n in node.get_children():
  File "/root/webdl/common.py", line 50, in get_children
    self.fill_children()
  File "/root/webdl/brightcove.py", line 88, in fill_children
    items = page["items"]
KeyError: 'items'

I’ve tried over the last week in case it’s a transient problem (I noted above that someone else had the same error but it went away after a while) but this has been a week now.

Any ideas?

James Bunton says:

@c0utta, thanks for the bug report. I have raised https://bitbucket.org/delx/webdl/issues/28/channel-10-brightcove-token-revoked.

I’ll try to resolve it when I have some free time.

James Bunton says:

@ryan

I’m glad you got it to work :)

1. I find this happens frequently with ABC iView. They seem to regularly reupload the same episodes but with slightly different titles. If this is not your problem then please raise an issue with more specific details so I can track it down.

2. If you search common.py for ffmpeg you should be able to add the ‘-y’ option easily enough.

3. No, when I have some time to work on it I’ll update the issue :)
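For point 2 above, a minimal sketch of what adding the option might look like. The helper below is hypothetical and is not how common.py actually builds its command; it only shows where ‘-y’ (overwrite the output file without prompting) sits in an ffmpeg argument list.

```python
def build_convert_command(src, dst, overwrite=True):
    """Build an ffmpeg argument list that copies the streams of src into dst."""
    cmd = ["ffmpeg"]
    if overwrite:
        # -y makes ffmpeg overwrite an existing output file instead of asking
        cmd.append("-y")
    # -codec copy remuxes the streams without re-encoding them
    cmd += ["-i", src, "-codec", "copy", dst]
    return cmd
```

A list built this way could be handed to subprocess.call(), so an unattended autograbber run would not stall waiting for a yes/no answer.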

c0utta says:

Hi James,

I notice over on BitBucket that you’ve marked the bug as resolved, but it appears that the downloads now reference rendition.m3u8, within which are more details possibly to do with the resolution of the video.

Your script can no longer reference the .ts file, which is a shame.

Any further ideas?

Dan says:

This may be a really stupid question, but I have just got webdl running in the last day or so and can’t figure out how to display the programs I am after further back up the listing. What I mean is, I go to SBS > Genre > Documentary, which then lists 809 programs. I scroll back up the terminal window but can only get to ‘301) How to be a Billionaire’. I want to download some programs in the A’s and the C’s but don’t know how to either tell webdl to only show the first 200, or get it to display one page at a time like the old Windows dir command that used “/p”. I have looked through the documentation but cannot find anything about this in particular. BTW, awesome programme!

James Bunton says:

@Dan. Most Linux and OSX terminal software allows you to configure a scrollback buffer size. It sounds like you’re using Windows, I suggest you have a look in the settings for an option like that.

Ryan says:

Hi James,

Had any spare time or luck with fixing Plus7?

SP says:

Thanks. This worked for me for ABC Iview.

Robert Watson says:

Thanks for the great software. I used it months ago, and haven’t used it since. I just used your software again and found a problem.

My observation. Your software, for ABC iView specifically, only shows one episode of any one program. Possibly only the oldest episode?

For example using ./grabber.py and following this chain of input, “iView->By Channel->ABC2->The Aliens” gives the output of “The Aliens Series 1 Ep 1” and no other. There are 6 episodes available using http://iview.abc.net.au/programs/aliens/ZW0695A002S00.

Looking further afield, I found this to be true for all other program titles.

Hope it is not something stupid that I am doing.

Thanks for the great software.

Michael says:

I am having issues installing webdl on Linux Mint resulting from my inexperience and trouble reading/understanding the doc.

When installing using pip I get the following responses:

~/webdl $ source .virtualenv/bin/activate
~/webdl $ pip install -r requirements.txt
Downloading/unpacking livestreamer (from -r requirements.txt (line 1))
Downloading livestreamer-1.12.2.tar.gz (430kB): 430kB downloaded
Running setup.py (path:/home/michael/webdl/.virtualenv/build/livestreamer/setup.py) egg_info for package livestreamer

warning: no files found matching 'AUTHORS'
Downloading/unpacking pycrypto (from -r requirements.txt (line 2))
Downloading pycrypto-2.6.1.tar.gz (446kB): 446kB downloaded
Running setup.py (path:/home/michael/webdl/.virtualenv/build/pycrypto/setup.py) egg_info for package pycrypto

Downloading/unpacking lxml (from -r requirements.txt (line 3))
Downloading lxml-3.8.0.tar.gz (3.8MB): 3.8MB downloaded
Running setup.py (path:/home/michael/webdl/.virtualenv/build/lxml/setup.py) egg_info for package lxml
Building lxml version 3.8.0.
Building without Cython.
ERROR: b'/bin/sh: 1: xslt-config: not found\n'
** make sure the development packages of libxml2 and libxslt are installed **

After the error with building lxml, the program will not continue and reports the absence of this module when running grabber.py.

Any assistance would be appreciated.

Mike

anonymous coward says:

This is awesome. Thank you for making webdl.

kim says:

Do you think you can add 7plus?

Comments are closed.