Automatically downloading TD Newcrest research reports

TD Waterhouse (Canada) has an excellent research arm, TD Newcrest. Every so often, the dense formal prose gives way to more informal commentary, which makes for an interesting read and a source of investment ideas.

However, I don’t like having to log in to their website to read the reports and, even worse, having my session time out because I took a while to read one (the maximum configurable timeout is 15 minutes). I also generally like my data to be in the same “place” as the rest of my data. So I figured out a way to download the TD Newcrest report PDFs automatically.

Since then, I’ve had to make one minor change after the original script broke: someone made a small addition to the site that matched the regular expressions I had originally been using. I don’t intend to invest actively anymore (I want to spend my time elsewhere), but I thought someone might be interested in seeing what the mess looked like. So without further ado (right after I scrub my personal info from the script…):

#!/bin/zsh

AUTH_TOKEN="`curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/LogOn?language=E&FromCookieConversion=true' | grep AuthToken | sed -s 's/.*value="\(.*\)".*/\1/'`"

CONNECT_ID=123
PASSWORD=rosebud
START_PAGE="`curl -s -b cookies.txt -c cookies.txt https://webbroker32y.tdwaterhouse.ca/LogOnValidate -d "connectID=$CONNECT_ID&connectIDSelect=&idToDelete=&oldPassword=$PASSWORD&newPassword=&changePasswordMode=false&submitButton=Login&remember=of&publicTerminal=false&language=E&AuthToken=${AUTH_TOKEN}&javaScriptEnabled=true" | grep top.location.href | sed 's/.*href=.\(.*\).;.*/\1/'`"

# Need to go through these pages to get the proper cookies.
curl -s -b cookies.txt -c cookies.txt ${START_PAGE} > /dev/null
curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/MarketsAndResearch' > /dev/null
#curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/SecureSiteTopNav.jsp' > /dev/null
#curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/MarketsAndResearchLeftNav.jsp' > /dev/null

RESEARCH_PAGE="`curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/MarketsAndResearchContent.jsp'`"

MAGIC_NO="`echo ${RESEARCH_PAGE}|grep name=.magicno|sed -s 's/.*value=.\(.*\).\/.*/\1/'`"
BLOB1="`echo ${RESEARCH_PAGE}|grep name=.Blob1|sed -s 's/.*value=.\(.*\).\/.*/\1/'`"
BLOB2="`echo ${RESEARCH_PAGE}|grep name=.Blob2|sed -s 's/.*value=.\(.*\).\/.*/\1/'`"
BLOB3="`echo ${RESEARCH_PAGE}|grep name=.Blob3|sed -s 's/.*value=.\(.*\).\/.*/\1/'`"

# This is kinda nasty.  We have to go here to get authenticated, and then
# we get redirected to somewhere else where we get the session cookie we
# need.  Also, the HTML body of the redirect page isn't correct.
curl -L -s -b cookies.txt -c cookies.txt 'https://www.tdcanada.wallst.com/tdw/canada/Decrypt.asp?hideTop=1' -d "magicno=${MAGIC_NO}&Blob1=${BLOB1}&Blob2=${BLOB2}&Blob3=${BLOB3}" > /dev/null

REPORT_MAP="`curl -s -b cookies.txt -c cookies.txt "https://www.tdcanada.wallst.com/tdw/canada/markets/reports.asp" | egrep '[0-9]{2}/[0-9]{2}/[0-9]{2}' | egrep 'value="[A-Z0-9]+"' | cut -d '"' -f 2-3 | sed 's/\(.*\)">\(.*\) $DATE.pdf

    EMAIL=foo@foo.com
    # Newer mutt releases require "--" between the -a attachment list and the recipients.
    echo | mutt -s "TD Newcrest research - $DATE" -a $DATE.pdf -- $EMAIL
  fi
done

rm -f cookies.txt
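The breakage mentioned earlier came from loose patterns such as grep name=.Blob1 followed by a greedy sed. One way to make the hidden-field extraction less fragile is sketched below. It is not the author's fix: the function name form_value is hypothetical, and it assumes, as the original pipeline does, that each input tag sits on one line with its name attribute before its value attribute.

```shell
# Hypothetical helper: print the value attribute of the first <input> whose
# name attribute is exactly $1. Reads HTML on stdin.
# Assumptions: name="..." comes before value="..." within the same tag, the
# tag is on a single line, and $1 contains no regex metacharacters.
form_value() {
    # The quote required right after the name means "Blob1" will not also
    # match a field such as "Blob10"; [^>]* keeps the match inside one tag;
    # [^"']* stops the captured value at its closing quote instead of
    # running greedily to the end of the line.
    sed -n "s/.*name=[\"']$1[\"'][^>]*value=[\"']\([^\"']*\)[\"'].*/\1/p" | head -n 1
}
```

In the script, each field would then be read with something like MAGIC_NO=$(printf '%s\n' "$RESEARCH_PAGE" | form_value magicno), and likewise for Blob1 through Blob3.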

Getting pictures out of FlipAlbum

A couple of friends of mine got married recently, and their photographer sent them their proofs on a CD. He used a product I’d never heard of, FlipAlbum, which puts encrypted versions of the JPEGs on the CD along with a (gaudy) viewer for the pictures.

When a computer stops me from doing something I’d naturally do (e.g. sharing the pictures with friends and family), I get irritable and a little obsessed. After making an image of the CD, I picked through it, curious about how it worked. It’s pretty clear that the encryption is reversible: the CD includes a file of encryption keys for the viewer program to use. However, I wasn’t much interested in the direct approach of trying my hand at hacking the code, or anything similarly “glamorous”.

I still wanted the pictures, though, if only because they were being withheld. So I went with a low-tech solution.

First, I used Bulent Screen Recorder (BSR) to record a movie of my desktop. BSR has a useful feature that records a frame only when the monitored area changes. I killed my Windows shell, bb4win, to get an empty black area on one of my monitors, and configured BSR to watch that half of my desktop. I then used FlipAlbum’s automatic full-screen slideshow to display all the pictures, running it overnight at three seconds per image and recording an uncompressed movie. I had to use three seconds instead of one because BSR needed a couple of seconds to switch to a new file whenever the current AVI file got too big (1 GB).

Next, I used mplayer to extract each frame to a JPEG. I actually used a Win32 port of mplayer, because I was too lazy to upload the movies to my Linux server for processing. The Win32 version didn’t support lossless image formats (e.g. PNG), so I bumped the JPEG quality up to 100.
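The frame extraction step could look roughly like this. It is a sketch, not necessarily the command the author ran: it assumes mplayer's jpeg video output driver and that the recordings (BSR split them into several AVI files) are passed as arguments; the function name extract_frames and the frames/ layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: dump every frame of each recording to a JPEG.
# Usage: ./extract.sh part1.avi part2.avi ...
set -eu

extract_frames() {
    avi=$1
    # One output directory per recording, so the numbered file names
    # from different recordings do not overwrite each other.
    out="frames/$(basename "$avi" .avi)"
    mkdir -p "$out"
    # -vo jpeg writes each decoded frame as a JPEG; quality=100 keeps the
    # extra compression loss to a minimum since PNG output was unavailable.
    # -nosound skips audio, and -benchmark makes mplayer ignore frame
    # durations, so the video is decoded as fast as possible.
    mplayer -really-quiet -nosound -benchmark \
        -vo "jpeg:quality=100:outdir=$out" "$avi"
}

for avi in "$@"; do
    extract_frames "$avi"
done
```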

The next step was to shave off the watermarks BSR embeds into each frame. Fortunately, the pictures weren’t big enough to fill the whole screen, so the watermarks appeared in the “dead” black areas of the image. I looked a little into what GIMP and IrfanView, the desktop graphics programs I use most often, could do, but in the end I used ImageMagick. Again, I did it on Windows, using the version that comes with Cygwin.

One little note about ImageMagick (I was using convert and mogrify): I struggled with it a bit because -shave 50x0 -fuzz 5% -trim gave a weird result, leaving pictures that still had empty space on the right and bottom. Using -crop 0x0 had the same effect. I finally figured out that shaving pixels off the left and right sides changed the image data but didn’t change ImageMagick’s idea of how big the picture was. Transforming the JPEG in two steps, -shave 50x0 and then -fuzz 5% -trim, gave the desired result. For any purists wondering: the pictures are only proofs, so re-encoding them a few extra times doesn’t make much of a difference, especially since the intermediate steps were saved at 100% quality.
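The two-pass ImageMagick approach described above could be scripted roughly as follows. This is a sketch rather than the author's actual commands: crop_frame, the .shaved.jpg temporary name and the cropped/ output directory are hypothetical, and the 50-pixel shave and 5% fuzz are simply the values mentioned in the text.

```shell
#!/bin/sh
# Hypothetical sketch of the two-pass watermark removal.
# Usage: ./crop.sh frames/*/*.jpg
set -eu

crop_frame() {
    in=$1
    out=$2
    tmp="$out.shaved.jpg"
    mkdir -p "$(dirname "$out")"
    # Pass 1: -shave 50x0 removes 50 pixels from both the left and the right
    # edge (and nothing from top and bottom), where the watermarks sat.
    convert "$in" -shave 50x0 -quality 100 "$tmp"
    # Pass 2: a separate convert run on the saved result. -trim removes edges
    # matching the corner colour; -fuzz 5% also treats colours within 5% of
    # it as border.
    convert "$tmp" -fuzz 5% -trim -quality 100 "$out"
    rm -f "$tmp"
}

for f in "$@"; do
    # Prefix the recording's directory name, since frame numbering
    # restarts in each recording's directory.
    crop_frame "$f" "cropped/$(basename "$(dirname "$f")")-$(basename "$f")"
done
```

ImageMagick also has a +repage option that resets the stored page (virtual canvas) geometry, so -shave 50x0 +repage -fuzz 5% -trim in a single command may be an equivalent one-pass alternative; that variant is untested here.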

It was fun, and I got to know some useful tools better. From a moral standpoint, I don’t feel it’s wrong: the original pictures are too low in quality to blow up even to a 4″x6″ print, and the photographer hasn’t provided the digitally touched-up versions yet. The protection is an unnecessary measure that forces people to burn and distribute CDs. What did it accomplish, other than wasting people’s time by making them take the long way around?
