TD Waterhouse (Canada) has an excellent research arm, TD Newcrest. Every so often, the dense formal prose is dropped for a more informal commentary; it’s an interesting read for investment ideas.
However, I don’t like having to log in to their website to read the reports, and, even worse, having my session time out because I took a while to read one (the maximum configurable timeout is 15 minutes). I also generally like my data to be in the same “place” as the rest of my data. So I figured out a way to download the TD Newcrest report PDFs automatically.
Since then, I’ve had to make one minor change after the original script broke: someone made a small addition to the page that matched the regular expressions I had originally been using. I don’t intend to invest actively anymore (I want to divert my time elsewhere), but I thought someone might be interested in seeing what the mess looked like. So without further ado (right after I scrub my personal info from the script…):
#!/bin/zsh

AUTH_TOKEN="`curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/LogOn?language=E&FromCookieConversion=true' | grep AuthToken | sed -s 's/.*value="\(.*\)".*/\1/'`"

CONNECT_ID=123
PASSWORD=rosebud

START_PAGE="`curl -s -b cookies.txt -c cookies.txt https://webbroker32y.tdwaterhouse.ca/LogOnValidate -d "connectID=$CONNECT_ID&connectIDSelect=&idToDelete=&oldPassword=$PASSWORD&newPassword=&changePasswordMode=false&submitButton=Login&remember=of&publicTerminal=false&language=E&AuthToken=${AUTH_TOKEN}&javaScriptEnabled=true" | grep top.location.href | sed 's/.*href=.\(.*\).;.*/\1/'`"

# Need to go through these pages to get the proper cookies.
curl -s -b cookies.txt -c cookies.txt ${START_PAGE} > /dev/null
curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/MarketsAndResearch' > /dev/null
#curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/SecureSiteTopNav.jsp' > /dev/null
#curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/MarketsAndResearchLeftNav.jsp' > /dev/null

RESEARCH_PAGE="`curl -s -b cookies.txt -c cookies.txt 'https://webbroker32y.tdwaterhouse.ca/MarketsAndResearchContent.jsp'`"
MAGIC_NO="`echo ${RESEARCH_PAGE}|grep name=.magicno|sed -s 's/.*value=.\(.*\).\/.*/\1/'`"
BLOB1="`echo ${RESEARCH_PAGE}|grep name=.Blob1|sed -s 's/.*value=.\(.*\).\/.*/\1/'`"
BLOB2="`echo ${RESEARCH_PAGE}|grep name=.Blob2|sed -s 's/.*value=.\(.*\).\/.*/\1/'`"
BLOB3="`echo ${RESEARCH_PAGE}|grep name=.Blob3|sed -s 's/.*value=.\(.*\).\/.*/\1/'`"

# This is kinda nasty. We have to go here to get authenticated, and then
# we get redirected to somewhere else where we get the session cookie we
# need. Also, the HTML body of the redirect page isn't correct.
curl -L -s -b cookies.txt -c cookies.txt 'https://www.tdcanada.wallst.com/tdw/canada/Decrypt.asp?hideTop=1' -d "magicno=${MAGIC_NO}&Blob1=${BLOB1}&Blob2=${BLOB2}&Blob3=${BLOB3}" > /dev/null

REPORT_MAP="`curl -s -b cookies.txt -c cookies.txt "https://www.tdcanada.wallst.com/tdw/canada/markets/reports.asp" | egrep '[0-9]{2}/[0-9]{2}/[0-9]{2}' | egrep 'value="[A-Z0-9]+"' | cut -d '"' -f 2-3 | sed 's/\(.*\)">\(.*\) $DATE.pdf
    EMAIL=foo@foo.com
    # mutt needs "--" between the attachment list and the recipient.
    echo | mutt -s "TD Newcrest research - $DATE" -a $DATE.pdf -- $EMAIL
  fi
done

rm -f cookies.txt
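To make the `REPORT_MAP` pipeline a little easier to follow, here is a minimal sketch of its filtering stages (the date filter, the `value="…"` filter and `cut -d '"' -f 2-3`) run against a made-up `<option>` list, since the real `reports.asp` markup isn't reproduced here. The `split_report` function and the `ID|title` output format are hypothetical illustrations that use shell parameter expansion; they are not the script's own `sed` step. `grep -E` is the POSIX spelling of `egrep`.

```shell
#!/bin/sh
# Hypothetical sample of the report list markup (the real page may differ).
sample='<option value="">-- Select a report --</option>
<option value="AB12CD">Morning Comment 03/15/10</option>
<p>Updated 03/15/10</p>'

# Hypothetical helper: after cut, each line looks like
#   AB12CD">Morning Comment 03/15/10</option>
# Split it into the report ID and the title.
split_report() {
  while IFS= read -r line; do
    id=${line%%\"*}           # everything before the first double quote
    title=${line#*\">}        # everything after the first quote-plus-bracket
    title=${title%</option>}  # drop the closing tag
    printf '%s|%s\n' "$id" "$title"
  done
}

# Same filters as the script: keep dated lines, then lines with an
# uppercase/digit value attribute, then keep fields 2-3 split on quotes.
reports=$(printf '%s\n' "$sample" \
  | grep -E '[0-9]{2}/[0-9]{2}/[0-9]{2}' \
  | grep -E 'value="[A-Z0-9]+"' \
  | cut -d '"' -f 2-3 \
  | split_report)

printf '%s\n' "$reports"
```

Of the three sample lines, only the second survives both filters: the first has no date, and the third has a date but no `value=` attribute. This is why an unrelated dated element that also carries a matching `value` could slip into the list.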
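The regular-expression breakage mentioned at the start is easy to reproduce in miniature. This sketch runs the script's `magicno` extraction (`grep name=.magicno` followed by its `sed`) against a made-up page fragment. The second line of the fragment is a hypothetical stand-in for the kind of small addition that can trip a loose pattern, since the actual change isn't described. The tighter `sed -n '…/p'` pattern, like the `page`, `loose` and `tight` names, is an illustrative assumption, not necessarily the fix that was applied.

```shell
#!/bin/sh
# Hypothetical page fragment: the first line is the hidden field the script
# wants; the second stands in for a later "small addition" to the page.
page='<input type="hidden" name="magicno" value="12345"/>
<a href="help.html" name="magicno-help">What is this?</a>'

# The script's pattern: "name=" then any character, then "magicno".
# It matches BOTH lines, and sed passes the second one through unchanged
# because that line has no value= to substitute.
loose=$(printf '%s\n' "$page" | grep 'name=.magicno' | sed 's/.*value=.\(.*\).\/.*/\1/')

# Tighter alternative (assumption): require the full hidden-input attribute
# sequence, and with -n/p print only lines where the substitution succeeded.
tight=$(printf '%s\n' "$page" \
  | sed -n 's/.*<input type="hidden" name="magicno" value="\([^"]*\)".*/\1/p')

printf 'loose:\n%s\n' "$loose"
printf 'tight:\n%s\n' "$tight"
```

With the loose pattern, `MAGIC_NO` would end up holding two lines (the value plus a stray HTML tag), which would then be posted to `Decrypt.asp` as garbage. The anchored pattern still yields just `12345`, although it would break in its turn if the attribute order on the page ever changed.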