Tag Archives: twitter

python-twitter 1.0 for API 1.1

With Twitter turning off the older API 1.0 today in favour of API 1.1, developers of bots and applications that used the older API need to either upgrade their apps, or they'll die a sad and lonely death.

I have a couple of bots written using the python-twitter module which broke – thankfully it's just an easy case of updating the module to version 1.0 (an unfortunate version number, they really should have made it 1.1 to match Twitter). ;-)

If you’re using RHEL/CentOS/etc, EPEL includes a python-twitter package, but it’s way out of date. Instead, I have RPMs of version 1.0 available for EL5/6 in my repositories. You will want to enable both EPEL and the “amberdms-os” repository before you can install the RPM – EPEL includes a number of Python dependencies I don’t ship myself.

Cuckoo Clock NZ

Having arrived in Sydney, I’m staying with some of Lisa’s relatives who have kindly provided us with a room for a while until we get our own place sorted out.

One of the things they have in their house is a proper mechanical cuckoo clock, which I find highly amusing every time it pops open and emits chirps. I decided it would be fun to write a Twitter cuckoo clock.

It's pretty simple code-wise: it just needs to generate a tweet every hour with a cuckoo for each hour on a 12-hour clock, plus a bit of general sanity checking, such as checking when the last tweet was posted so that if crond goes nuts it won't spam the feed.

Behold, the amazingness of the Twitter cuckoo clock.

I decided to make it slightly more interesting, so every time it tweets, there is a 1-in-10 chance of it posting some other message from a list of defined messages, as per the above example.
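
For the curious, the gist of the logic is something like the following; this is just a sketch rather than the actual bot code (which lives in my repos), and the message list and function name are purely illustrative:

    import random
    import time

    # illustrative stand-ins for the bot's list of occasional extra messages
    special_messages = [
        "This clock could really use a wind-up.",
        "Has anyone seen my pendulum?",
    ]

    def cuckoo_tweet():
        # work out the hour on a 12-hour clock (12 rather than 0 at midday/midnight)
        hour = time.localtime().tm_hour % 12
        if hour == 0:
            hour = 12

        tweet = " ".join(["cuckoo"] * hour)

        # 1-in-10 chance of including one of the defined extra messages
        if random.randint(1, 10) == 1:
            tweet = tweet + " " + random.choice(special_messages)

        return tweet

    # the hourly cron job would then post this with api.PostUpdate(cuckoo_tweet()),
    # after first checking when the last tweet went out so a misbehaving crond
    # can't spam the feed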

You can follow it at @cuckooclocknz and check out the small bit of Python that powers it on my repos. I was tempted to make one for AU as well, but I was lazy and just did NZ, since my servers run in the NZ timezone and there's only one timezone for the whole country, unlike AU…

I'm slowly getting more used to Python coding. I'm not a huge fan yet; there are some nice things about it, like the enforced indentation structure, but also some odd things that throw me after years of PHP and Perl, such as the for loops and the stricter type handling, which take some getting used to.

Twitter Auto Delete

Despite making a clean break from Twitter earlier this year, I've ended up back on it on a casual basis, mostly due to the number of my friends on there who only chat via it or are only reachable through it. :-(

I decided that this time I’d like to treat Twitter more like an IRC chat room, ie a place to chat casually with friends, but not as a formal permanent record – so I made some tweaks to how I was using it:

  1. Primary interaction with Twitter is via PrplTwtr, a plugin for Pidgin, which makes Twitter act like any other chat room and avoids the invasive distraction of having Twitter open in my browser. If friends @reply me or DM me, I get a new IM message notification, but otherwise I can happily ignore it.
  2. I wrote a small script that automatically goes and deletes all my Twitter messages after 24 hours – this is enough time for me to chat comfortably with friends, but it makes it hard for outsiders to data mine my feed, and it means there's much less of a permanent, cacheable record of (or long-term links to) my tweets.

It's not a perfect setup: whilst it prevents someone from casually going back and seeing my history and engagements with others, it doesn't stop someone recording my tweets over an extended period to build up their own data pool about me, and of course I have no way of knowing whether a tweet I delete really disappears from the pool of information that Twitter sells to data miners.

But it’s good enough that I can chat with friends and keep up-to-date with their lives without leaving a huge digital footprint for any randoms to trawl through.

There are some auto-deleter services around, but I didn’t trust any of them to not do malicious things with my account (eg spamming their presence), plus I wanted it to delete all my tweets *except* my blog post feed.

I found that there's a pretty decent Twitter module for Python and decided to use this as an exercise to finally learn some proper Python, something I'd been avoiding for lack of a good learning project.

The result is a simple Twitter auto-deleter script that is called by cron every 4 hours and deletes any tweets older than 24 hours – the basics are pretty simple really:

    # query my user status list
    mytimeline = api.GetUserTimeline(screen_name=user_name, count=query_quantity, include_rts=True)

    for status in mytimeline:

        # blog post announcements are kept, everything else is fair game
        if re.match("^New Blog Post", status.text):
            #print "Blog post! No delete wanted"
            continue

        # anything older than the cut-off time gets deleted
        if status.created_at_in_seconds < cond_time_before:
            api.DestroyStatus(status.id)

            print "Deleting Tweet:"
            print "- Created At: " + status.created_at
            print "- Content: " + status.text

Note that with GetUserTimeline, you need to specify include_rts=True as an explicit option, so that it includes anything you’ve retweeted in the timeline returned.
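
For reference, the snippet relies on a few values being defined earlier in the script; roughly, the setup looks something like this (the placeholder credentials and exact values are purely illustrative, you'll need the keys from your own Twitter dev application):

    import re
    import time
    import twitter   # the python-twitter module

    # authenticate with the API key and access tokens from your own dev application
    api = twitter.Api(consumer_key='xxxxxxxx',
                      consumer_secret='xxxxxxxx',
                      access_token_key='xxxxxxxx',
                      access_token_secret='xxxxxxxx')

    user_name        = 'example_user'                   # the account to clean up
    query_quantity   = 100                              # max tweets to fetch per run
    cond_time_before = time.time() - (24 * 60 * 60)     # anything older than 24 hours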

Favorites are special wee critters and require a separate GetFavorites call. I don't use favorites, so I wanted the deleter to also remove any favorites created by accidental mis-clicks.
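
Something along these lines does the job; again just a sketch, and the exact GetFavorites/DestroyFavorite signatures vary a bit between python-twitter releases, so check the docs for the version you have installed:

    # un-favorite everything - favorites don't appear in GetUserTimeline, so they
    # need their own query and their own destroy call
    myfavorites = api.GetFavorites()

    for favorite in myfavorites:
        print "Removing Favorite:"
        print "- Content: " + favorite.text

        # older python-twitter versions take a Status object here, newer ones a status_id
        api.DestroyFavorite(favorite)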

You can check out my source here – if you want to run it on your own server, you'll need to use your account to set up a dev API key and access tokens etc. And you may want to adjust things like the deletion of favorites or the retention of blog posts.

I've pondered turning this into a simple web-hosted service for people to use, so if you're the sort of person who can't use this script yourself but would like the ability to auto-delete your tweets, let me know and I'll look at doing it if there's interest.

I’m sure Twitter will probably kill off more and more of these API calls in future, but at the moment they’re exposing just enough logic to enable me to do this. :-)

Do note that if you run this on a big account, you will hit the maximum API call limit VERY quickly, hence the configurable query quantity limit to restrict how many tweets are loaded per execution – you could get away with several hundred every 60 minutes if you wanted to delete all your Twitter history as fast as possible without actually blowing away the account.

Mozilla Collusion

This week Mozilla released an add-on called Collusion, an experimental extension which shows and graphs how you are being tracked online.

It's pretty common knowledge how much you get tracked online these days; if you just watch your status bar when loading many popular sites you'll always see a few brief hits to services such as Google Analytics, but there's also a lot of tracking done by social networking services and advertisers.

The results are pretty amazing. I took these after turning it on for myself for about one day of browsing, and every time I check, the graph is even bigger and more amazing.

The web is actually starting to look like a web…

As expected, Google is one of the largest trackers around, thanks to the popularity of their Google Analytics service, not to mention all the advertising technology they've acquired and built over the years, including their acquisition of DoubleClick.

I, for one, welcome our new Google overlords and would like to remind them that as a trusted internet celebrity I can be useful for rounding up other sites to work in their code mines.

But even more interesting are the results for social networks. I ran this test whilst logged out of my Twitter account and logged out of LinkedIn, and I don't even have Facebook:

Mark Zuckerberg knows what porn you look at.

Combine 69+ tweets a day with this information and I think Twitter would have a massive trove of data about me on their servers.

LinkedIn isn't quite as linked as Facebook or Twitter, but probably has a similar ratio if you consider the differences in user base size.

When you look at this information, you can see why Google+ makes sense for the company to invest in. Google has all the data about your browsing history, but the social networks are one up – they have all your browsing information with the addition of all your daily posts, musings, etc.

With this data advertising can get very, very targeted and it makes sense for Google to want to get in on this to maintain the edge in their business.

It's yet another reason I'm happy to be off Twitter now: there's so much less information about me that can be used by advertisers. It's not that I'm necessarily against targeted advertising, I'd rather see ads for computer parts than for baby clothes, but I'm not much of a fan of my privacy being so exposed, with organisations like Google having a full list of everything I do and visit and being able to profile me so easily.

What will be interesting is testing how well the tracking holds up once IPv6 becomes popular. On one hand, IPv6 could expose users more if they're connecting with a MAC-derived address, but on the other hand, it could improve privacy if systems use IPv6 address randomisation when assigning their addresses.

Why I hate URL shorteners

I've used Awstats for years as my website statistics/reporting program of choice – it's trivial to set up, reliable, works with Apache log files and requires no modification to the website or use of remote tools (like with Google Analytics).

One of the handy features is the “Links from an external page” display, which is a great way of finding out where sudden bursts of hits are coming from, such as news posts mentioning your website or other bloggers linking back.

Sadly over the past couple of years it’s getting less useful thanks to the horrible wonder that is URL shortening.

URL shorteners have always been a controversial service – whilst they can be a useful way of making some of the internet's more horrible website URLs usable, they cause a number of long-term issues:

  • Centralisation – The internet works best when decentralised, but URL shortening makes a large number of links dependent on a few particular organisations who may or may not be around in the future. There have already been a number of link shortening companies which have closed down, killing large numbers of links, and there will undoubtedly be more in the future.
  • Link Hiding – Short URLs are a great way to send someone a link and have them open it without realising what content they're actually about to open. It could be as innocent as a prank on a friend, or as bad as directing them to malware or scam websites.
  • Performance – It takes an extra DNS query (or several) to look up the short URL servers before the actual destination can be looked up. This sounds like a minor issue, but it adds up on high latency connections (eg mobile) or when connecting to international content on NZ's wonderful internet, and can sometimes amount to a number of seconds.
  • Privacy – a third party can collate large amounts of information about an individual’s browsing history if they have a popular enough URL shortening service.

Of course URL shortening isn't entirely evil; there are a few valid use cases where it is acceptable, or at least forgivable:

  • Printed materials with URLs on them for manual entry. Nobody likes typing more than they need to, that’s understandable.
  • Quickly sending temporary links to people via IM or email where the full URL breaks due to the client application's inability to parse the URL correctly.

Anything other than the above is inexcusable: computers are great at hiding the complexities of large bits of information, and there's no need for your blog, social network or application to use short URLs where there is no human entry factor involved.

Twitter is particularly guilty of abusing short URLs – part of this was originally historic, but when Twitter had the opportunity to fix it, they chose instead to contribute further to the problem.

Back in the early days of Twitter, there was no native URL handling, so in order to fit links into the maximum tweet size of 140 characters, users would use a URL shortener such as the classic tinyurl.com, or more recent arrivals such as bit.ly, to keep URL lengths as small as possible.

Twitter later decided to implement their own URL shortening service called t.co and now enforce the re-writing of all URLs posted via Twitter to use t.co links, in a semi-transparent fashion where some/all of the original URL will be shown in the tweet, but the actual hyperlink will always go through t.co.

This change offers some advantage to users, in that they are no longer dependent on external providers closing down and breaking all their links, and it has some security advantages in that Twitter maintain lists of bad URLs (URLs they consider to serve malware or other unwanted content) to help stop the spread of dodgy content.

But it also gave Twitter the ability to track click data to figure out which links users are clicking on, and I imagine this information would be highly valuable to advertisers. (Google do a very similar thing with their results pages, where all clicks are first directed through a Google server to track which results users select, before the user is delivered to the requested page.)

The now mandatory use of URL shorteners on Twitter has led to a situation where it's no longer easy to track which tweets, or even which tweeters, are the source of your hits.

Even more confusingly, the handling of referrer URLs is inconsistent depending on the browser/client following the link. The vast majority will log the short URL as the referrer, but some are smart enough to report the page the link was on *before* the redirect took place.

RFC 2616 doesn't touch on how shortened URLs should be handled when referring, leaving the question of how referrers should be handled across 301 redirects up to the implementer, and there are valid arguments for using either the original page or the short URL as the referrer.

For example, for this tweet I have about 9 visits via http://twitter.com/jethrocarr/status/170112859685126145 and 29 visits via http://t.co/0RJteq3r, which throws out hit-count based ordering of the results:

Got to love Twitter & shortened URLs - most of these relate to tweets, but to which tweets? No easy way to track back.

A much better solution would have been for Twitter to display shortened versions of URLs in the tweet text to meet the 140 character limit, but have the actual link href feature the full URL – for example, a tweet could show "jethrocarr.com/i-like-….." as the link text to fit within 140 characters, while the actual href would be the full "jethrocarr.com/i-like-cake" URL.
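
In other words, only the visible text would ever need truncating; a trivial (and purely illustrative) sketch of generating such a link might look like:

    def render_link(url, limit=30):
        # shorten only what the reader sees; the href always keeps the full URL
        # (real code would also HTML-escape the URL)
        text = url if len(url) <= limit else url[:limit] + "..."
        return '<a href="%s">%s</a>' % (url, text)

    # e.g. render_link("jethrocarr.com/i-like-cake") keeps the full label, while a
    # much longer URL gets a trimmed label but an untouched href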

Whilst tweets are known for being 140 characters, there's actually far more information than that stored about each tweet: location co-ordinates, full URL information, date, time and more, so there is no excuse for Twitter not to retain that URL data – of course, that information has value for them for advertising and tracking purposes, so I wouldn't expect it to ever go away.

(As a side note, there's an excellent write-up on ReadWriteWeb about the structure of a tweet and its associated information.)

 

Overall, shortened URLs are just a pain to deal with and it would be far better if people avoided them as much as possible. Essentially, if you're using a short URL and it's not because a user will be manually typing out the content, then you're doing it wrong.

Also keep in mind that many sites have their own shortish URL variations. For example, this article can be accessed via both date/name and ID number:

https://www.jethrocarr.com/2012/02/26/why-i-hate-url-shorteners
https://www.jethrocarr.com/?p=1453

Many people also run their own private shorteners; this is quite common with popular sites such as news websites that want to retain control of the link process, and it's a much better approach if you have a valid reason to use lots of short URLs for your website.