How I helped run an Instagram Contest with artoo.js
Over the last few weeks, my wife ran a gear giveaway on our CurrentlyWandering Instagram account. I saw a chance to help her out with a little code hacking, and dove in. The code examples here are a bit hacky, but they were only used for a short time, and polishing them wasn’t worth the time investment.
Backstory
Last year Jess was involved in an Instagram contest, and I wrote some code to tally qualifying users and randomly pick a winner. Between then and now, Instagram made significant changes to their API, and the former approach would not have worked because I could not get app approval on such a short schedule. The only way to get this done in the timeframe I had was some client-side scraping magic.
Enter Artoo.js
I’ve used Artoo.js for scraping projects before, and I was pleased, so it was an easy choice. Using Artoo.js client-side is as easy as using a bookmarklet. After adding the bookmarklet to your bookmarks bar, one click will load Artoo.js onto the page and make it available for use. Opening the Chrome Developer Tools (F12) will show you that it is loaded and give you a place to enter your custom commands.
Loading the Comments
By default, Instagram doesn’t show you all the comments when you load an Instagram post. They show you a selection of comments, and then load an additional group of comments every time you click the ‘load more comments’ link. What I needed to do was keep clicking that link until it disappeared, and then all the comments would be loaded into the page.
Luckily, Artoo.js has an autoExpand function built just for this purpose. First, the code I used to load comments:
//expand comments
artoo.autoExpand({
  expand: function($) {
    $('ul._mo9iw>li:eq(1)>button').simulate('click');
  },
  canExpand: 'ul._mo9iw>li:eq(1)>button',
  isExpanding: function($) {
    return $('ul._mo9iw>li:eq(1)>div._jf5s3').is(':visible');
  },
  throttle: 5000,
  done: function() {
    console.log('Done expanding every comment!');
  }
});
I should mention that the odd CSS class names you see in my code (_mo9iw, _jf5s3) are the result of Instagram using a CSS compiler. Those classes are very likely to change, but they didn’t change during the two weeks I needed this to work. Finding the right CSS classes is an easy task: right-click the thing you want classes for and choose ‘inspect’ from the context menu. The element path, complete with CSS classes, will be displayed in the developer tools.
I copied and pasted the above code into the console of the developer tools. When the done message was printed, I knew I was ready for scraping.
For more details about the configuration options I used, consult the Artoo.js documentation.
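To make the click-until-gone idea concrete, here is a minimal sketch of that loop written without the library. It is not how Artoo.js implements autoExpand; makeExpander and its step function are hypothetical names, the three callbacks only loosely mirror the expand, canExpand, and isExpanding options, and a simple counter stands in for the page so the snippet runs anywhere.

```javascript
// Hypothetical helper, not part of Artoo.js. It returns a step() function
// that performs one polling tick of a "click until the link is gone" loop.
function makeExpander(opts) {
  return function step() {
    if (opts.isExpanding()) {
      return 'waiting';   // a batch of comments is still loading
    }
    if (!opts.canExpand()) {
      return 'done';      // the "load more" link has disappeared
    }
    opts.expand();        // click the link to request the next batch
    return 'clicked';
  };
}

// Stand-in for the page: three batches of comments are still hidden.
var remainingBatches = 3;
var step = makeExpander({
  canExpand: function () { return remainingBatches > 0; },
  isExpanding: function () { return false; },
  expand: function () { remainingBatches -= 1; }
});

var log = [];
var state;
do {
  state = step();
  log.push(state);
} while (state !== 'done');
console.log(log.join(', ')); // clicked, clicked, clicked, done
```

In a browser you would call step from a timer (for example setInterval with a 5000 ms delay) and have the callbacks query the real DOM instead of a counter.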
Comment Scraping
My next task was to turn a pile of comments displayed in HTML into a data structure that I could use.
//scrape comments
all_comments = artoo.scrape({
  iterator: "ul._mo9iw>li:gt(0)",
  data: {
    user: {sel: 'a', attr: 'title'},
    comment: function($) {
      return $('span', this).clone().children().remove().end().text();
    },
    mentions: {sel: 'span>a', method:'text'}
  },
  params: {
    done: function(result){}
  }
});
The CSS selector used for the iterator looks similar to the selectors used for expanding, but I needed to skip the original post description that was the first element returned for the given selector. Using :gt(0) did the trick.
After this code completes, the all_comments array is full of objects that follow the form specified in the data key of the configuration. This is mostly vanilla and explained well in the Artoo.js documentation.
The one wild piece of code was in the comment attribute. In this case, the <span> element selected had some nested elements wrapped around any @mentions. I wanted to strip the mentions out, and resorted to jQuery-specific antics to make that happen. Without those specific needs, you can stick with the simpler selector syntax.
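For comparison, the same "text without child elements" result can be had with plain DOM properties. This is only a sketch, assuming the mention markup lives in child elements and the comment text sits in direct text nodes. ownText is a hypothetical helper name, and the object below is a hand-built stand-in for a real <span> so the snippet runs outside a browser.

```javascript
// Hypothetical helper: concatenate only the direct text nodes of an element,
// skipping child elements such as the links wrapped around @mentions.
function ownText(el) {
  var TEXT_NODE = 3; // value of Node.TEXT_NODE in the DOM
  return Array.from(el.childNodes)
    .filter(function (node) { return node.nodeType === TEXT_NODE; })
    .map(function (node) { return node.nodeValue; })
    .join('');
}

// Hand-built stand-in for <span>Love this! <a>@friend</a>you should enter</span>
var fakeSpan = {
  childNodes: [
    { nodeType: 3, nodeValue: 'Love this! ' },
    { nodeType: 1, nodeValue: null },          // the <a> element
    { nodeType: 3, nodeValue: 'you should enter' }
  ]
};

console.log(ownText(fakeSpan)); // Love this! you should enter
```

On a real page you would pass an actual element, since DOM elements expose the same childNodes, nodeType, and nodeValue properties.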
Filtering
I wanted to quickly remove the comments made by us as we responded to people in the comments. Those comments were not relevant for tallying up contest entries. Note that you can filter by anything here, like the presence of a word or hashtag.
//filter out currentlywandering comments
filtered_comments = all_comments.filter(function(c){
  return c.user != 'currentlywandering';
});
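As a sketch of filtering on content instead of username, the snippet below keeps only comments that contain a given word. The sample array is made up to mirror the shape produced by the scrape step, and both containsWord and the word "hammock" are hypothetical, not something the contest actually required.

```javascript
// Made-up sample in the same shape as all_comments (user, comment, mentions).
var sample_comments = [
  { user: 'alice', comment: 'I would take the hammock camping', mentions: '' },
  { user: 'bob', comment: 'Nice photo', mentions: '' },
  { user: 'carol', comment: 'That Hammock looks comfy', mentions: '@dave' }
];

// Hypothetical helper: build a filter callback that keeps comments
// containing the given word, ignoring case.
function containsWord(word) {
  var needle = word.toLowerCase();
  return function (c) {
    return c.comment.toLowerCase().indexOf(needle) !== -1;
  };
}

var matching_comments = sample_comments.filter(containsWord('hammock'));
console.log(matching_comments.map(function (c) { return c.user; })); // [ 'alice', 'carol' ]
```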
Contest Entry Processing
Next, I grouped the comments by user and checked two things: whether a comment was long enough, and whether it mentioned somebody else, which earned an extra entry. For convenience, I also stored each user’s comments.
//construct dict of username:entrycount
users = {};
filtered_comments.forEach(function(c){
  // newly found user
  if(users.hasOwnProperty(c.user) == false){
    users[c.user] = {
      user: c.user,
      hascomment: false,
      hasmention: false,
      entries: 0,
      comments: []
    };
  }
  var user = users[c.user];
  // calculate entries
  if(c.comment.length > 10){
    user.hascomment = true;
  }
  if(c.mentions.length > 3){
    user.hasmention = true;
  }
  if(user.hascomment && user.hasmention){
    user.entries = 2;
  } else if(user.hascomment || user.hasmention) {
    user.entries = 1;
  } else {
    user.entries = 0;
  }
  //store comments
  user.comments.push(c);
});
Now, I get the entries ready for picking a winner. I construct a list, and put the user’s username in once per entry. Think of the entry_list array like a hat. Each entry is one slip of paper, and we’ll randomly pull one out as a winner. Those with two entries have a better chance of being pulled out.
//construct array of entries by count
entry_list = [];
for(var username in users){
  var u = users[username];
  // declare the counter so it does not leak as an implicit global
  for(var i=0;i<u.entries;i++){
    entry_list.push(u.user);
  }
}
entry_count = entry_list.length;
console.log("comment count", filtered_comments.length);
console.log("unique users", Object.keys(users).length);
console.log("entry count", entry_count);
I print out some stats that we wanted to know. The biggest surprise was how many people only had one entry.
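To see how the hat analogy plays out in numbers, each user’s chance of winning is simply their share of the slips. Here is a minimal sketch, with a hypothetical winOdds helper and a made-up entry list:

```javascript
// Hypothetical helper: map each username to its share of the entries.
function winOdds(entries) {
  // Null-prototype objects avoid clashes with names like "constructor".
  var counts = Object.create(null);
  entries.forEach(function (name) {
    counts[name] = (counts[name] || 0) + 1;
  });
  var odds = Object.create(null);
  Object.keys(counts).forEach(function (name) {
    odds[name] = counts[name] / entries.length;
  });
  return odds;
}

// Made-up hat: alice has two entries, bob and carol have one each.
var sample_entries = ['alice', 'alice', 'bob', 'carol'];
var odds = winOdds(sample_entries);
console.log(odds.alice, odds.bob, odds.carol); // 0.5 0.25 0.25
```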
Picking a Winner
Finally, the step we’ve been waiting for.
//choose random entry, display username and comments.
var winning_user = entry_list[Math.floor(Math.random()*entry_list.length)];
console.log("winner", winning_user, users[winning_user]);
There were a few things to verify for winners that I couldn’t easily do in the code here. We had to verify that they followed the sponsors, for example. I would pick a winner, and then we would verify that by hand. If the user didn’t qualify, we would pick another winner. Another time, we had a Canadian win a contest with only US shipping.
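The pick, verify, and repeat routine described above could be sketched as follows. We did the re-picking by hand, so this is an illustration and not the code we ran: drawWinner is a hypothetical helper, the disqualified list stands in for whatever the manual checks turn up, and the random function is passed in only so the example is repeatable.

```javascript
// Hypothetical helper: draw one entry at random, skipping every entry that
// belongs to a user who already failed the manual checks.
function drawWinner(entries, disqualified, random) {
  var eligible = entries.filter(function (name) {
    return disqualified.indexOf(name) === -1;
  });
  if (eligible.length === 0) {
    return null; // nobody left in the hat
  }
  return eligible[Math.floor(random() * eligible.length)];
}

// Made-up hat; in the console this would be entry_list and Math.random.
var hat = ['alice', 'alice', 'bob', 'carol'];
var alwaysFirst = function () { return 0; };
var firstPick = drawWinner(hat, [], alwaysFirst);
var secondPick = drawWinner(hat, [firstPick], alwaysFirst);
console.log(firstPick, secondPick); // alice bob
```

Removing all of a disqualified user’s slips before the next draw keeps the odds fair for everyone still in the hat.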
And That’s All He Wrote
The contest worked, and I saved Jess a heap of trouble trying to do this by hand. I’d prefer good API access to screen scraping for this sort of thing, but it’s nice to have good tools when scraping is the last option. Next time you need to scrape, consider giving Artoo.js a try.