All the Cereal You’d Want

One of the first things I remember about college was walking into the cafeteria for the first time and seeing the row of cereal dispensers. The college stocked any cereal you’d want, from the healthy ones your parents would buy for you to the really unhealthy ones, with their unnaturally colored blobs of sugar.

Even though it was lunchtime and the hot bar was admirably staffed with smiling people ready to hand me a well-cooked meal, I wanted cereal. I walked up to the dispensers and picked the most brightly-colored one I could see. I spun the little handle on the machine and it doled out one serving of the stuff. I picked up my bowl and started to step down to the milk.

A girl who had been queued behind me called after me, “You can get more cereal than that if you want.” I paused, thought about it, then stepped back and spun the handle once more and got another pile of cereal. She was right; they didn’t care how much you poured.

Twitter to Flickr, and Back Again

View from the front porch

I just got back from a week in Hawaii and only took my film camera. I shot 5 or 6 rolls of film and I look forward to getting them up on Flickr, but there are many steps between those 35mm canisters still sitting in the bag I need to unpack and someone being able to click a link to look at them. (Unless everyone wants to come over to my house, which is also fine with me.) I think this is worth the wait but it does remove a bit of the instantaneousness that services like Flickr and Twitter offer. It’s fun sitting on a palm-covered beach or enjoying a tropical drink on a warm patio with a slow-moving fan, taking a picture, and sending a modern-day, wish-you-were-here postcard to a few friends.

Today, the snapshot app of choice among my friends appears to be Instagram. This is perfectly fine and I use it a bit, but I’m a Flickr man and I’d rather use that, especially since the rest of my Hawaii photos will go there. It’s nice to make a big set of all of the vacation photos, and be able to email the link off to Mom and Pop, and even nicer to be able to see them again together in 1, 5, or 25 years’ time. As far as Flickr goes, I feel pretty good about their thoughts on longevity.

The postcard delivery system, to extend (and strain) the metaphor, is Twitter. Twitter is probably the best spot to put things where people will see them sooner rather than later. Instagram comes equipped with its own social network but Twitter is the common stomping ground of me and my friends and acquaintances.

My Twitter social graph visualized by Recollect
Bert and Chris built a thing for Recollect mapping your Twitter social clusters. Also, Meghan told me to put more pictures in my blog posts.

The crux of the problem was how to get my photos directly to Twitter and Flickr without building a Rube Goldberg device, because things with fewer moving parts break less and are easier for me to understand.

Going through a multi-step process, especially when I’m on the go (it’s mobile!) and when I’m trying to enjoy my surroundings (it’s social!), sounds horrible. I want one app and to be able to hit one button. Flickr does have a mobile app, which is serviceable, but I usually already have Twitter up and most Twitter clients have this nice ability to take pictures within the app. With my phone, I’m usually sending a tweet with a photo attached, and not a Flickr photo that I also want to share on Twitter. Twitter to me is the Instant, which is usually what I want when on the go.

Twitter is, in its fundamental glory, a magic word distribution system (via Kellan, via Aaron). Most Twitter clients allow you to do media-webby things like upload a video or a picture to a service of your choosing, get a link back in return, and then helpfully include that link for you in the tweet. This outsourced-upload thing uses what is formally called OAuth Echo. This is described here and seems to have originally been thought of by Raffi Krikorian of Twitter.

Magic word distribution system

Flickr is not one of the upload options, but things like cloudapp, droplr, pikchur, twitgoo are (at least in Tweetbot). I’ll take most of the blame for Flickr not being included as it was something that I was working on in my side time towards the end of my tenure, but didn’t finish before I left. One service does handle this handshake of Twitter-to-Flickr, gdzl.la, but it returns the link back as a gdzl.la link, effectively introducing one more URL forwarder into the world (which is a shitty thing to do if you can help it).

Twitter for iPhone used to support different image backends but probably took it out shortly after they built their own image upload thing. So there’s that.

So, after a week of wanting to take pictures with my phone and send them along to Twitter, and having to choose between Instagram (look at all those filters!) and Twitter’s Official Image Backend™ (store up to 3200 pictures!), I decided to build my own Twitter-to-Flickr uploader atop Aaron’s parallel-flickr, of which I also run an instance for myself.

In theory, this OAuth Echo upload stuff could live by itself (see gdzl.la) and there’s no reason that I couldn’t return my parallel-flickr instance’s URL, but there’s something nice about saying “here’s my Flickr kit”, playing along with the aforementioned idea of fewer moving parts as well as knowing all the archival bits-and-pieces going on. Using flamework and the pieces of p-flickr that were already there, a few cups of coffee and a chunk of quiet time, I was able to bolt it on.

One important thing that made this possible is that these 3rd party clients, knowing that they are building things that the official client won’t or can’t build, WANT you to build more things to fill the gaps. Tweetbot, at the end of the list of the dozen or so included image backends, has a field marked “Custom…” where you can put in a URL endpoint that knows the steps to the OAuth Echo dance. This kind of allowance and permission is refreshing as things become increasingly less so.

Love it when things say
No, YOU drive.
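For the curious, the server side of that OAuth Echo dance is small. Here is a minimal sketch in Python, using only the standard library; the function name and structure are mine, not lifted from parallel-flickr:

```python
# Sketch of the verification step an OAuth Echo upload endpoint performs.
# The client uploads its media to us along with two headers; we echo the
# signed Authorization header back to the provider the client named.
import urllib.request

def build_verification_request(headers):
    """Build the request we'd forward to the provider (e.g. Twitter's
    verify_credentials endpoint) to confirm the uploader's identity.
    A 200 response means the user is who they claim to be, and the
    service can store the photo and hand back a link for the tweet."""
    provider = headers["X-Auth-Service-Provider"]
    credentials = headers["X-Verify-Credentials-Authorization"]
    request = urllib.request.Request(provider)
    request.add_header("Authorization", credentials)
    return request
```

The key design point is that the endpoint never sees the user's Twitter credentials; it only relays a pre-signed header, which is what makes it safe for a third-party client to point at a stranger's upload service.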

It looks like Aaron merged this change and the upload branch (which made my part really easy) this morning so if you’re running parallel-flickr, feel free to kick the tires on it, and if not, look at the code and see how easy it is to do.

Growing up with Guns

I have no interest in guns. I've never owned anything more powerful than a BB gun. I've never been hunting. Growing up, I was more of an exception to the norm — people around me loved guns.

My father owned a double-barreled, break-action 12-gauge shotgun propped up in the closet with buckshot shells in a nearby dresser drawer. Part of the physical education course we all had to take in high school involved a hunter's safety section that culminated in skeet-shooting on the football field with real rifles and real ammunition. People would call out of work or school at the beginning of hunting seasons.

My brother is not like me in that he loves guns. He collects them and now has a small arsenal consisting of handguns, a shotgun or two, and a few rifles. He has a concealed carry permit, meaning that he has a legal right to carry a handgun on his person. One of the guns he owns, I bought for him, meaning I've gone through the distinctly American process of purchasing a weapon.


It was the Christmas after I graduated college and had started to make a little money. I wanted to buy larger gifts for my immediate family than my former college student budget could afford, so I decided to buy my brother a shotgun. I walked into a Dick's Sporting Goods, straight back to the gun counter and got the rundown on the various models in my given price range.

At the counter, I got the various specs of all manner of weaponry: muzzle velocities, magazine counts, and bore sizes. The sales pitch was analytical: scientific and all numbers, vaguely militaristic. Like most things whose true nature we don't want to acknowledge, this specific jargon was a disguised way of asking, “If I were to point and shoot this gun at something, how big of a hole will it leave? What kind of damage can I, as a small human, do to something?”

The gun I selected was a Mossberg 12-gauge, pump-action shotgun: a black, all-metal, military-looking affair. The man behind the counter took my driver's license to the computer, entered in my information, and then a few minutes later gave me a ticket to take to the front registers in order to pay for the gun. I was a bit confused by the process so I asked him to clarify. He said, “For safety reasons, we don't want our customers carrying guns and ammo around the store.” That seemed perfectly reasonable.

He walked up to the front counter with me, carrying the gun and ammunition. He waited until I had paid, then, while standing not 6 feet away from the register, he handed me the gun and wished me a good night. So here I was, standing in a crowded store, near registers overflowing with Christmas money, holding a very powerful weapon. I was shaking from nervousness, not because I was doing anything wrong, but from the complete disjointedness of holding a gun in a public place and how it all just felt wrong.

Getting a gun was too easy and I had to prove too little about my skills and mental capacity to be an owner of that weapon. For comparison, in North Carolina, in order to get my full driver's license, I had to take a multi-week written course, followed by a multi-week driving course, with a multi-month probationary period, followed by another written course and one more in-car test, just in order to legally drive a car. I probably waited longer in line at the DMV than it took for me to go from non-gun-owner to gun-owner. Our bureaucratic government showers every piece of its workings with red tape yet, for some reason, makes it simple to acquire something that is so closely associated with crime, civil unrest, and some of the worst massacres outside of acts of war on American soil.

As an American, from a gun-loving area of the country, do I know why Americans are so weird about guns? No, not really. I honestly believe that the overwhelming majority of gun owners I've met are well-trained and take proper precautions (Dad with the shotgun in the closet notwithstanding). But it's the crazy people that I worry about. And people aren't always crazy. Some perfectly sane, say-hi-at-the-market people have minds that turn and need professional help. Combine how easy it is to acquire guns with a society that has choked down that visceral reaction to weaponry, and in fact celebrates it (either through movies that glorify outlaws or war, or as some twisted symbol of citizenry in contentious political times), and the ingredients are there for terrible things like those of the past few months.

Caleb

In middle school, there was a boy named Caleb who would routinely eat things off the cafeteria floor, claiming it helped his immune system. He never got sick so maybe he was right.

During lunch one day, another boy reached for a chicken nugget off Caleb's hard plastic cafeteria tray. Mid-reach, Caleb stabbed the other boy in the hand with his fork, causing a bit of bleeding and a lot of bruising. To Caleb, this was hilarious.

A few of us would go over to Caleb's house where we pretended to be professional wrestlers and play paintball, at the same time. I never got the connection between Hulk Hogan and shooting balls of paint at one another from way too close range, but the others had no problem making that leap. The whole spectacle probably had something to do with our endocrine systems just starting the flood of testosterone, pushing us as children towards manhood; impersonating the costumed steroid abusers prancing around in their underwear on TV, combined with shooting each other in pretend-war, was, in our hormonally-hazed minds, what Men did. Caleb had the best paintball gun, had the best aim with it, and also knew all the catchphrases and signature moves of every wrestler. He dominated this “game” and I was glad to be included in this make-believe world.

Caleb moved to South Carolina during middle school and years went by and we forgot about him.

During our senior year of high school, a bunch of us skipped class on a day we dubbed “Senior Skip Day” (for obvious reasons). Senior Skip Day not-so-coincidentally fell on the opening day of trout season. Stone Mountain State Park, with its many rivers and creeks flowing through it, was a 15-minute drive from the high school, so the bulk of the kids skipping ended up there, with their trucks and fishing gear in tow. The teachers knew about it but didn't stop it, partly because I think they were jealous but mostly because they were glad to be rid of us for the day.

We were walking along one of the rivers, and we noticed a few people lazily floating by in inner tubes. One of the men in the tubes yelled up at us and out of the river came Caleb.

There was a lot to catch up on in the time between shooting each other with paintballs as 11-year-olds and being nearly-grown men finishing up high school, too much, in fact, to exchange anything but the high-level details. Indeed he had moved to South Carolina and was currently working in construction (as he had dropped out of school). Behind the short, awkward sentences, I could still tell Caleb had a wildness about him, but, at that age, wildness was less paintball guns and re-enacting fake wrestling matches, and more something else entirely.

Without saying it, we both seemed to decide that our friendship and commonalities were from a different time, so we politely said our goodbyes as he pushed back off into the river and that was the last I ever saw of Caleb, just floating away.

Moderation Amplification

A few nights ago, I came across Tom Coates' post titled Social whitelisting with OpenID… about how to handle moderating an online forum when the amount that needs to be moderated outweighs any one person or small group of administrators' capabilities (due to time, sanity, etc). This post was written in early 2007 but this system, which advocates building a web of trust among your friends, was never built as far as I know.

It did remind me, though, of a moderation tool that a couple of other tech folks and I designed and built right around this exact same time for a TV station's website overhaul.

One golden rule of the web is “don't read the comments”, because, as near to an absolute rule as the web has, they bring out the worst in the worst people. On a news site, this is triply the case. In polite company, religion, politics, and money are things to tread carefully around even with close friends, but with the veil of anonymity that the Internet provides combined with a website that deals daily in stories of religion, politics, and money that often directly impact the reader, it's a damned near-perfect recipe to attract explosive, hateful, and irrational comments.

As part of this TV site's redesign, there was to be more of a focus on contributions from the viewers, which were in the form of photo and video submissions, guest blogging, a simple “wall” for members to leave comments for one another, and, of course, comments on news articles.

As a hard rule, beside every piece of member-added content, we always put a link where anyone could report abusive content. This would go into a moderation queue in our administrative tools and our editors could act on it, either marking it as abusive or marking it as “seen but okay”. For a news site, with millions of visitors a day, this system wasn't manageable, as there would be as many false positives as there would be valid abuse reports. For some, it was abusive if someone disagreed with them and they couldn't find a way to logically defend their argument. This was overwhelming for a tiny editorial staff.

So, we had to devise something to at least bubble up the true offenders in the moderation queue. (Now, this was 5-6 long years ago and I'm sure it's evolved since I left, but here's how I vaguely remember it working.) The idea that we came up with was this: a member reporting abuse is right if we agree with their judgment, and people that report abuse “correctly” more often build up our trust in their judgment. If someone reported abuse 100 times and 100 times we agreed with them, there's a really good chance that their next report will be correct as well.

So we assigned every user a starting trust score and for every time they reported abuse that we deemed valid, we'd bump up their trust score. On every abuse report, we'd look at the trust score of the person that reported it and if it met some threshold, we'd silently remove it from the site. Their abuse report would still exist in the system, but there was less of a time pressure to go through the abuse queue as after a while, a small army of reliable reporters would be moderating the site for us.

On the flip side, if an abuse report was deemed wrong, the reporter's score would be drastically reduced: halved, if I remember correctly. We were fine with the penalty being so severe as good users would build themselves back up, and introducing a little chaos into this system was nice as different editors would have slightly different standards, and a lot of the time judging whether something was abuse or not was a judgment call. Chaos was inherent in the system from the start.
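A minimal sketch of that scoring scheme in Python. The starting score, bump size, and threshold here are my guesses for illustration; I don't remember the actual numbers we used:

```python
# Trust-score moderation, roughly as described above. The constants
# are placeholders, not the values the real site used.
START_SCORE = 10
VALID_BUMP = 5
AUTO_HIDE_THRESHOLD = 50

class Reporter:
    def __init__(self):
        self.score = START_SCORE

    def report_confirmed_valid(self):
        # An editor agreed with this person's abuse report.
        self.score += VALID_BUMP

    def report_confirmed_invalid(self):
        # An editor disagreed; the penalty was severe (halving),
        # since good users would build themselves back up.
        self.score //= 2

    def auto_hide(self):
        # Reports from sufficiently trusted users take effect
        # immediately, without waiting for an editor.
        return self.score >= AUTO_HIDE_THRESHOLD
```

The asymmetry (small bumps up, halving down) is what keeps the system honest: trust is slow to earn and quick to lose.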

These scores were obviously secret, shown only to the editors, and I honestly can't remember if we were actually doing the silent removals by the time I left, but I do think those reports at least got priority in the moderation queue and when going through thousands of reports, this was incredibly helpful.

I like to view this kind of system as sort of an ad-hoc Bayesian filter where your moderation efforts are amplified, rewarding and ultimately giving some weight to people that moderate like you do.


So, the social whitelist begins with allowing a subset of users to post, while the trust score model involves dynamically building a list of good moderators that agree with you on what is abusive content.

I still love the idea of social whitelisting or building up a set of trusted users to help you with moderating, as both are more organic approaches to moderating, meaning that it forces you, as someone in charge of a community, to actually make decisions about what kind of discourse you want on your site.

This is also why it saddens me a bit today as more and more blogs are just dropping in web-wide, generic commenting systems, like Facebook's. While it is enabling almost everyone to be able to quickly log in and start adding comments, it's horrible for the site owners that are trying to build an intimate community. Every decent community probably has a baseline standard of what's acceptable: no hate speech, no physical threats, no illegal content, etc. This is what Facebook provides — a baseline — and nothing more.

Any community worth moderating is nuanced, has a voice and a direction. Facebook doesn't offer this, so every blogger that drops in this commenting system is making that trade-off between ease of user engagement and being able to effectively manage a community. I'd like to see more sites go back to these more hands-on, think-hard approaches to moderating and directing their communities instead of relying on someone else's standards of what constitutes a good contribution.

Thoughts on Pagination

A common navigation pattern on websites is what I call “chunked pagination”: each page has a predetermined number of pieces of content on it. Page 1 shows items 1-10, page 2 shows 11-20, and so on to the end of the stream. Easy.
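In code, chunked pagination is nothing more than an offset computed from the page number. A tiny sketch (the names here are mine):

```python
def page_slice(items, page, per_page=10):
    """Chunked pagination: page N covers items (N-1)*per_page through
    N*per_page - 1 of the stream, whatever those items happen to be."""
    start = (page - 1) * per_page
    return items[start:start + per_page]
```

Note that nothing about the slice says anything about the items themselves, which is exactly the problem discussed next.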

This pattern, though incredibly common, isn’t useful for navigation most of the time. The major issue is that it gives few hints about where those links will drop you in the stream. It’s an arbitrary chunking driven by how you display the content, not by how the content is actually ordered.

Pagination should provide accurate navigation points that reflect the overall ordering of the stream, and pagination based around fixed-length pages provides nothing more than arbitrary access into this ordering, where we have to use estimation and instinct about the distribution of the content in order to make a guess at where a link will send us.

Having a pagination scheme that closely models how a stream is sorted can give you both the casual browsing experience that the numbered pagination provides, as well as powerful navigation abilities that the numbered pagination can’t provide.


For example, take your average photo site that displays the content in a reverse chronological order: that is, newest to oldest. Let’s say your friend has posted 2000 photos to this site. The site shows the viewer 10 things per page. With our prolific user, this gives us 200 pages.

Going to the middle of this content takes us to page 100. What does this mean, beyond we’re at the middle? Not much.

Let’s say this user posted their first photo to the site years ago, but has just gotten back from a month-long trek through Europe where they took a thousand pictures. Our page numbers monotonically march back 10 by 10, but since we know this stream is sorted by date, and we want to go back in time on their photostream to a dinner we shared 6 months ago, we’ll just have to guess which page to start with.

Since we know our friend’s usual posting velocity, we think that ten pages should take us a few months back in time, so we go to page 10. On page 10, our friend is in Europe, looking at the River Seine, just 1 week ago. Let’s go back 10 more pages. Hm, our friend is still in Europe, admiring the beach at the French Riviera. This is frustrating, so let’s try 40 more pages. Click. Damn, our friend is still in Europe (good for him, but bad for your navigation).

After some clicking, we’ve got them figured out. We know that page 100 is right before their trip started, so, disregarding the first 100 pages and applying our estimate again, we quickly find the dinner pictures.

But, now that our friend is back from their trip, they resume their normal posting volume of 2 or 3 things per week. So, after our friend is back from his trip for a month or two, the first two pages cover a few weeks, while the next 100 pages cover four weeks. The concept of “page 100” no longer means anything, as this link is very much a moving target while things keep getting posted to the beginning of the stream.

Lots of problems with the page numbers, it seems. First, we have to guess at our friend’s posting volume and frequency to even take a stab in the dark at how a particular page number relates to a point in time. Then, in an ideal world where URLs are meaningful, the page numbered link (to use Flickr as an example, http://www.flickr.com/photos/nolancaudill/page7/ is the seventh page of my photostream) doesn’t point to any specific resource, beyond that it points roughly to the 126th through the 142nd photos I’ve posted and will point to different photos on my next upload. This link is only “valid” as long as the dataset doesn’t change (which datasets tend to do).


Let’s look at the two main ways that people navigate a sorted list.

The first, as we discussed, is by seeking. You know where something happened in a particular sorting (by date, in this example) and you want to get to it. Page numbers do let us narrow it down, but it’s usually a guessing game. Go too far back and you have to click back. Not far enough? Click further. Repeat these steps until you narrow down onto the correct page.

The second way people navigate is by browsing, where there’s not a specific goal in mind beyond seeing some stuff. For this method, the page numbers are also not necessary. You either want to view the next page, or just jump to some point further in the list. For the former, a simple “next” link is adequate, and for the latter, you can provide this same action in a way that makes sense for this case, and also for the “seekers”.

The way to represent this that satisfies both the seekers and the browsers is to have pagination that actually chunks based around how the list is sorted.

Some examples of this in real life: dictionaries with the tabs on the page edges that show the alphabet, calendars (one page per month), and encyclopedia sets that have one (or more) books per letter. Dictionaries are actually a good representation of the problem of uneven distribution of the content, as the ‘T’ and ‘I’ sections are much thicker than the ‘X’ and ‘Z’ sections. Jumping halfway into a dictionary doesn’t mean much at all, but having those handy tabs on the edges gives you a good head start. Also, even if lots of words are added or removed, jumping to the ‘J’ tab will always take you to the first ‘J’ word, regardless of changes in the vocabulary.

On the web, blogs usually have both the seeking and browsing navigation controls. WordPress, for example, has the ‘next’ and ‘previous’ links at the bottom of the main list views, but usually provides a separate archive page that lists all the posts split out by month. Aaron Cope’s parallel-flickr has an interface that shows all photos uploaded by your friends in the last day. Instead of using pages, the list is divided up by the uploader (signified by their avatars), which is helpful as I have some friends that post one photo at a time, and others that empty their entire memory card in one fell swoop, but I can successfully navigate both cases.

In all these cases, form and function work together nicely, with the pagination links reflecting how the underlying data is actually laid out, making both seekers and browsers happy. It also creates useful links: a link to “January 2010” in a reverse-date-ordered photostream will always be constant, regardless of how the data around it changes.

Since it came up on the Twitters, I should mention the concept of infinite scrolling as a pagination scheme. (Since blech has a protected Twitter account, I don’t want to write out his tweet verbatim, but I can summarize it for sake of context by saying he took issue with an instance of infinite scrolling, which can be easily deduced from my reply). Infinite scrolling is basically a pretty representation of the ‘next’ link that you ‘click’ by scrolling to the bottom of a page. I’ll leave whether or not it’s good user experience to others, but as a purely visual experience, I like it. If it’s the only source of pagination, that sucks, and another navigation scheme should be provided if having your users be able to look through the list or find something is important.


So to wrap this up, how would one create pagination links for our reverse-date-order example, which is an incredibly common view? The obvious way is by actually chunking around dates. I believe people that are much better at designing useful things than I am could adapt this into the same form that our current paginate-by-arbitrary-chunk format occupies. I think adapting the links you usually see in an archive view could be represented in a succinct form that makes both seeking and browsing easy operations.

The Front Line

So, Yahoo messed up today. They've messed up other days, too, but this was an especially red-letter day amongst other red-letter days, and this is one that has me ticked off.

For reasons I don't know, Yahoo laid off the highest level of Flickr's customer support, the people that end up filing bugs against the developers and helping the trickier cases get solved for the members. Those guys getting shown the door is as bad as it sounds.

When sites get larger, both in members and staff, the gap tends to grow between the people that build the site and the people that use it. Sometimes this happens with product decisions, but it almost always seems to happen with developers. Our job is to write and ship code to the best of our abilities, though, through no acts of spite or laziness on our parts, our code is not perfect. It's the fundamental nature of current software. We're human, we're imperfect, and we write bugs.

After we write and ship code that probably contains a bug or two (or three), our job is to then write more code, which will also contain bugs. It's a bad cycle.

This means that someone has to be in the middle, as the face of Flickr, acknowledging these mistakes and going to great lengths to fix things. This is often a thankless job, as users just want their problems to go away and developers (usually) don't like to be told they messed up. But they do it for the good, and for the love, of the site. Every bug that gets filed and every support case that gets carefully answered makes the site that much better.

After being a liaison between these two worlds long enough, you end up knowing more than anyone else on the team. When you have millions and millions of users who hit every button and link in combinations you would never dream of, and who then report the “interesting” outcomes of their explorations, these support agents become walking encyclopedias of the ins-and-outs of the site, and with Flickr, there are odd edge cases waiting on every page. Having people on your team aware of everything the site does is huge. You literally can't buy that or replace it or outsource it, though it appears that Yahoo thinks it can.

With big sites, not only do you have bugs, but you have outages. These same agents that can recite all the guestpass-viewing conditions and know offhand whether a photo should be visible in Germany, also get to sit on the front lines and explain to users with emotions ranging from impatient to pissed-off that some section of the site will be back as soon as possible. This is not a position to be envied but one they always handled with grace and aplomb.

To be constantly deluged by requests and demands from stressed users and keep showing up in high spirits day after day demands a special kind of character. Not only do you need the patience of a saint (imagine getting asked the same 3 questions, 50 times a day, every day) but also the tact to work with developers and product folks whose priorities are different from the users', as those things tend to go.

And that's probably the biggest thing that hurts: the users of Flickr lost their major advocates today. At product meetings and developer meetings, it would be these support folks constantly asking, “But what about the users?”


On a personal note, Flickr lost several good people today. If you asked me to name the top 10 Flickr employees that loved the site the most, half of them got handed pink slips today. Working with that entire team was absolutely one of the highlights of my time at Flickr and any other company that has a need for calm, intelligent, and resourceful customer support folks would do well to contact me or any other person that has ties to Flickr to get you an introduction.

To the support folk that are now ex-Flickr, you've got a stupidly strong alumni organization and we know how good you guys are — we're here for you.


I don't really know the real purpose of me writing this. I'm always hesitant to write anything good, bad, or otherwise about my past employers, but this one deserves to get called out. Yahoo made a major mistake today and there's no other way to interpret it. I'm mad and this is my soapbox.

Flickr-the-site will be fine but Flickr-the-culture took a huge hit today and those suits in Sunnyvale balancing some column or doing their thousandth “re-org” are completely to blame. I bet they don't even know what they've done and that's probably the worst part of the whole thing.

The Code Behind the Yearbook

This is lifted from the README to the github repository.

The Yearbook

I decided to take all the blog posts, Twitter messages, and Flickr images I made this year, combine them, typeset them, and then get the result printed in a hard-bound book. I wrote a bit about the reasoning here.

There was a lot of poking and pawing at the scripts I used to create the final product so I thought I’d share them in case someone else could get some use out of them.

Big warning: these are mostly worthless until you change them to fit your project. While all the code here works and ended up giving me a decent-looking book, you’ll need to modify it, which is mostly the point. This is your retrospective and thus shouldn’t be a cookie-cutter run of the code I wrote (if that would even work).

I’ll now explain a bit about the pieces:

The Blog Posts

All my blog posts are already flat HTML (via jekyll), so getting my blog onto my PC was a solved problem. You’ll probably need to run some magic incantation of wget or curl to get yours if they’re hosted somewhere else.

TeX, specifically pdflatex, was the workhorse for typesetting, so I needed to get these HTML files into tex format. I ran find . -name "*html" | xargs -I{} python texify.py {} in my jekyll site directory, which ran each of the files through pandoc. Pandoc is a super magic text transformation library that will slurp in most text formats and then spit out a transformed version. In this case, I was reading HTML and spitting out .tex files. You can see the command in texify.py.
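The heavy lifting is all pandoc’s; the wrapper script amounts to little more than building a command line per file. A minimal sketch in Python (texify.py’s actual contents may differ; the flags shown are standard pandoc options):

```python
import subprocess
from pathlib import Path

def pandoc_cmd(html_path):
    """Build the pandoc invocation: read HTML, write LaTeX next to the source."""
    out = str(Path(html_path).with_suffix(".tex"))
    return ["pandoc", "-f", "html", "-t", "latex", "-o", out, html_path]

def texify(html_path):
    """Run pandoc on a single post, raising if the conversion fails."""
    subprocess.run(pandoc_cmd(html_path), check=True)
```

The find/xargs pipeline above just calls this once per HTML file.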

After I had all these converted tex files, I loaded them all up in vim, made a macro that cleaned out things like the header and footer, and then ran the macro across all the open files. I forgot this magic spell almost as soon as I cast it, but bufdo sounds familiar; I’d google something like “vim macro across all open buffers”.

Now that you have a directory full of tex files, one per blog post, you need a master tex file that describes the full document and points to all the various tex files to include. This is the book.tex file in this repository, lifted as-is from my project, so it shows what the finished result looks like and should give you a good idea of how to put yours together.
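For reference, the skeleton of such a master file is small. A minimal sketch (the chapter names and post paths here are made up, and the real book.tex in the repository is more elaborate):

```latex
\documentclass{book}
\usepackage{graphicx}

\begin{document}

\chapter{January}
% one \input per converted post, in order
\input{posts/2011-01-03-some-post}
\input{posts/2011-01-17-another-post}

% ...more chapters...

\end{document}
```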

TeX is a frustratingly arcane markup language, but it is extremely powerful and can create beautiful documents. It’s worth it, trust me.

I’ve also included a sample blog post tex file. This post includes a couple of images via includegraphics to give you a head start on that.
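If you haven’t seen it before, includegraphics is a one-liner from the graphicx package; a quick example (the filename is hypothetical):

```latex
% Scale the photo to the full text width; requires \usepackage{graphicx}
\includegraphics[width=\textwidth]{images/hawaii-beach.jpg}
```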

Twitter

To format your Twitter posts, you first need the actual Twitter messages. This is surprisingly hard, if not impossible, if you’re especially prolific.

Twitter famously only allows you to fetch your last 3,200 messages. This limit is enforced both on the official website and by the API.

I’ve been running tweetnest on my server for a year or so, mainly because I think it’s pretty, but it turned out to do a whizbang job of archiving as well. Surprise, surprise: this was the source of Twitter messages for my book. I just dumped the table to a text file (via mysqldump) and used that as my source file.

Inside twitter/tweet_transform.php, you’ll see it read this file and then spit out the tex file, separating the messages by month and then by day.

There are some positively Nolan-specific things in here. Tweetnest (and probably Twitter’s API) stores each tweet’s date as a timestamp in seconds since the epoch. If I had only tweeted from San Francisco in all of 2011, getting nice dates would have been easy: just set the timezone at the top of the script and call it a day. But as it turned out, I climbed on and off airplanes at various locations and at various times, so you’ll see a block of code that dynamically sets the timezone according to when I was boarding and de-boarding airplanes.
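The real code is PHP; as a sketch of the idea in Python (the itinerary cutoffs below are invented), you keep a small table of epoch cutoffs and render each timestamp in whatever zone was in effect at the time:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical itinerary: each entry is (epoch cutoff, timezone in effect
# for any tweet posted before that cutoff).
ITINERARY = [
    (1303000000, ZoneInfo("America/Los_Angeles")),
    (1304000000, ZoneInfo("Pacific/Honolulu")),
    (9999999999, ZoneInfo("America/Los_Angeles")),
]

def local_time(epoch_seconds):
    """Render a tweet's epoch timestamp in the timezone I was in at the time."""
    for cutoff, tz in ITINERARY:
        if epoch_seconds < cutoff:
            return datetime.fromtimestamp(epoch_seconds, tz)
    raise ValueError("timestamp falls after the itinerary ends")
```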

Another sort of fuzzy, human thing I added that you may want to be aware of: I fudged the edges of what constituted a “day”. Instead of a day running midnight to midnight, I grouped tweets on a 4am boundary. Best I could tell, I never tweeted before 4am after waking up, and never tweeted past 4am by staying up from the night before. This way a day is defined as waking up to going to sleep (or passing out, some nights).
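The 4am trick is a one-liner once timestamps are in local time; in Python (the original is PHP), shifting everything back four hours makes the grouping fall out of an ordinary date comparison:

```python
from datetime import datetime, timedelta

def day_key(local_dt):
    """Group a tweet under its 'waking day': anything before 4am local time
    counts as the previous calendar day."""
    return (local_dt - timedelta(hours=4)).date()
```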

This script also follows some common URL shorteners so you won’t see any bit.ly or goo.gl links in your permanent archive.
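The usual approach is to check each link’s hostname against a list of known shorteners and follow redirects to the final URL. A Python sketch (the original is PHP, and the shortener list here is illustrative, not exhaustive):

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Illustrative, not exhaustive
SHORTENERS = {"bit.ly", "goo.gl", "t.co", "j.mp", "ow.ly", "tinyurl.com"}

def is_shortened(url):
    """True if the URL's host is a known link shortener."""
    return urlparse(url).netloc.lower() in SHORTENERS

def expand(url):
    """Follow redirects to the destination URL (requires network access)."""
    if not is_shortened(url):
        return url
    with urlopen(url) as resp:  # urlopen follows HTTP redirects by default
        return resp.geturl()
```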

The hard part of the Twitter section is collecting the tweets in the first place; once you’ve done that, the rest is a breeze.

Flickr

I uploaded about 600 pictures to Flickr this year. I really wanted to display every single picture for the sake of completeness, but figuring out a way to do that visually was difficult.

I ended up going with something like Google’s image search. Stephen Woods was also a major source of inspiration for the layout. This layout lets you plop a lot of images on a page and lets them use their natural dimensions to shoulder out more space as needed.

Instead of forcing tex to lay out individual images, or individual rows, I figured it would be easier to create an image representing the full page and then place that on the page, not unlike people adding <area> tags to full-page images in the early days of the web.
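The arithmetic behind a justified row is simple: pick the one row height that makes the images (plus gutters) sum to exactly the page width. A Python sketch of the idea (justified.php is PHP and does the actual image compositing on top of this):

```python
def justify_row(aspect_ratios, row_width, gutter=8):
    """Given each image's width/height ratio, return the common row height
    and per-image widths that make the row exactly row_width wide."""
    total_gutter = gutter * (len(aspect_ratios) - 1)
    height = (row_width - total_gutter) / sum(aspect_ratios)
    return height, [round(height * ratio) for ratio in aspect_ratios]
```

Wider images (larger ratios) naturally shoulder out more of the row, which is the whole trick.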

The flickr/justified.php file creates these image files, along with the flickr.tex file that includes them all.

I used Aaron Cope’s parallel-flickr as the source of the images. This project conveniently creates an easy-to-query database so I could do something like “give me all the images from Jan 1, 2011, to Dec 31, 2011 ordered by date_taken ascending”. I used the output of this query to select the appropriate images in the correct order and rsynced them to my book’s Flickr directory.
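The query itself is nothing fancy. A self-contained sketch against a stand-in table (parallel-flickr’s real schema differs; this just shows the shape of the date-range query):

```python
import sqlite3

# Stand-in for parallel-flickr's photo table; the real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER, date_taken TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [
        (1, "2010-12-30", "too-early.jpg"),
        (2, "2011-03-04", "spring.jpg"),
        (3, "2011-08-15", "summer.jpg"),
    ],
)

# "Give me all the images from Jan 1, 2011 to Dec 31, 2011, date_taken ascending"
rows = conn.execute(
    "SELECT path FROM photos "
    "WHERE date_taken BETWEEN '2011-01-01' AND '2011-12-31' "
    "ORDER BY date_taken ASC"
).fetchall()
```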

There are a few fuzzy parameters that let you set things like the maximum row height and how wide your rows are. Feel free to twiddle these knobs as you see fit.

Conclusion

Nothing about this is drop-in-and-run, but I hit a lot of gotchas along the way, and writing them down might help someone else who decides to tackle a project like this.

The Code

Feel free to browse the code at my github repository.