The Code Behind the Yearbook

This is lifted from the README of the GitHub repository.

The Yearbook

I decided to take all the blog posts, Twitter messages, and Flickr images I made this year, combine them, typeset them, and then get them printed in a hard-bound book. I wrote a bit about the reasoning here.

There was a lot of poking and pawing at the scripts I used to create the final product so I thought I’d share them in case someone else could get some use out of them.

Big warning: these are mostly worthless until you change them to fit your project. While all the code here works, and it ended up giving me a decent-looking book, you’ll need to modify it, which is mostly the point. This is your retrospective and thus shouldn’t be a cookie-cutter running of the code I wrote (if that would even work).

I’ll now explain a bit about the pieces:

The Blog Posts

All my blog posts are just flat HTML (via Jekyll) so getting my blog onto my PC was already done. You’ll probably need to run some magic incantation of wget or curl to get yours if they’re hosted somewhere else.

TeX, specifically pdflatex, was the workhorse for typesetting, so I needed to get these HTML files into TeX format. I ran find . -name "*html" | xargs -I{} python texify.py {} in my Jekyll site directory, which ran each of the files through pandoc. Pandoc is a super magic text transformation tool that will slurp in most text formats and then spit out a transformed version. In this case, I was reading HTML and spitting out .tex files. You can see the command in texify.py.
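As a rough sketch of what a script like texify.py might look like (the real command lives in the repository; the function names below are made up for illustration, and only the standard pandoc options --from, --to, and --output are assumed):

```python
import subprocess
from pathlib import Path


def pandoc_command(html_path: Path) -> list[str]:
    """Build a pandoc call that converts one HTML file to a .tex file beside it."""
    tex_path = html_path.with_suffix(".tex")
    return [
        "pandoc",
        "--from", "html",
        "--to", "latex",
        "--output", str(tex_path),
        str(html_path),
    ]


def texify(html_path: Path) -> None:
    """Run pandoc on one file; raises CalledProcessError if pandoc fails."""
    subprocess.run(pandoc_command(html_path), check=True)
```

Calling texify(Path(sys.argv[1])) from a small main block would make it fit the find | xargs pipeline above, one file per invocation.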

After I had all these converted tex files, I loaded them all up in vim, made a macro that cleaned out things like the header and footer, and then just ran the macro across all the open files. I forgot this magic spell almost as soon as I did it, but bufdo sounds familiar. I’d google something like “vim macro across all open buffers”.

Now that you have a directory full of tex files, one file per blog post, you need a master tex file that describes the full document and points to all the various tex files to include. This is the book.tex file in this repository. It is mine lifted as-is, so it shows what the finished result looks like and should give you a good idea of how to put yours together.
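The book.tex in the repository was put together by hand, but if you have a lot of posts, the skeleton of such a master file could be generated. This is only a minimal sketch: the function name, the book document class, and the file names in the usage note are assumptions for illustration, not what the real book.tex contains.

```python
from pathlib import PurePosixPath


def master_document(post_files, title):
    """Return the text of a master .tex file that pulls in each post via \\input."""
    lines = [
        "\\documentclass{book}",
        "\\usepackage{graphicx}",  # the posts use \includegraphics
        "\\title{" + title + "}",
        "\\begin{document}",
        "\\maketitle",
    ]
    # Date-prefixed file names (Jekyll style) sort into chronological order.
    for name in sorted(post_files):
        stem = str(PurePosixPath(name).with_suffix(""))
        lines.append("\\input{" + stem + "}")
    lines.append("\\end{document}")
    return "\n".join(lines) + "\n"
```

Something like master_document(["posts/2011-01-01-a.tex"], "Yearbook: 2011") gives you a starting point that you would then typeset by hand, adding chapters and spacing as you go.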

TeX is a frustratingly arcane markup language, but it is extremely powerful and can create beautiful documents. It’s worth it, trust me.

I’ve also included a sample blog post tex file. This post includes a couple of images via \includegraphics to give you a head start on that.

Twitter

To format your Twitter posts, you first need the actual Twitter messages. This is actually hard, if not impossible, if you’re especially prolific.

Twitter famously only allows you to fetch your last 3200 messages. This limit is enforced both on the official website and by the API.

I’ve been running tweetnest on my server for a year or so, mainly because I think it’s pretty, but it turned out to do a whizbang job of archiving as well. Surprise, surprise: this was the source of Twitter messages for my book. I just dumped the table to a text file (via mysqldump) and used that as my source file.

Inside of twitter/tweet_transform.php, you’ll see the reading of this file and then spitting out the tex file, separating the messages by month and then by the day.

There are some positively Nolan-specific things in here. All the dates in Tweetnest (and probably Twitter’s API) return a timestamp for each Tweet using seconds since the epoch. If I only tweeted from San Francisco in all of 2011, getting nice dates would have been easy: just set the timezone at the top of the script and then call it a day. But as it turned out, I climbed on and off airplanes at various locations and at different times. You’ll see a block of code that dynamically sets the timezone according to when I was boarding and de-boarding airplanes.
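The real logic is in twitter/tweet_transform.php; what follows is only a Python sketch of the same idea. The itinerary dates and offsets are invented, and fixed UTC offsets are used to keep it self-contained (named zones would handle daylight saving properly).

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hypothetical itinerary: (UTC moment a leg begins, UTC offset while there).
ITINERARY = [
    (datetime(2011, 1, 1, tzinfo=UTC), timezone(timedelta(hours=-8))),         # home
    (datetime(2011, 6, 10, 18, 0, tzinfo=UTC), timezone(timedelta(hours=-5))),  # landed elsewhere
    (datetime(2011, 6, 20, 2, 0, tzinfo=UTC), timezone(timedelta(hours=-7))),   # back home
]
_LEG_STARTS = [start.timestamp() for start, _ in ITINERARY]


def zone_for(epoch_seconds):
    """Pick the offset of the most recent leg that began at or before the tweet."""
    index = bisect_right(_LEG_STARTS, epoch_seconds) - 1
    return ITINERARY[max(index, 0)][1]


def local_time(epoch_seconds):
    """Convert a seconds-since-epoch timestamp to the local time of that leg."""
    return datetime.fromtimestamp(epoch_seconds, tz=zone_for(epoch_seconds))
```

Keeping the legs in a sorted list means adding a trip is one more line, and each lookup is a binary search rather than a chain of if statements.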

Another sort of fuzzy, human thing I added to this that you may want to be aware of is that I fudged the edges of what constituted a “day”. Instead of a day being midnight to midnight, I grouped tweets on a 4am boundary. Best I could tell, I never tweeted before 4am after waking up, and never tweeted past 4am by staying up from the night before. This way a day is defined as waking up to going asleep (or passing out, some nights).
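The 4am boundary comes down to shifting each local timestamp back four hours before taking its date. A minimal Python sketch of that idea (the names are made up; the actual script is PHP):

```python
from collections import defaultdict
from datetime import datetime, timedelta

DAY_STARTS_AT = timedelta(hours=4)  # a "day" runs from 4am to 4am


def book_day(local_dt):
    """Return the calendar date a tweet is filed under in the book."""
    return (local_dt - DAY_STARTS_AT).date()


def group_by_day(tweets):
    """Group (local datetime, text) pairs by their 4am-bounded day."""
    days = defaultdict(list)
    for when, text in tweets:
        days[book_day(when)].append(text)
    return dict(days)
```

A tweet sent at 1:30am on July 5 is filed under July 4, while one sent at exactly 4:00am starts July 5.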

This script also follows some common URL shorteners so you won’t see any bit.ly or goo.gl links in your permanent archive.
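One way to follow shortened links, sketched in Python rather than the PHP the script actually uses: send a HEAD request, let the HTTP client follow the redirects, and keep the final URL. The function names and the URL pattern are assumptions for illustration, and the pattern is naive about trailing punctuation.

```python
import re
import urllib.request
from urllib.parse import urlsplit

SHORTENERS = {"bit.ly", "goo.gl"}
URL_PATTERN = re.compile(r"https?://\S+")


def resolve(url, timeout=10):
    """Follow redirects with a HEAD request and return the final URL."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def expand_links(text, resolver=resolve):
    """Replace links on known shortener hosts with wherever they point."""
    def replace(match):
        url = match.group(0)
        host = urlsplit(url).hostname or ""
        if host not in SHORTENERS:
            return url
        try:
            return resolver(url)
        except OSError:  # network errors, timeouts, dead links: keep the short URL
            return url
    return URL_PATTERN.sub(replace, text)
```

Passing the resolver in as a parameter makes it easy to cache lookups or to test the text handling without touching the network.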

The hard part of getting the Twitter section together is actually getting the tweets together, but once you do that, it’s a breeze.

Flickr

I uploaded about 600 pictures to Flickr this year. I really wanted to display every single picture for the sake of completeness but figuring out a way to do that visually was difficult.

I ended up going with something like Google’s image search. Stephen Woods was also a major source of inspiration for the layout. This layout lets you plop a lot of images on a page and lets them use their natural dimensions to shoulder out more space as needed.

Instead of forcing tex to lay out individual images, or individual rows, I figured it would be easier to create an image that represented the full page and then put that on the page, not unlike the old days of people adding <area> tags to full-page images in the early days of the web.

The flickr/justified.php file is what creates these image files and then the flickr.tex file that includes them all.

I used Aaron Cope’s parallel-flickr as the source of the images. This project conveniently creates an easy-to-query database so I could do something like “give me all the images from Jan 1, 2011, to Dec 31, 2011 ordered by date_taken ascending”. I used the output of this query to select the appropriate images in the correct order and rsynced them to my book’s Flickr directory.

There are a few fuzzy parameters that let you set things like a maximum row height and how wide your rows are. Feel free to twiddle these knobs as you see fit.
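To show how those two knobs interact, here is a minimal sketch of a justified-row layout in Python. It is not a port of flickr/justified.php: it only computes sizes, ignores the gaps between images, and the function name is made up.

```python
def justify(sizes, row_width, max_row_height):
    """Pack (width, height) images into rows that each span row_width.

    Images are added to a row until, at max_row_height, they would fill it;
    the row is then shrunk to the height at which it fits exactly.
    """
    rows = []
    current = []
    aspect_sum = 0.0
    for width, height in sizes:
        current.append((width, height))
        aspect_sum += width / height
        if aspect_sum * max_row_height >= row_width:
            row_height = row_width / aspect_sum
            rows.append([(round(w / h * row_height), round(row_height))
                         for w, h in current])
            current, aspect_sum = [], 0.0
    if current:
        # Leftover images do not fill a row; leave them at the maximum height.
        rows.append([(round(w / h * max_row_height), max_row_height)
                     for w, h in current])
    return rows
```

With justify([(200, 100), (200, 100), (100, 100)], row_width=500, max_row_height=150), the two wide images share a 125-tall first row and the square one is left over at 150 by 150. A larger maximum row height means fewer, bigger images per row.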

Conclusion

Nothing about this is drop-in-and-run but there are a lot of gotchas that I came across that might help someone else if they ever decide to tackle a project like this.

The Code

Feel free to browse the code at my github repository.

My mom called to let me know that they had to put Dipstick down today due to the kind of things that happen to cats when they get old. She actually ended up being 14 years old, which was older than I thought she was.

It's a cruel fate that takes away these constant companions, especially ones that lived as quietly as she did, not asking for any more than a small laundry room, a bowl full of food, and a few pats on the head as we walked by.

So, with that, I'll be hugging Oliver and Lola a bit harder tonight.

They Used to Pour Fire off a Mountaintop

I was on a wiki walk when I came across this fascinating, but long gone, summertime event that used to happen inside Yosemite.

So, they used to pour still-burning embers off the top of Glacier Point at nightfall every day during the summer to make, what they called, a “firefall.”

Firefall - The real one!

This image is from Flickr user Cliff Stone (his real name!) taken in the summer of 1962 before the event was banned.

The firefall began when people down in the valley would see the embers from the nightly bonfire at the Glacier Point Mountain House kicked off the cliff’s face. People then started specifically asking for it to happen, as seeing a river of fire coming down the face of the cliff was quite the spectacle.

There were various attempts to stop the firefall between its inception in 1872 and its demise in 1968, mainly citing the crushing traffic of the sightseers and the fact that it was a man-made event in a setting that the rangers wanted to be celebrated for its natural beauty.

Even President John F. Kennedy saw it one night on a visit to the park while in office, but as he had to finish a phone call, they delayed it from its usual time of 9pm to 9:30pm.

The ritual usually kicked off at 9pm with a call-and-response, with someone in the valley yelling, “Let the fire fall!” with the response from the top of the mountain with, “The fire falls!”

There are so many things I enjoy about this. Don’t get me wrong, if they tried to start this again, I’d be first in line to protest, but in retrospect of it actually happening, it’s fascinating.

I can almost see the kitschy 1950s postcard saying “Come see the great Yosemite Firefall!” and the hordes of middle America in their station wagons parked on the road at nightfall. And it’s so audacious to think that they used to allow a hotel to dump burning embers off the top of a mountain, especially when park rangers will definitely ticket you in most national parks for having even small campfires.

Now, people come to see a natural version of this, when the late-winter February sun strikes Horsetail Fall just right, and sets the flowing water seemingly aflame. I’d like to see this one day and, in a small way, I’m sad I’ll never get to hear the people yelling and then pouring fire off the high peak, purely for amusement.

A New Razor

As a Christmas gift, Meghan gave me a new razor, one of the “old-fashioned” safety razors, complete with a shaving brush and soap that smells vaguely of tobacco.

I had actually been eyeing this setup for a few months as I’ve been using one of the modern razors with 3-5 blades for years, and I was always disappointed with the quality of the shave, as well as how quickly my coarse facial hair would wear down the blade.

the art of shaving

I watched a few YouTube videos on optimal lathering technique with the brush and soap and the proper angle and direction to pull the razor. (I’m sure our grandfathers learned the same way…) And just like any ritual, there are as many ways to do it as there are people performing it.

So wanting a possibly better shave was a major reason for wanting (and receiving) this shaving setup, but I’d be lying if I said that was the only reason.

Sure, there’s that slightly Mad-Menish flair of dragging a sharp blade across your throat with a chromed-out razor, but for me, it appealed to my love of little rituals.

Up to a year or so ago, the pieces of my day were mostly the same, many of them spent in front of my computer. Like, I’d imagine, a lot of people who “live” on the Internet, I divided my chunks of time into roughly 15-second slices: read 1 email, switch to Twitter and read the top few most recent messages, switch to the feed reader and skim a dozen headlines, oh look a new Twitter message, ooh more email, and so on and so on. I knew I needed a reprieve from getting that constant stream of endorphins from making all the numbers go down.

The first little ritual I introduced was brewing a cup of coffee and making oatmeal on the stove every morning. Such a little thing became a real meditation. I could only do one thing at a time and I had to pay attention. This was a complete flip from the non-stop information gluttony I usually participated in and this was good.

So this new shaving process in the morning is a similar thing. (And as a side benefit, I’m getting a better shave!)

Another thought, that might be related:

Over the holidays, Meghan and I went and stayed with her parents in a beach house on the Florida Panhandle. Her brother would turn the TV on after our afternoon walks on the beach and around 6pm, Meghan’s mother would ask him to change the channel to the news.

Ten years ago, this would have seemed commonplace, but today, the idea of sitting down with the frame of mind of “now, I will consume the news for 30 minutes” is noteworthy. With an always-on, usually-tuned-in Internet connection, there’s no official news time. It’s all the time, whether you like it or not.

Instead of watching the talking heads, I tried to decide which mode is better. With the Internet, I can know of any world event within seconds of it happening. With the TV, I get a daily condensed version of the highlights.

I think I decided that getting the news, in whatever form, once a day in a solid chunk might be the best way. Beyond living in ignorance for a few hours in the day, not knowing normal news events until later usually has very little direct impact on my life.

It’s also another ritual, a devoted time set aside for one purpose. Do I need to be constantly awash in world news? I’d say no. Getting that dose of information in one chunk probably also lets you digest it better and it also lets others (definitely for better or worse) filter out a lot of the noise.

I also realized that a big reason that I would constantly refresh cnn.com or nytimes.com was that I was bored. Why was I bored? Because there was nothing new on those pages. I realized that’s a pretty harsh cycle to be caught up in.

So that’s why I’m taking solace in the effect that little rituals like shaving with a safety razor, or sitting down to watch the evening news, or grinding and pressing a pot of coffee have on my always plugged-in attitude.

And I’m learning that the old saying is true: sometimes you need to stop and smell the roses, or, in this case, the tobacco-scented shaving soap.

The Cat in the Laundry Room

My parents have a cat that I cleverly named “Dipstick”, due to her solid blackness with a white-tipped tail.

This cat was born at our house in the laundry room which is an exterior room off the garage. With a washer and dryer in there, it was always warm and with the garage door closed, wild animals weren’t a concern, so it was a safe spot to birth and raise a litter of kittens.

Dipstick was the runt of the lot and was born not breathing. My father was watching over the natural progression of things and quickly swooped in, as naturally, that would have been it for that kitten. He held it upside down and pumped its chest, and fluid poured out, and it started mewing.

We’ve always had multiple cats wandering around the house, and we’ve let them decide if they wanted to be indoor or outdoor cats. We’ve had some cats that would only go outside once a week or so (and less than that in the winter), and some cats that stayed completely outside, only swinging by to eat.

Dipstick was a cat that lived in limbo, not really an indoor nor an outdoor cat, instead deciding to live entirely in the same laundry room she was born in. So where she could have a warm house to sleep in, or dozens of wooded acres to wander through, she has instead decided to spend her entire existence in a 48 square foot room.

Dipstick is now approaching 12 or 13 years old. This cat has spent all but a few minutes per day in this crowded little room. This makes me sad on some level, as she’ll pass on eventually within a few feet of where she was born, but she doesn’t seem to be in bad spirits at all about it, and she knows both what the inside and the outside holds, and she’s chosen her lot.

I’m not sure why I think so much of this cat when I’m home. Now that I’m only coming home once a year or so, things are always slightly different and off from the last time I visited, but that black cat with the white-tipped tail is always curled up in the same spot she’s literally spent her entire life in and that kind of constancy is reassuring, especially when my own travels have taken me very far away from home.

A Sunday Story: 2011-12-11

This past week left me emotionally drained, with the excitement of signing on for my new job and putting my notice in at Flickr. I’ve started to say a few farewells at the office and each one leaves me in a slightly altered state. On top of that, I’ve been fighting the office flu. So this weekend was mostly a collapse.

On Saturday, Meghan and I biked out to Outerlands with Phil Dokas and Traci and had a nice brunch and then went on towards the beach. Traci’s bike got a flat at some point, at which point we parted ways.

On Sunday, I got out my sketchbook and started working through a few exercises in Drawing on the Right Side of the Brain. I wasn’t in the right frame of mind to be working on something where my efforts would come to naught, so I poked at my yearbook a bit, formatting a few of the pages.

At night, we joined Trevor and Camille at Nopalito where we enjoyed hearing about their Disneyland trip and discussing Meghan’s book club.

I’ve already started mentally preparing for next week. It’s an odd feeling when you know your days at work are numbered and you start tallying through the things that you’re going to make your best effort to accomplish. The next week is going to fly by and I need to hunker down and crank.

That’s a Wrap

As of yesterday, I'm on my way out the door at Flickr, as I'll be joining some former Flickr folk and some new faces at Caterina Fake's new startup in short time.

I'm terribly sad about leaving such fantastic people and a product that I truly love, but I'm also incredibly excited about what's next.

Flickr Satellite Office

So I've been working on gathering the bits and pieces of my online life from the past year and getting them together into a book that I'm tentatively calling “Yearbook: 2011” (creative, I know).

I'm not entirely sure why I'm doing this, but I'm enjoying it and the fact that I might have a real, physical thing that I built after it's finished is frankly exciting.

Chapter 2: Twitter

The process, like a lot of processes where you take something you're familiar with and then turn it on its head, has already changed the way I approach what I put online, or at least my perceived impact of it. Now when I write a Twitter message or a blog post, my first thought is, “Well, that's going in the book,” and I pause, realizing that this will be a concrete thing soon.

So, I have this website, with the occasional blurb or essay, and my Twitter account where I post mostly inane one-liners (as that's all I've got room for and the message fits the medium, in my experience) and they are both entirely public. I also like to tell myself that these pieces of content will exist forever at their current addresses, so that people thousands of years in the future will be able to read my scribblings. But the fact that it's “out there” (as I do a gesture that's vaguely like batting a fly away) has never really felt like it was built to last or that it had any gravitas.

And with this perceived slightness, I tend to treat things slightly.

But now these things are going in a book and a book is a Real Thing. It's what holds history, not just in libraries and bookstores, but in your grandparents' attic. It's also art, with illuminated works hundreds of years old behind glass in museums. And now I'm making one of those, too.

I don't have any visions of grandeur that what's in my book is going to interest anyone but myself, and it will likely never be in a museum, but it's an assemblage of atoms that's going to be on my bookshelf, and will probably outlast me. That's a completely different feeling than pushing some bits from my phone or laptop to some hard drive in a distant data center that I then have to use a computer to look at again.

And unlike a computer, a book is a self-contained thing. Where my browser will load everything from pictures my friends upload, to emails sent to my landlord, to the daily news, a book is made of its pages, and that's it. When I pick up a novel, I know I'm getting an unbroken experience of reading just one story. So I'm taking what would normally live beside everything the Internet has to offer, and putting it in its own chunk of dedicated space.

Something I didn't realize until I started to see the parts go together in the book is that it's also a much better attempt at a truer self-portrait. None of these services by themselves gives a clear view of me as a person, but once you start piecing them together, the previously fuzzy image starts to firm up a little. It's exactly like the blind men and the elephant, each trying to determine the whole by the description of its parts. I'm putting the pieces together, and, even though it's still a bit dim, you can start to make out details in the silhouette.

So, this book project has changed the way I look at these bits of content I've scattered around the web. None of those pieces have to be transient, and actually collecting them all and putting them side-by-side actually offers a bit of gravitas that separately they might not have.

A Well-Designed Post

Yesterday, I toyed around with the idea of turning my entire blog into a nicely-formatted book, in both PDF and epub format. This would probably interest no one but myself, but I’ve seen other people turn these online words into a feel-it-with-your-hands book, and to me, seeing something physical from the effort seems to add more weight to the work, both figuratively and literally.

I wanted LaTeX output, so I installed pandoc (which might be one of the most magical pieces of software I’ve ever used) and ran it across the Jekyll-created HTML. I then started the yak-shaving to clean up the headers and footers and download the remote linked images. The part I enjoyed most about using pandoc is that it automatically did a lot of the spell-casting that resulted in nicely-sized images, something that always gives me fits when dealing with LaTeX.

I then tried to work all these steps into an automated flow, but realized that beyond putting files in their appropriate places, I actually enjoyed the manual “typesetting” that LaTeX needs, so I tried to stop engineering things for once.

Once I actually saw the hundred-odd-page book, I had a few more ideas of things from the past year that might be neat to see in print, and thus don’t have anything to show yet for the book efforts, but I did want to put up a sample of what one post looks like.

Here’s my Thanksgiving post from last week. There’s something odd about seeing what was originally web content in a more typographic form, and it will be even more odd seeing it in print (if it progresses that far).

I plan on documenting the whole process as I’d like to see others try this out.