Tweet IDs About To Get Jumbled In A Blizzard As Snowflake Is Set To Roll Live
Since the beginning of Twitter time, the company has used sequential IDs for tweets. This has helped some third-party services come up with rough estimates of the total number of tweets on Twitter (though these were always a bit inaccurate, since the counter has made huge jumps a few times). But starting today, that will change as Twitter’s new status ID system, Snowflake, is in the process of rolling live.
Twitter Developer Advocate Matt Harris reminded developers of the change late last night in a Google Group posting. The roll-out was scheduled for 10 AM PT today, but an update from the TwitterAPI account notes that “Snowflake is on ice for the moment” and promises an update soon.
So what exactly does Snowflake mean for people? Twitter published a post in June on its engineering blog explaining that as the company moved away from MySQL to Cassandra, it needed a new ID system, since Cassandra has no built-in unique ID generator. So Twitter dreamed up Snowflake, an ID system based on timestamps rather than sequential numbering. As Harris explained last month:
Snowflake still uses 64-bit unsigned integers but instead of being sequential they will instead be based on time and composed of: a timestamp, a worker number and a sequence number. For the majority of you this change will go unnoticed and your applications will continue to function without the need for any changes.
But Harris also noted that Snowflake meant tweet IDs would no longer be useful for data analysis. Tweet IDs will still be roughly sortable (as long as two tweets are more than about one second apart), but you won’t be able to tell how many tweets were posted between them.
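To make the composition Harris describes more concrete, here is a minimal Python sketch of packing a timestamp, a worker number and a sequence number into one 64-bit integer. The announcement does not give field widths, so the 41/10/12-bit split, the millisecond timestamps and the function names below are assumptions for illustration only, not Twitter's confirmed implementation.

```python
# Assumed field widths (not stated in the announcement quoted above).
TIMESTAMP_BITS = 41   # milliseconds since some chosen epoch
WORKER_BITS = 10      # which machine generated the ID
SEQUENCE_BITS = 12    # counter for IDs made in the same millisecond


def compose_id(timestamp_ms, worker, sequence):
    """Pack the three fields into one integer, timestamp in the high bits."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker < (1 << WORKER_BITS):
        raise ValueError("worker out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(status_id):
    """Split an ID back into (timestamp_ms, worker, sequence)."""
    sequence = status_id & ((1 << SEQUENCE_BITS) - 1)
    worker = (status_id >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    timestamp_ms = status_id >> (WORKER_BITS + SEQUENCE_BITS)
    return timestamp_ms, worker, sequence


# Two IDs minted one millisecond apart differ by a fixed amount,
# no matter how many tweets were actually posted in between.
gap = compose_id(1001, 0, 0) - compose_id(1000, 0, 0)
```

Under these assumptions, `gap` depends only on the elapsed time, which is why subtracting one ID from another no longer estimates tweet volume.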
Harris’ key points for the Snowflake change:
- Status IDs will be unique
- Status IDs will continue to increase – Tweets created later in the day will have a higher ID than those created in the morning
- Order will be maintained for Tweets allowing you to sort by Status ID. The accuracy of the sort will be to approximately 1 second, meaning Tweets created within a second of each other have no order.
- All existing API methods will continue to work the same as before
- Previous status IDs will be unchanged
- There will be a noticeable jump in the numerical value of status IDs when we change.
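The "approximately 1 second" sort accuracy in the list above can be illustrated with a small simulation. One plausible cause, assumed here purely for illustration, is that the machines generating IDs do not have perfectly synchronized clocks. The bit layout, the clock-skew figures and the `make_id` helper are hypothetical.

```python
# Assumed layout (not from the announcement): the low 22 bits hold the worker
# and sequence fields; the bits above them hold a millisecond timestamp.
LOW_BITS = 22
SEQUENCE_BITS = 12


def make_id(stamped_ms, worker, sequence=0):
    """Build an ID from the time as seen by the worker's own clock."""
    return (stamped_ms << LOW_BITS) | (worker << SEQUENCE_BITS) | sequence


# (label, true creation time in ms, worker, that worker's clock skew in ms)
tweets = [
    ("first", 1_000, 1, 300),   # worker 1 runs 300 ms fast (made-up value)
    ("second", 1_200, 2, 0),
    ("third", 5_000, 1, 300),
]

# Sort by status ID, as a client that only sees the IDs would.
by_id = sorted(tweets, key=lambda t: make_id(t[1] + t[3], t[2]))
order = [label for label, _, _, _ in by_id]
```

In this sketch the two tweets created 200 ms apart come out swapped, while the tweet created several seconds later still sorts last, which matches the behavior the list describes.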
As a sidenote, you may have noticed that with the rollout of New Twitter, the service began inserting “#!” into every tweet URL. This is simply to ensure that the pages will get indexed by Google with the new (AJAX) system in place; otherwise everything after the “#”, including the tweet ID, would be ignored by the crawler.