We need new data books - so we published one
By Dave Fowler on 10/26/21
Two years ago we wrote "We need new data books - so we started one", kicking off a book to aggregate the modern, agile data best practices we've learned through working with thousands of companies over the past decade. The response was phenomenal, with a great discussion on Hacker News and now over 15,000 downloads of that first e-book edition.
We've been working hard since that starter version, collecting a lot of feedback and input from industry experts and working data professionals. We're excited to announce that we've greatly improved and almost tripled the content, and have published the book with Wiley. It's available today!
Why we wrote it
In my work at Chartio (now part of Atlassian), I get to meet many people who work with data every day. One of my favorite questions to ask them is, “Where did you learn to work with data?”
Surprisingly, most people tell me they’re completely self-taught and have “just figured it out.” As a follow-up, I ask what sources they’ve relied on, and the answers are all over the map. Mostly they’ll cite Google, StackOverflow, blogs, and sometimes they'll mention these books:
These books were very good for their time, and became classics. But in the fast-moving world of data, they’re ancient. Both were written before Redshift and the gains of the cloud C-Store warehouse. Back then, data was at a totally different scale, had very different costs, was used with totally different products, and was handled by people with very different training—primarily just at enterprise companies.
It has gotten to the point where pointing people to these books can do more harm than good. While there is an increasing number of blogs, newsletters and now even conferences on modern data, none of them quite pulls everything together the way a book does, and until there are new books the old ones will continue to be used.
Book overview
We wrote this book for anyone who values data and believes that a well Informed Company is more competitive. It’s a book for the working professional creating a practical, modern data stack that can make their knowledge workers knowledgeable, so they can win. A marketing team that knows which campaigns are working and which aren't, a product team that knows which features go unused, a support team that can find the information they need to quickly help a customer - these teams thrive and win over their less informed competitors.
Through working with thousands of customers at Chartio we've found that companies go through four main stages of Agile Data, and they are very much tied to what the data stack looks like at each stage.
We call the stages Source, Lake, Warehouse and Mart.
The book is organized around these 4 stages, and you can read more about them in our sample intro chapter: The 4 Stages of Agile Data Organization
What's new?
For those of you who've read the starter ebook version, you'll notice that this new edition is still organized around the same four stages, but is an almost complete rewrite with almost three times the content. There are also a few new sections, most notably:
- Data Modeling Practices - There's a full chapter with a number of examples, style guides, steps and best practices for data modeling. Someone should probably write a new full book solely on data modeling, but until then I believe this is the most in-depth teaching on data modeling published in the past decade.
- Foreword by Tristan Handy - Tristan is the CEO of dbt Labs and longtime publisher of The Analytics Engineering Roundup, and there's no one more connected or well-written on modern data. Years ago, I asked Tristan to write the foreword for the book. He happily agreed, and even though dbt Labs is on a crazy trajectory he still found the time to write an inspiring foreword.
- What's changed - There is also a final section in the book, What's changed, where we've tried to gather all the explanations of what's changed in data over the past decade, such as ETL -> ELT and C-store warehouses. We felt it was important to explain changes in the best practices, but with our main intended audience being people who are newer to data, we didn't want to clutter the book with these explanations spread out along the way.
You can see more in the Table of contents.
Thank yous
There are a lot of people to thank in the writing of this book. First of all, we want to thank all of the great people we've met over the years who've shared their data stories, whether as customers, prospects, partners or thought leaders. Each of you has contributed your experience to this work and we are incredibly grateful.
Thank you to Mila Page (Developer Relations @ dbt Labs), Emilie Schario (Data-Strategist-in-Residence at Amplify Partners) and David Yerrington (Data Science Consultant and Educator) for their great edits and writing additions to the book.
And of course we couldn't have done this without our great team at Chartio (now Atlassian), who had such a big part in implementing, testing and teaching all of these practices over the years for ourselves and for our customers. Special shout-outs to Tracy Chow (Sr Data Support Engineer) and Brian Hartsock (VP of Engineering) for all the incredible data modeling work and iterations.
We hope this book helps you and your company be more informed, and ultimately more successful. We'd love to hear your feedback - do drop us a note anytime.