In five years, I've seen many data platform projects become catastrophic failures. Below, you will find two mistakes that are easy to identify and cost nothing to fix.
1- Technology patchwork 🧩
Many companies focus too much on what Gartner or Forrester ranks as the best solution for each function of their data platform. These companies end up with a patchwork of specialized tools that don’t fit together and are difficult to operate and keep up to date.
Trifacta + Collibra + AWS EC2 + Docker + AWS S3 + Delta Lake + Luigi + Azure Active Directory + Confluent + Tableau Server and Power BI - because why not -...💣 Do you really think it’s a good idea? Mismanagement, configuration overhead, and heavy governance are inevitable in this situation.
Companies don’t need the best in class for every function of their data platform. They need something that is managed to reduce operating costs, is easy to maintain, and doesn’t break when you update one part of the ecosystem. Tool interoperability is critical for a functioning data platform. Don’t underestimate it!
You should look at your primary provider’s competitors only if that provider falls short on a very specific and critical feature your data platform needs. Otherwise, go full-stack with the same provider and hit the road 🛣
2- Worrying too much about cost predictability 💸
Working in the cloud ☁️ is often a leap of faith for companies. It costs a lot, and their teams haven’t mastered the technology stack involved - yet -.
In this context, the reassuring reflex is to control costs by asking project owners to bring a duly documented expenditure plan for any initiative on the brand new data platform before they are allowed to start.
This is madness! 🤯
When you go to the public cloud, rewire your brain to think about instance management. Make sure you’re using the right resource with the right pricing plan (pay-as-you-go, spot, reserved...). Once that’s done, you can assume that you’ll pay for what you use. Nothing less, nothing more. Let the project owners run a few iterations and ask them to monitor the costs. After that, they will be able to build a detailed expenditure plan with great accuracy.
A week of hard work on a classical computer vision project costs between 400€ and 2000€. The first few iterations required to build an accurate expenditure plan won’t wreck your company’s budget. Let loose!
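To make the "run a few iterations, then plan" idea concrete, here is a minimal sketch of how a project owner might turn a handful of monitored weekly bills into a budget range. Everything in it is an assumption for illustration: the function name `project_budget`, the sample weekly costs, the 12-week horizon, and the crude "mean plus two standard deviations" safety margin. It is not a recommended forecasting method, only a way to show that a few real data points are enough to start planning.

```python
from statistics import mean, stdev


def project_budget(observed_weekly_costs_eur, planned_weeks, margin_sigmas=2.0):
    """Project a budget range from a few monitored weekly costs.

    Hypothetical helper: the safety margin (mean + margin_sigmas * stdev)
    is an illustrative assumption, not a statistical guarantee.
    """
    if len(observed_weekly_costs_eur) < 2:
        raise ValueError("Need at least two observed weeks to estimate spread.")

    average = mean(observed_weekly_costs_eur)
    spread = stdev(observed_weekly_costs_eur)

    # Expected spend if future weeks look like the observed average.
    expected = average * planned_weeks
    # Pessimistic spend if every future week runs above the average.
    upper = (average + margin_sigmas * spread) * planned_weeks

    return {
        "expected_eur": round(expected, 2),
        "upper_bound_eur": round(upper, 2),
    }


# Hypothetical costs monitored during the first three iterations (in euros).
observed = [450.0, 900.0, 750.0]
plan = project_budget(observed, planned_weeks=12)
print(plan)
```

The point is the order of operations: spend first on a few cheap iterations, then plan from real numbers rather than guesses.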
The road to hell is paved with good intentions. Often, the primary reason for your data platform’s failure is the way you think about it and the way you govern it. Detect these destructive behaviors and milk every bit of potential in your data.