VST2 is a simple interface that's easy to understand and somewhat extensible, but it was originally extremely incomplete. The specification struggled from the beginning to keep up with the zillions of implementations. It was often not clear how to do things, so different developers implemented the same things differently. It could definitely have been saved as an API, but that's moot now.
VST3 is a different interface model that is familiar to some programmers (anyone from the Windows COM world) but completely unfamiliar to others. It's a proven model, but if you're not familiar with it, it looks confusing and intimidating. I think that created a barrier to entry for a lot of developers who were comfortable with VST2. VST3 also suffers from overdesign, but imo most of the development friction comes from unfamiliarity with the model.
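For anyone who hasn't seen the COM-style model, here's a rough sketch of the general idea: the host holds a pointer to a base interface and asks the object at run time whether it supports some other interface. This is not the VST3 SDK; every name below (`IUnknownLike`, `IProcessorLike`, `GainPlugin`, `hostProcess`, the string IDs) is made up for illustration, and real COM and VST3 use 128-bit IDs rather than strings.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical interface IDs. Strings are used here only for readability.
using InterfaceId = const char*;
constexpr InterfaceId kUnknownId   = "IUnknownLike";
constexpr InterfaceId kProcessorId = "IProcessorLike";
constexpr InterfaceId kEditorId    = "IEditorLike";  // not implemented below

// Base interface: run-time interface discovery plus reference counting.
struct IUnknownLike {
    virtual bool queryInterface(InterfaceId id, void** out) = 0;
    virtual unsigned addRef() = 0;
    virtual unsigned release() = 0;
    virtual ~IUnknownLike() = default;
};

// One capability an object may or may not expose.
struct IProcessorLike : IUnknownLike {
    virtual float process(float sample) = 0;
};

// A plugin that supports the processor interface and nothing else.
class GainPlugin final : public IProcessorLike {
public:
    bool queryInterface(InterfaceId id, void** out) override {
        if (std::strcmp(id, kProcessorId) == 0) {
            *out = static_cast<IProcessorLike*>(this);
        } else if (std::strcmp(id, kUnknownId) == 0) {
            *out = static_cast<IUnknownLike*>(this);
        } else {
            *out = nullptr;  // unsupported interface
            return false;
        }
        addRef();  // the caller now owns one reference
        return true;
    }
    unsigned addRef() override { return ++refCount_; }
    unsigned release() override {
        const unsigned remaining = --refCount_;
        if (remaining == 0) delete this;
        return remaining;
    }
    float process(float sample) override { return sample * 0.5f; }

private:
    unsigned refCount_ = 1;  // the creator holds the first reference
};

// Host side: use the processor interface if present, otherwise pass through.
float hostProcess(IUnknownLike* object, float sample) {
    assert(object != nullptr);
    void* raw = nullptr;
    if (!object->queryInterface(kProcessorId, &raw)) return sample;
    auto* processor = static_cast<IProcessorLike*>(raw);
    const float result = processor->process(sample);
    processor->release();  // give back the reference queryInterface handed out
    return result;
}
```

None of this is hard once you've seen it, but the query, cast, and release dance is a lot of ceremony if you're coming from a plain struct of callbacks.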
LV2 is a fine design imo but it's extensible to a fault, and there's development friction from the (well-intentioned) separation of metadata and implementation, which means developers need to add a conceptually new step to their existing development process.
CLAP has the advantage of hindsight on the other formats and it's trying to be simple, extensible, and clear. Its comparative advantage is in those design goals, clarity and extensibility in particular. As with LV2, the sacrifice is in completeness, so once the format is stable it will be a challenge to keep new extensions well-defined, precisely because it's so easy for developers to extend. But the design goals are laudable, and Alexandre is doing a good job of collaborating while keeping the project goals well-contained.
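To make the "easy to extend" point concrete, here's a minimal sketch of extension lookup by string ID. It's loosely modeled on the way a CLAP plugin exposes a `get_extension` callback that takes an ID string, but it is not the real CLAP header: the struct layouts, the IDs, and names like `GainExtension` and `hostApplyGain` are all hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical extension IDs. Anyone can mint a new string, which is exactly
// why extending is easy and keeping extensions well-defined is hard.
constexpr const char* kExtGain   = "example.gain/1";
constexpr const char* kExtTuning = "example.tuning/1";  // not implemented below

// An extension is just a struct of function pointers.
struct GainExtension {
    float (*apply)(float sample);
};

// The core plugin struct only needs one hook for discovering extensions.
struct Plugin {
    const void* (*get_extension)(const char* id);
};

static float halve(float sample) { return sample * 0.5f; }
static const GainExtension kGain = { halve };

// Plugin side: return the matching struct, or nullptr for anything unknown.
static const void* exampleGetExtension(const char* id) {
    if (std::strcmp(id, kExtGain) == 0) return &kGain;
    return nullptr;
}

static const Plugin kExamplePlugin = { exampleGetExtension };

// Host side: use the extension if the plugin has it, otherwise pass through.
float hostApplyGain(const Plugin& plugin, float sample) {
    assert(plugin.get_extension != nullptr);
    const auto* gain =
        static_cast<const GainExtension*>(plugin.get_extension(kExtGain));
    return gain ? gain->apply(sample) : sample;
}
```

Notice that nothing in the core struct changes when a new extension shows up. That's the appeal, and it's also why the semantics of each extension have to be pinned down somewhere outside the code.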
What I don't think is worth worrying about is externalities like adoption rate or whether some formats "beat" other formats. A good API will be used if it doesn't create too much development friction and if enough developers know about it. Urs is definitely doing his best to let people know about it.
