It happened again.
I grossly underestimated the effort of integrating a third-party SDK into our Kotlin/native-based iOS app. And by grossly I mean not just 2x or 5x, but an order of magnitude. I’m not proud of it, but it is what it is, and it’s certainly worth a blog post, as I learned one or two (actually 4; feel free to skip to the last section if you are in a hurry) things from it. And I guess I just need to write this down to finally start seeing it not just as a gigantic waste of time, but also as a lesson in patience and humility.
First of all, technically I lied in the intro: I underestimated the effort by *at least* an order of magnitude, as I am still not done yet. After two weeks. Not working straight on it, but on and off, which is kind of worse as it really created a lot of context switching overhead. That being said, let’s start from the beginning.
It was just another task on just another day…
OneSignal is a push and in-app messaging SDK for mobile apps. It helps facilitate communication with your user base and as such is quite important for many mobile apps.
Integrating OneSignal into our (not Kotlin/native-based, just plain Kotlin) Android app was easy peasy. Just as expected, I was done in a day, including extensive testing.
So I expected around two days for the iOS integration: firstly, I have nowhere near as much iOS experience as Android experience, and secondly, our iOS app is based on Kotlin/native, which could potentially complicate things. However, I didn’t expect much trouble from the latter, as the OneSignal SDK was supposed to touch only the Swift side of things, so what could go wrong?
Well, turns out, hell of a lot!
The OneSignal SDK does something clever when you integrate it into your app (no matter whether you use Swift Package Manager or CocoaPods). Without you writing a single line of code, its mere presence as a dependency triggers a process called swizzling at app start. I am not super proficient in iOS development (I am more on a _can-make-things-work_ level), but here is what I understood swizzling does at a very high level: at app start (so at runtime, not at compile time), swizzling exchanges or adds method implementations. This is usually done so that developers don’t need to wire up certain calls to the SDK on their own and, in the process, make mistakes by calling the wrong functions at the wrong time.
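To make the idea more concrete, here is a toy model of swizzling in plain Kotlin. It is not the Objective-C runtime (on iOS this is done with runtime functions such as `method_exchangeImplementations`) and certainly not how OneSignal does it; a "class" is reduced to a table from selector names to implementations, and the selector names are made up. The point it shows: dispatch looks implementations up at runtime, so swapping two entries redirects existing calls without touching a single call site.

```kotlin
// Toy model of method swizzling in plain Kotlin (NOT the Objective-C runtime).
// A "class" is reduced to a mutable table from selector names to implementations.

typealias Imp = (String) -> String

class RuntimeClass(private val name: String) {
    private val methods = mutableMapOf<String, Imp>()

    fun addMethod(selector: String, imp: Imp) {
        methods[selector] = imp
    }

    // Dispatch looks the implementation up at call time, so swapping
    // table entries changes behavior without touching any call site.
    fun send(selector: String, arg: String): String {
        val imp = methods[selector]
            ?: throw IllegalArgumentException("unrecognized selector $selector sent to $name")
        return imp(arg)
    }

    // Swap two implementations at runtime.
    fun exchangeImplementations(a: String, b: String) {
        val impA = methods.getValue(a)
        methods[a] = methods.getValue(b)
        methods[b] = impA
    }
}

// Returns what the app's handler produces before and after "swizzling".
fun runSwizzleDemo(): Pair<String, String> {
    val delegate = RuntimeClass("AppDelegate")

    // The app's own handler (hypothetical selector name).
    delegate.addMethod("didRegister") { token -> "app handled $token" }

    // Hypothetical SDK hook. After the exchange, "sdk_didRegister" points at
    // the app's original handler, so this call forwards to it.
    delegate.addMethod("sdk_didRegister") { token ->
        "sdk saw $token; " + delegate.send("sdk_didRegister", token)
    }

    val before = delegate.send("didRegister", "abc")
    // In the real world, this step is done by the SDK at app start.
    delegate.exchangeImplementations("didRegister", "sdk_didRegister")
    val after = delegate.send("didRegister", "abc")
    return before to after
}

fun main() {
    val (before, after) = runSwizzleDemo()
    println(before)
    println(after)
}
```

Running `main` prints `app handled abc` followed by `sdk saw abc; app handled abc`: the SDK's hook now runs first and then forwards to the app's original handler, although the app never changed its own call.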
In OneSignal’s case, they swizzle push notification handling, which should make my life as an engineer easier. Should. The problem is, the way they do the swizzling changed the initialization/loading order of certain classes, something the Kotlin/native code at the other end of the food chain didn’t expect!
What the swizzle?
My first thought was: why does the Kotlin/native code even bother? It’s all Swift stuff being touched by OneSignal, or is it? Turns out, it’s not. Swizzling causes virtually all classes in your binary to be touched; that’s just how it works, as far as I understood the more advanced documentation and posts you can find online. It seems to follow the visitor pattern: visit all classes and functions, then ask whether the implementation should be changed. If all classes are touched, so are the Kotlin/native classes.
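Here is a toy model of how I understand the crash, going by the title of the bug report (objc_getClassList causing initialize to fire before load). It is a sketch in plain Kotlin, not the real runtime or Kotlin/native internals: every class name, selector and the "load must happen first" invariant are hypothetical. What it illustrates is that merely asking every class a question counts as a first message, which triggers lazy initialization (the role `+initialize` plays in Objective-C), so a class that relies on having been loaded first fails if the SDK visits it too early.

```kotlin
// Toy model (NOT the real Objective-C runtime or Kotlin/native internals).

class ModelClass(
    val name: String,
    private val selectors: Set<String> = emptySet(),
    private val expectsLoadFirst: Boolean = false,
) {
    private var loaded = false
    private var initialized = false

    // Stands in for +load, run when the class's binary image is loaded.
    fun load() {
        loaded = true
    }

    // Stands in for +initialize: runs lazily before the first message.
    private fun initializeIfNeeded() {
        if (initialized) return
        initialized = true
        // Hypothetical invariant of a class that relies on load() having run first.
        check(!expectsLoadFirst || loaded) { "$name initialized before load" }
    }

    // Any message, even a harmless query, triggers initialization first.
    fun respondsTo(selector: String): Boolean {
        initializeIfNeeded()
        return selector in selectors
    }
}

// The "visitor": visit every registered class and ask whether it
// implements the selector that should be patched.
fun findClassesToSwizzle(allClasses: List<ModelClass>, selector: String): List<String> =
    allClasses.filter { it.respondsTo(selector) }.map { it.name }

// Simulates app start: the SDK enumerates all classes either before or
// after the (hypothetical) Kotlin/native class has been loaded.
fun simulateLaunch(sdkRunsFirst: Boolean): String {
    val all = listOf(
        ModelClass("AppDelegate", selectors = setOf("didRegister")),
        ModelClass("KotlinBase", expectsLoadFirst = true),
    )
    return try {
        if (sdkRunsFirst) {
            findClassesToSwizzle(all, "didRegister")
            all.forEach { it.load() }
        } else {
            all.forEach { it.load() }
            findClassesToSwizzle(all, "didRegister")
        }
        "launched"
    } catch (e: IllegalStateException) {
        "crash: ${e.message}" // stands in for the SIGABRT
    }
}

fun main() {
    println(simulateLaunch(sdkRunsFirst = false))
    println(simulateLaunch(sdkRunsFirst = true))
}
```

`simulateLaunch(sdkRunsFirst = false)` returns `launched`, while `simulateLaunch(sdkRunsFirst = true)` returns `crash: KotlinBase initialized before load`. Nothing in the Kotlin class itself changed; only the order in which it was touched did.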
And that’s the culprit. Right at app start, the app crashed with a SIGABRT (feel free to ignore the trace, it’s only here for show):
Yes, that’s what I, with my limited iOS knowledge, got to see after launching my app. Quite a mouthful, huh? There are better ways to start your day, believe me.
It took me some time to figure out what was happening. Actually, even that is an exaggeration: I didn’t really know what was going on, just that swizzling somehow seemed to interfere with some Kotlin/native magic.
Road to redemption?
So I did what every responsible engineer does: fire up Google and search for what I thought was causing the problem. In the absence of meaningful search results (Kotlin/native still isn’t as widely used as it could be), I then reached out to the OneSignal and Kotlin/native developers.
If anyone is interested, here are the bug reports:
Crash at app start with Kotlin/native; objc_getClassList causing initialize to fire before load on…
Description: Our app is based on Kotlin/native and after integration of the SDK as per the documentation we are seeing…
Luckily, the developers of both OneSignal and Kotlin turned out to be very responsive, kudos! After I had managed to create a sample project that let him replicate the issue, the Kotlin engineer I talked to even came up with a quick-to-implement workaround: I had used CocoaPods to integrate the OneSignal SDK; he asked me to use SwiftPM instead and, at the same time, make the Kotlin/native shared code module a dynamic framework.
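For reference, this is roughly what the second half of the workaround looks like in a Kotlin Multiplatform Gradle build script: the `isStatic` flag on the framework binary selects between a static and a dynamic framework. This is a sketch, assuming the shared module is exported via `binaries.framework` (rather than the Kotlin CocoaPods plugin) and builds for the `iosX64` and `iosArm64` targets; adjust it to your own setup.

```kotlin
// shared/build.gradle.kts (sketch; targets and setup are assumptions)
plugins {
    // Plugin version assumed to be declared in the root project or settings.
    kotlin("multiplatform")
}

kotlin {
    listOf(iosX64(), iosArm64()).forEach { target ->
        target.binaries.framework {
            baseName = "shared"
            // false = dynamic framework (the workaround), true = static framework
            isStatic = false
        }
    }
}
```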
And it worked! As you might expect, I was overjoyed! And it got even better: the Kotlin/native engineer had already implemented a fix for the issue, which is scheduled to be released in about 3 months as part of the next Kotlin beta.
Actually, I would have preferred to stay with CocoaPods and the static framework and instead disable swizzling for the OneSignal SDK, but it’s not clear yet whether the latter is possible. The cost of taking much of the integration work away from user-engineers is usually paid by those same engineers when their setup is not what the SDK expects. ¯\_(ツ)_/¯
At that point, I had spent about 2 weeks on the problem. Phew.
Time to merge. I created the PR and our CI started to do its thing. Part of the _thing_ is creating and exporting an archive for internal testing purposes. And guess what? Yup, exporting failed.
shared not found in dylib search path
By changing the shared Kotlin/native module from a static to a dynamic framework, I screwed up linking big time! At the time of writing, this issue has not yet been solved despite two days of trial and error (once more violating learning #3, see below). I’ll update the post when we have a solution for that problem.
There finally is a solution, and it consists of fixes introduced to the latest versions of Kotlin and OneSignal. Big thanks to the amazing teams over there for going the extra mile here and fixing this awkward issue! 👏
- Never underestimate the integration costs of a third-party SDK, especially if it promises to be integrated “in less than 10 lines of code” (quoting the OneSignal landing page).
- Never underestimate the added costs of employing a growing but not yet established technology (such as Kotlin/native in my case). To be fair, though, this was the first time I stumbled upon such an issue with Kotlin/native; I never had any similar problems over 2 years of development, so it seems to be a special case.
- Never overestimate your own skills; instead, ask for help when you need it. It took me three days until I reached out to the OneSignal and Kotlin engineers. Before that, I spent way too much time trying to debug things I clearly had no real idea about. That doesn’t mean one should bombard dev support or StackOverflow without first spending some time thinking about a solution on your own. But you should foster the mental clarity to see when you need help. And when you reach out to your fellow developers, ensure that your bug report adheres to their project’s guidelines and provide all the information needed in a concise and complete manner. It’s just a matter of etiquette and respect for their time.
- Never underestimate the time it takes to create a sample project. Here, it took me over two days to finally boil everything down to the essentials and make sure the issue was still reproducible.
Now, none of these learnings should be new to someone who has spent more than two decades coding (oy, I’m getting old…). But it helps to reiterate them from time to time to overcome the hubris that sometimes comes with growing experience.
I will probably end up being the person who reads this post the most. It shall serve as a reminder for my future self, but it might also help you, dearest reader, to avoid the estimation traps I fell into.