Drive Magic – Working Around Google Drive APIs

Published on

January 30, 2018

Author

Balázs Németh

Software Engineer

This is the first post of my series about working around Google Drive APIs.

You can’t really exist on the web without ever encountering Google Drive. You are most likely an avid user if you are reading this.

Obviously, from the users’ point of view it is a great product. With its initial release over 10 years ago (when the now widely used Drive moniker was nowhere to be found, as it was Docs, Sheets, and Slides) it made online collaboration widespread. Due to the tight integration between the various G Suite products, the need for enterprise solutions that use it, or actually build upon it, has increased a lot. I was lucky enough (well, that is questionable, and you will see later why 😉) to work with it for a rather long time.

Well, “a long time” seems a bit exaggerated given that I only passed the 5-year mark a few months ago, but given the age of the whole product, I still feel like a Methuselah. Looking back, we created solutions I never imagined we would, but it came with our fair share of mindf*ck along the way.

I think Google was surprised as well by how popular it became in just a few years. Sure, that certainly wasn’t the view it projected to users or what you could see in the marketing bs (sorry, material) for Google Apps (the maiden name of G Suite), but if you have ever tried to integrate with it you could certainly see the drawbacks and the indications that this wasn’t planned as an enterprise product at a scale like this. As soon as I wrote this sentence I could already hear many of you cry out about how unjust I am with Drive, given that none of the competitors is better. Well, I don’t think I am. I just mean enterprise-level integration with a wide feature set and huge throughput, and preferably projects that started years ago, as the issues I’m about to describe resemble the rain forests: they tend to shrink over time. At least in this case that’s good news :).

Although I have worked with some older services (like Documents List API, Spreadsheets API) and some newer ones (like Drive API v3, Sheets API v4), most of my experience comes from Drive API v2… and well, let’s just forget the GData era. It’s better for everybody. I thought about ordering and prioritizing the issues, but I couldn’t come up with a decent enough rule for it. Basically, every issue could be as crucial to one app as it is irrelevant to another, so I’ll go in whatever order feels right. Also, I’m sure any ranking would have been heavily biased anyway, as I remember every single hour these issues made me rethink my career choices 😀 So excuse me in advance if I ramble, or even rant, but that is how I “cope”… it’s cheaper than drinking ;). And please keep in mind that many of the experiences I describe here happened before the behavior was documented and/or fixed properly.

Docs labels rebranded as Drive folders

Probably one of the most often needed features that is missing is searching in a folder hierarchy through the API. You can search globally, or in a specific folder explicitly, but not inside its subfolders. Although it is present in the Drive UI (it was released recently), there is no API method for it.

This is also most likely caused by the original underlying architecture. Back in the day, documents had labels instead of folders and weren’t actually represented as being in folders as they are today. It was similar to how Gmail still represents labels. When it was rebranded some functionality was added, but the core concept is pretty much the same.
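A common way around the missing subfolder search is to walk the hierarchy yourself and filter as you go. The following is only a minimal sketch, not an official recipe: `list_children` is a hypothetical callable that you would implement on top of a `files.list` request with a query such as `'<folderId>' in parents`, and the item keys (`id`, `title`, `mimeType`) assume the v2 field names. The in-memory `FAKE_TREE` stands in for Drive so the example runs without credentials.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

FOLDER_MIME = "application/vnd.google-apps.folder"

# Hypothetical signature: given a folder id, return its direct children as
# dicts with at least "id", "title" and "mimeType" keys.
ListChildren = Callable[[str], Iterable[Dict[str, str]]]


def search_in_hierarchy(root_id: str, needle: str,
                        list_children: ListChildren) -> List[Dict[str, str]]:
    """Breadth-first walk of the folder tree, collecting title matches."""
    matches: List[Dict[str, str]] = []
    seen = {root_id}  # items can have several parents, so guard against revisits
    queue = deque([root_id])
    while queue:
        folder_id = queue.popleft()
        for item in list_children(folder_id):
            if item["id"] in seen:
                continue
            seen.add(item["id"])
            if needle.lower() in item["title"].lower():
                matches.append(item)
            if item["mimeType"] == FOLDER_MIME:
                queue.append(item["id"])  # descend into subfolders later
    return matches


# Stand-in for Drive: folder id -> direct children (made-up data).
FAKE_TREE = {
    "root": [
        {"id": "f1", "title": "Reports", "mimeType": FOLDER_MIME},
        {"id": "d1", "title": "Budget 2018",
         "mimeType": "application/vnd.google-apps.spreadsheet"},
    ],
    "f1": [
        {"id": "d2", "title": "Budget draft",
         "mimeType": "application/vnd.google-apps.document"},
        # The same file linked into a second folder, label-style.
        {"id": "d1", "title": "Budget 2018",
         "mimeType": "application/vnd.google-apps.spreadsheet"},
    ],
}

found = search_in_hierarchy("root", "budget", lambda fid: FAKE_TREE.get(fid, []))
print([item["id"] for item in found])  # prints ['d1', 'd2']
```

Note the cost: one list request per folder, so a deep hierarchy translates into many API calls. The `seen` set matters precisely because of the label heritage described above, since one file can sit under several parents.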

Different API for UI

If you check your network activity while using the Drive UI you can spot many /drive/v2internal/ calls. That isn’t the published v2, as it contains more functionality. Unfortunately, many of these features won’t get promoted to the public API, or at least only very slowly.

Examples:

Previously mentioned “search in subfolder” feature.

Copy comments and suggestions in Google Docs, Sheets, and Slides.

Name versions of a Doc, Sheet or Slide.

Domain-wide sharing issues

If a file is shared with the domain but the user has never opened it (which is expected for new accounts and/or new files, for example), then it won’t show up in the default result list when you search for it on Drive. You have to change the “Location” from “Anywhere” to “Visible to anyone in yourdomain.com”. I can’t imagine a scenario where excluding it from the default results makes sense business-wise, apart from the possibility that it’s a direct effect of an architecture that wasn’t originally built for this.

Noop actions triggering change notifications

It’s a rather frequent use case to subscribe to (aka “watch”) change notifications. We certainly do it a lot. It’s also common to further modify the file you received notifications about.

Continuing with our example, imagine that you don’t always check the current state on Drive, or whether the patch you just built is empty; you just execute the request. It happens. Developers are lazy. What you did was essentially a noop change, yet we encountered multiple cases where requests like this triggered a new change notification on the very same file. Which we processed. Again. Executed a noop request. Again. I think you can see where this is going 🙂

This issue had an even trickier occurrence. We wanted to actually change something, but we couldn’t. For the sake of the example, imagine we try to modify the title and the parents in the same request, but we no longer have edit access on the folder we are trying to move the file to. The request fails, which is expected. What wasn’t expected is that the failed request still triggered a change notification, which ensured we tried to do it again. Avoiding the loop required storing the fact that we had already failed in an unsolvable way.
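Both loops can be broken on the client side: skip requests whose patch is empty, and remember patches that have already failed for good. Here is a rough sketch of that idea, under stated assumptions. `send_patch` is a hypothetical wrapper around whatever patch/update call you use, `PermissionError` is a stand-in for the error your client raises on a permission failure, and the in-memory set would need to be persistent storage in a real system.

```python
import json
from typing import Any, Callable, Dict, Set, Tuple


def build_patch(current: Dict[str, Any], desired: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields whose desired value differs from the current state."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


class ChangeHandler:
    def __init__(self, send_patch: Callable[[str, Dict[str, Any]], None]) -> None:
        self._send_patch = send_patch  # hypothetical wrapper around the real request
        # (file id, patch fingerprint) pairs that already failed in an unsolvable way
        self._failed: Set[Tuple[str, str]] = set()

    def handle(self, file_id: str, current: Dict[str, Any],
               desired: Dict[str, Any]) -> str:
        patch = build_patch(current, desired)
        if not patch:
            return "skipped-noop"  # never send an empty patch
        fingerprint = json.dumps(patch, sort_keys=True)
        if (file_id, fingerprint) in self._failed:
            return "skipped-known-failure"  # do not retry a hopeless request
        try:
            self._send_patch(file_id, patch)
        except PermissionError:  # stand-in for the client's permission error
            self._failed.add((file_id, fingerprint))
            return "failed"
        return "patched"


sent = []


def fake_send(file_id: str, patch: Dict[str, Any]) -> None:
    """Simulates a backend where moving files is not allowed."""
    if "parents" in patch:
        raise PermissionError("no edit access on the target folder")
    sent.append((file_id, patch))


handler = ChangeHandler(fake_send)
print(handler.handle("a", {"title": "x"}, {"title": "x"}))  # skipped-noop
print(handler.handle("a", {"title": "x"}, {"title": "y"}))  # patched
```

Fingerprinting the patch rather than just the file id means a later, different change to the same file still goes through.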

GetIdForEmail, and lack of GetEmailForId

There is no such method on the Drive API as getEmailForId, and even getIdForEmail hasn’t been added to v3 and is only present in v2. There must be some kind of reasoning behind this decision, but I can’t see it. It must be that they consider the id->email mapping need-to-know, but how could that knowledge be exploited? It could be used to list emails for spamming, but given the length of the ids, there are easy ways to avoid that without completely getting rid of a useful method. Meanwhile, its absence can make debugging much more complicated, for example when an account gets modified to have a new email address.
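One possible workaround for the missing reverse lookup is to record the mapping yourself whenever you resolve an email. The class below is only an illustrative sketch: `PermissionIdDirectory` and its methods are made-up names, and `get_id_for_email` is a hypothetical callable that you would back with the v2 getIdForEmail request. It can only answer for emails it has already seen.

```python
from typing import Callable, Dict, Optional


class PermissionIdDirectory:
    """Local id -> email index, filled as a side effect of email -> id lookups."""

    def __init__(self, get_id_for_email: Callable[[str], str]) -> None:
        self._get_id_for_email = get_id_for_email  # hypothetical API wrapper
        self._email_by_id: Dict[str, str] = {}

    def register(self, email: str) -> str:
        """Resolve the id for an email and remember the reverse mapping."""
        permission_id = self._get_id_for_email(email)
        self._email_by_id[permission_id] = email
        return permission_id

    def email_for_id(self, permission_id: str) -> Optional[str]:
        """Return the last email registered for this id, or None if unknown."""
        return self._email_by_id.get(permission_id)


# Made-up data standing in for the remote lookup.
FAKE_IDS = {"alice@example.com": "111", "bob@example.com": "222"}

directory = PermissionIdDirectory(lambda email: FAKE_IDS[email])
directory.register("alice@example.com")
print(directory.email_for_id("111"))  # alice@example.com
print(directory.email_for_id("222"))  # None, bob was never registered
```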

That’s it for today, but don’t you worry. I’ll be back… with rate limiting, batching and so on.
