I was looking for a model that could detect a baby’s needs, such as a diaper change, feeding, or sleep, but I couldn’t find one. So I decided to build my own.

My design is simple. I have a Nest Cam that can capture images of the baby, so I just need to integrate with the Nest Cam API to get them. Then I label the images with the correct needs myself and use them to train a model. Finally, I deploy the model so that I can use it to detect baby needs.

The first part looked easy, but once I started digging into it, it wasn’t. The new Nest Cam no longer has a snapshot API; the only way to get images is through WebRTC. I almost got it working with Go and Pion WebRTC: the WebRTC connection could be established, but I got stuck at taking a snapshot from the video stream.

Then I stepped back and redesigned my solution. Inspired by the Device Access sample, I decided to use Firebase to build a web app for capturing images, labeling them, and running predictions.

The web app is generated by create-react-app with the redux-typescript template. Most of the Firebase integrations are done with reactfire. The flow is to link the Nest Cam to the web app, then stream video via WebRTC. Snapshots from the video stream are saved to Firebase Storage, with their metadata stored in Firestore. Then I can use the web app to label the images and train the model.

But before that, I needed a way to capture snapshots continuously. I could open the browser on my machine, but I didn’t want to keep it open all the time. The first idea that came to mind was to use a Raspberry Pi, but my model is too old to open my web app. So I had to find another way.

The obvious solution is a headless browser, which can run inside a container that I deploy to a server. Puppeteer is a good choice. The official Docker image even comes with Google Chrome built in, which is needed for the Nest Cam video stream because it uses the H.264 codec, which is not available in Chromium. So I built a feeder that logs in and then enables taking a snapshot every 30 seconds. The free compute instance on GCP is enough for my use case. I’m using CoreOS as the OS, which is well suited to running containers.

Now that I had a steady stream of snapshots, I could start labeling the images and training the model. After spending some time labeling, I had accumulated 2000+ labeled images. All the data can be queried from BigQuery, because I added the Stream Collections to BigQuery extension. I used Vertex AI at first. Training and deploying the model is easy; the only problem is the cost, since it doesn’t have a free tier like AutoML Vision. So I switched to AutoML Vision, even though there are many prompts to upgrade to Vertex AI. The training experience is similar but a little less convenient, as I have to use gsutil to copy the images to a us-central1 bucket.
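Once the images are in the bucket, AutoML Vision imports them from a CSV file that pairs each Cloud Storage path with its label. Here is a minimal sketch of generating that file. The LabeledSnapshot shape, its field names, and the bucket name are assumptions for illustration, not what my app actually stores.

```typescript
// Hypothetical shape of one labeled snapshot; the field names are assumptions.
interface LabeledSnapshot {
  storagePath: string; // object path inside the bucket, e.g. "snapshots/1.jpg"
  label: string; // e.g. "awake", "sleep", "nobody"
}

// Build the import CSV: one "gs://bucket/path,label" row per labeled image.
function toImportCsv(bucket: string, snapshots: LabeledSnapshot[]): string {
  return snapshots
    .filter((s) => s.label.trim() !== "") // skip images that are not labeled yet
    .map((s) => `gs://${bucket}/${s.storagePath},${s.label}`)
    .join("\n");
}

// "my-us-central1-bucket" is a placeholder bucket name.
const importCsv = toImportCsv("my-us-central1-bucket", [
  { storagePath: "snapshots/1.jpg", label: "awake" },
  { storagePath: "snapshots/2.jpg", label: "sleep" },
]);
console.log(importCsv);
```

The rows could just as well come from a BigQuery export; the point is only that each line needs the full gs:// path in the us-central1 bucket plus the label.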

Once the model is trained and deployed as an API, I can use it to detect baby needs. I’m using Cloud Functions V2 (Cloud Run) to send a prediction request whenever a new image is saved to Firebase Storage, then save the result to the related Firestore document.

The predictions went well for a while, then I found that the free online prediction quota was running out. So I switched to edge training and exported the model as TensorFlow.js. @tensorflow/tfjs-automl has good support for this. The only catch is remembering to call dispose() to release the memory held by tensors.
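One way to make the dispose() call hard to forget is to wrap it in try/finally. The sketch below is not the code from my function: HasDispose is a structural stand-in for a TensorFlow.js tensor (anything with a dispose() method), and withDisposal is a hypothetical helper name, so the example runs without TensorFlow.js installed.

```typescript
// Structural stand-in for a TensorFlow.js tensor: anything with dispose().
interface HasDispose {
  dispose(): void;
}

// Create a resource, run `use` on it, and always dispose it afterwards,
// even when `use` throws or its promise rejects.
async function withDisposal<T extends HasDispose, R>(
  create: () => T,
  use: (resource: T) => Promise<R>,
): Promise<R> {
  const resource = create();
  try {
    return await use(resource);
  } finally {
    resource.dispose();
  }
}
```

In the real function, create would decode the image into a tensor and use would pass it to the model’s classify call, so the tensor is released after every prediction whether it succeeds or fails.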

With all of the above in place, a new snapshot of my baby is captured every 30 seconds and classified with one of the labels. Currently I’m using just 3 (awake, sleep, nobody); I’ll try to expand the list later.
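Turning the model output into a single label is a small step. Here is a minimal sketch, assuming the classifier returns a list of { label, prob } pairs (which is my understanding of what @tensorflow/tfjs-automl produces). The topLabel name, the 0.5 threshold, and the "unknown" fallback are assumptions for illustration.

```typescript
// One prediction as a { label, prob } pair (assumed output shape).
interface Prediction {
  label: string;
  prob: number;
}

// Return the most likely label, or a fallback when the best score is too low.
function topLabel(
  predictions: Prediction[],
  minProb = 0.5, // hypothetical confidence threshold
  fallback = "unknown", // hypothetical fallback label
): string {
  let best: Prediction | undefined;
  for (const p of predictions) {
    if (best === undefined || p.prob > best.prob) best = p;
  }
  return best !== undefined && best.prob >= minProb ? best.label : fallback;
}

console.log(
  topLabel([
    { label: "awake", prob: 0.9 },
    { label: "sleep", prob: 0.08 },
    { label: "nobody", prob: 0.02 },
  ]),
);
```

A threshold like this keeps a blurry or ambiguous frame from flipping the current label.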

I can see the new labels in the web dashboard, but I also want to be notified when the label changes. Normally I’d integrate with a notification service like PagerDuty, but that would be overkill for my use case. The Google Cloud app can act as a notification channel, so I just created a log-based metric from the Cloud Function logs. Structured logging in Cloud Functions is really handy; I can just log something like this:

log('Prediction result', { label: 'awake' });

The metric label can be extracted using jsonPayload.label. Then I can create alert policies, such as one for awake detected or one for metric absence for several minutes.
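To make the jsonPayload.label part less magical: Cloud Functions and Cloud Run treat a single-line JSON object written to stdout as a structured log entry, reading severity and message as special fields and placing the remaining fields in jsonPayload. A minimal sketch of the same entry without the logger helper follows; predictionLogEntry is a hypothetical name.

```typescript
// Build a one-line JSON log entry. "severity" and "message" are special
// fields recognized by Cloud Logging; "label" ends up as jsonPayload.label.
function predictionLogEntry(label: string): string {
  return JSON.stringify({
    severity: "INFO",
    message: "Prediction result",
    label,
  });
}

// Writing the line to stdout is enough for it to be picked up as structured.
console.log(predictionLogEntry("awake"));
// prints {"severity":"INFO","message":"Prediction result","label":"awake"}
```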

The bonus of this solution is that the free tiers of GCP and Firebase cover almost all my needs.

The source is available at hemslo/aidan. Feel free to use it and customize it for your own use cases.