Dedicated to my princess <3.
The motivation behind this project came from my girlfriend. She recently moved away to go to Law School which kinda sucks. Now that we're physically far away, it got me thinking about ways to make her feel closer to me even though we're thousands of kilometres apart.
My dad has a security camera feed in our old home in China which he sometimes uses to talk to my grandparents, and that got me thinking: what if I could do something like that for her? You might think that having a stalker cam is weird, like "I don't want to be in a surveillance state", or "I don't wanna be observed like a lab rat", or "I want my privacy", which I totally understand, but I'm me and I thought it would make us feel closer, which I think it has.
One of the limitations of my dad's camera setup is that it gives the remote user no mobility, which felt restricting; anyone who's video called before knows that trapped feeling of not being able to look around freely. Another issue was video quality: the security camera feed was choppy and low quality, which made for a really bad viewing experience and added to that surveillance feeling. I figured that if I provided a high-quality video feed I could minimize that security-cam, stalkerish feeling. My solution to quality and mobility was to use my existing Sony a6400 and a motorised pan head, the ZEAPON PONS PT Pan Head.
I'd mount everything onto a tripod and, through some mechanism, allow the user to control the pan head remotely and securely.
The 2-axis Pan Head is actually two standalone Bluetooth units, which makes it modular in some regard. It also makes the Bluetooth messages easier to decode: one device is responsible for the pan and the other for the tilt, which limits the scope of functionality any single Bluetooth device has to support. When you think about a 2-axis, 2-unit Pan Head like this, there are only two things each unit needs to know to function - speed and direction.
The Pan Head comes with a phone app that controls the panning of the camera; the problem is that I'd need to find a way to hijack this signal so I could control it from my own server instead.
To do this I first installed the app onto my Pixel 7, enabled the Bluetooth HCI snoop log in the developer options, and just started sending commands to the Pan Head. Once I was done I loaded the logs onto my computer and started analysing the traffic in Wireshark. The goal was to find packets with my phone as the source, the Pan Head as the destination, and the Bluetooth opcode that marks a Write Command. We can then analyse the value and understand the instructions given to the Bluetooth Pan Head.
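If you're trying the same thing, a Wireshark display filter along these lines narrows the capture down to just those packets - 0x52 is the ATT Write Command opcode, and the MAC address is a placeholder for your own pan head's address:

```
btatt.opcode == 0x52 && bluetooth.addr == aa:bb:cc:dd:ee:ff
```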
My method here was to just connect to the Pan Head from my computer and start sending out a bunch of commands, which worked. Through pure trial and error, I was able to isolate the parts of the message that controlled direction and speed - the two parameters I needed.
The breakdown of the message is as follows:
{header}{command_direction}{action_modifier}{command_speed}{additional_parameters}
header = '02' (constant)
command_direction = '00', '01' (right), '02' (left)
action_modifier = '000' (constant)
command_speed = '000' to '999'
additional_parameters = '00000000000000' (constant)
So this value, shown in the photo:
02
02
000
436
00000000000000
means that the connected unit will turn left (02) at a speed of 436.
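To make that concrete, here's a minimal Python sketch of how a payload like this could be assembled - the function name is mine, and I'm assuming the speed digits are written into the hex string literally, the way they appear in the capture:

```python
def build_pan_command(direction: str, speed: int) -> bytes:
    """Assemble {header}{command_direction}{action_modifier}{command_speed}{additional_parameters}."""
    assert direction in ("00", "01", "02")  # '01' = right, '02' = left
    assert 0 <= speed <= 999
    hex_string = "02" + direction + "000" + f"{speed:03d}" + "00000000000000"
    return bytes.fromhex(hex_string)

# The example above: turn left (02) at a speed of 436
payload = build_pan_command("02", 436)  # b'\x02\x02\x00\x046\x00...'
```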
After figuring that out, it's fairly simple to use a library to connect to and interact with Bluetooth Low Energy (BLE) devices.
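In Python that library is Bleak (more on it below). Reusing build_pan_command from the sketch above, sending a command looks roughly like this - the device address and characteristic UUID are placeholders you'd discover with a BLE scanner or from the snoop logs:

```python
import asyncio
from bleak import BleakClient

# Placeholders -- substitute the real values for your own pan head
PAN_UNIT_ADDRESS = "AA:BB:CC:DD:EE:FF"
WRITE_CHAR_UUID = "0000ffe1-0000-1000-8000-00805f9b34fb"

async def send_command(payload: bytes) -> None:
    async with BleakClient(PAN_UNIT_ADDRESS) as client:
        # response=False matches the ATT Write Command (no response) seen in Wireshark
        await client.write_gatt_char(WRITE_CHAR_UUID, payload, response=False)

asyncio.run(send_command(build_pan_command("02", 436)))  # pan left at speed 436
```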
In a system like this, latency is the key concern. You can't have a remote control system if the latency sucks ass; it'll just be miserable for the user. I ended up finding AWS's Interactive Video Service (IVS), which seemed to tick all the boxes. IVS offers good latency with its Real-Time Streaming option - latency comparable to a Zoom call - and it preserves quality well with its own backend compression, which ticked my second box regarding video quality.
You might be concerned about the cost, but this isn't too much of a worry either: Amazon charges per participant-hour at $0.0720/hour, and since at most there would only ever be two people in the session, an hour together costs about $0.14.
I chose Flask as the web framework because it's lightweight and easy to use for building RESTful APIs, which let me define endpoints that clients can call to send commands to the devices. To talk to the Bluetooth devices I used Bleak, a Python package designed for communicating with BLE (Bluetooth Low Energy) devices and performing BLE read/write operations; it also supports async programming, which is critical for responsiveness. To keep the server responsive I integrated Gevent, which enables asynchronous processing - while one request is being handled, others can be processed concurrently, keeping the server snappy and efficient. I also added retry logic to handle transient errors and implemented health checks that automatically reconnect to devices if they drop, making the system more robust and fault tolerant. This server essentially handles video token generation through AWS's boto3 SDK and processes the requests that control the pan head.
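Stripped of the Gevent wiring, retries, and health checks, the command endpoint looked something like this - the route name and JSON shape are illustrative rather than my exact API, and it reuses the helpers from the earlier sketches:

```python
import asyncio
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/pan", methods=["POST"])
def pan():
    body = request.get_json()
    # build_pan_command / send_command come from the sketches above; the
    # real server holds one persistent BLE connection rather than
    # reconnecting on every request the way asyncio.run() does here.
    payload = build_pan_command(body["direction"], int(body["speed"]))
    asyncio.run(send_command(payload))
    return jsonify({"status": "ok"})
```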
Once the server is up locally we need a way to open it to the rest of the internet, and for this I used Telebit. Telebit is an open source service that exposes local servers or applications to the internet securely and easily. It acts as a tunneling solution, letting developers share their local development environments or services without configuring complex networking setups such as port forwarding or dynamic DNS. It's a great free alternative to services like ngrok, and its secure tunneling, custom domains, and easy setup really make life a lot easier on the networking side.
After a while I noticed some pretty significant processing lag from the server. Flask, easy to use as it is, was very slow due to its Python heritage. I did try switching over to FastAPI but ran into similar issues. Python also tends to consume a lot of resources, which is a concern on my very compute-limited Raspberry Pi. There is a great video comparison between different web frameworks that can be found here. So after much deliberation I decided to rewrite the whole thing with NodeJS and ExpressJS.
There was also a problem with the nature of REST API requests for this project: the protocol is inherently stateless, which creates significantly more overhead per request. For a real-time application like this one, WebSockets are much faster because they provide an ongoing stateful communication channel - the connection opens once and stays open for as long as needed, significantly reducing the time it takes to send a message.
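The actual rewrite is in ExpressJS, but to keep these snippets in one language, here's the same stateful pattern sketched with Python's websockets package (an illustration, not the production code) - the connection opens once and every subsequent command rides on it:

```python
import asyncio
import json
import websockets  # pip install websockets

async def handler(ws):
    # One long-lived connection per client; each message is a tiny JSON command
    async for raw in ws:
        cmd = json.loads(raw)
        payload = build_pan_command(cmd["direction"], int(cmd["speed"]))
        await send_command(payload)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```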
To expose my Raspberry Pi server to the internet I now use a Cloudflare Tunnel, which does so securely without opening any ports. I also set a policy to only allow requests from my frontend application, which is itself secured behind a protected site.
I know Kubernetes is a bit overkill for a setup like this, since it doesn't have the scaling demands that other applications do. But Kubernetes provides some great features that are easy to set up, which makes for a relatively straightforward implementation.
First we need to create a container for the application, installing all the dependencies the server needs to run along with all the Bluetooth dependencies it requires.
We then set up the Kubernetes Deployment with 1 replica, because we really just want to maintain that one persistent connection between the program and the Bluetooth devices - we don't want to run into any weird race conditions, connection conflicts, or things of that nature.
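Roughly what that Deployment looks like - the names, image, and port are placeholders, and the Recreate strategy is one way (my suggestion, not gospel) to make sure a rollout never has two pods fighting over the Bluetooth adapter:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: panhead-server           # placeholder name
spec:
  replicas: 1                    # exactly one pod, one BLE connection
  strategy:
    type: Recreate               # never two pods on the adapter mid-rollout
  selector:
    matchLabels:
      app: panhead-server
  template:
    metadata:
      labels:
        app: panhead-server
    spec:
      containers:
        - name: server
          image: panhead-server:latest   # placeholder image
          ports:
            - containerPort: 8765
```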
We then set up the Cloudflare Tunnel container and the Ingress to forward traffic to our webserver. I then set the network policy at the Ingress level to only allow traffic from certain sites.
The traffic flow looks something like this:
Internet → Cloudflare → Cloudflare Tunnel → Ingress Controller → Ingress → Service → Pod
The classic LGTM stack - Loki, Grafana, Tempo, Mimir/Prometheus ("Looks good to me") - can be implemented in Kubernetes using Helm charts, typically deployed in a monitoring namespace. Prometheus collects metrics from your applications and Kubernetes components using ServiceMonitors and PodMonitors. Loki aggregates logs via DaemonSets of promtail or vector agents on each node that collect container logs. Tempo handles distributed tracing by receiving trace data from your applications (instrumented with OpenTelemetry or similar). Grafana ties it all together as the visualization layer, connecting to these data sources to create unified dashboards and alerts. You can use operators like prometheus-operator and grafana-operator to manage these components declaratively, while persistent volumes back the time-series databases and log storage.
The frontend for this project was just a straightforward web app that I built in Webflow. I really like Webflow because it makes HTML and CSS development so visual - you can really see what is happening where. It saves a tonne of time because you don't have to write CSS lol.
We again had to use the Amazon IVS web SDK to connect to the IVS Real-Time Stage. We don't want the user to constantly have to copy-paste tokens, so we set up a token generation function on the backend that lets users join at the click of a button.
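On the backend, that token generation is a thin wrapper around boto3's ivs-realtime client - the stage ARN below is a placeholder and the route wiring is omitted:

```python
import boto3

ivs = boto3.client("ivs-realtime")

def create_join_token(user_id: str) -> str:
    # Each participant gets a short-lived token scoped to the stage
    resp = ivs.create_participant_token(
        stageArn="arn:aws:ivs:us-east-1:123456789012:stage/placeholder",
        userId=user_id,
        capabilities=["PUBLISH", "SUBSCRIBE"],
        duration=60,  # minutes
    )
    return resp["participantToken"]["token"]
```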