The problem
In a previous job as a programming teacher, I had the now-standard setup of a laptop plus external display. During meetings, to maximize visibility, I would use Zoom’s dual monitor feature to place the window with people’s videos on one display, and on the other display I would place the window with whatever was being presented. It worked pretty well, except for one thing: it often looked like I was not paying attention to the meeting.
In addition to having two monitors, I also had two cameras: the laptop one, plus an external one. But no matter which camera I used or where I placed them, I would eventually switch attention between the displays, making it seem like I was looking away from the meeting.
That’s when I thought it would be nice to have a feature where the active camera would automatically switch to the one I was facing. That way, I could use the laptop camera and display when I was looking at them, and the same for the external ones. Not that it would necessarily solve the problem (or even that the problem needs solving), but I thought it was an interesting problem to tackle.
After a lot of learning and tinkering (kernel modules, video codecs and color spaces, face detection algorithms, etc), I was able to make it work. It was a really fun project, so I ended up using the process and the code to teach a number of my students about some of the interesting issues and technical aspects involved.
It’s been a while since I last gave a lecture on this, so I decided to write it up and explain it in some detail; hopefully someone can learn something from it (or at least have fun reading).
About the code
Below I’ll go into details about the code and how it works, but you can find the complete version in this repository.
It’s worth noting that, although the code has some optimizations, it was not written for real-life use, as it consumes quite a bit of resources, especially at higher resolutions. Rather, it is structured to be easy to teach from, with the steps/features abstracted into different files and classes with that goal in mind.
What we’ll need
For this project, we’ll need two things:
- Python: the language we’ll write our code in;
- OpenCV: the library we’ll use to get data from the cameras, and also to detect which one has someone facing it.
I’m assuming you have Python installed, or are able to set it up; I’m using Python 3.12.
Also, this project was done entirely on a Linux setup, but I imagine it would take just some adjustments to make it work on Mac or Windows, considering that OpenCV is cross-platform.
Where are our cameras on Linux?
The first thing we need to do is to find out the path to our camera devices, so we can open them. On Linux, video devices are controlled by the v4l2
kernel module (Video for Linux v2).
To list available devices, we can run this command on a terminal:
v4l2-ctl --list-devices
The output will be something like:
HD Pro Webcam C920 (usb-0000:00:14.0-1.4):
/dev/video2
/dev/video3
/dev/media1
Integrated_Webcam_HD: Integrate (usb-0000:00:14.0-5):
/dev/video0
/dev/video1
/dev/media0
It’s common for each camera to have multiple devices. The actual video feed is usually the first one for each camera; the rest are for metadata or other internal uses. So, in our case, we will use /dev/video0 for the integrated camera and /dev/video2 for the external one.
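If you’re not sure which device node is the actual feed, one quick way to check (just a throwaway sketch, not part of the project) is to try opening each one with OpenCV, which we’ll install below, and see whether it returns frames:

# probe.py - throwaway sketch to check which devices actually return frames
import cv2

for path in ["/dev/video0", "/dev/video1", "/dev/video2", "/dev/video3"]:
    cap = cv2.VideoCapture(path)
    has_frame = cap.isOpened() and cap.read()[0]
    print(f"{path}: {'video feed' if has_frame else 'no frames'}")
    cap.release()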
Getting video from a camera using Python and OpenCV
OpenCV is an open-source computer vision library that covers a lot of different use cases, from basic image processing to face detection and even deep learning. It is written in C++, but it has bindings for other languages such as Python and Java.
We can install it using pip:
python3 -m pip install opencv-python
Then we can use it to open the camera and create a preview window to see the video feed. Below is the first version of our main file, with some comments on how it works:
# main.py
import cv2  # import OpenCV's module

# aux function to check if our preview window is closed
def window_closed(window_title):
    try:
        window_closed = not cv2.getWindowProperty(
            window_title, cv2.WND_PROP_VISIBLE)
    except cv2.error:
        window_closed = False
    return window_closed

# some helpful constants
WINDOW_TITLE = "Preview"
ESC = 27

def main():
    # camera device path, which we found out previously
    device = "/dev/video2"
    # OpenCV's VideoCapture class is how we'll get the video data
    cam = cv2.VideoCapture(device)
    if not cam.isOpened():
        raise SystemExit(f"Unable to open {device}")

    # set up some configs:
    # - lower resolutions demand fewer resources
    # - buffer size of 1 frame, to process it in real time and avoid lag
    width, height = 640, 480
    cam.set(cv2.CAP_PROP_BUFFERSIZE, 1)
    cam.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cam.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

    key_pressed = 0
    # keep running until we close the window or press ESC
    while not window_closed(WINDOW_TITLE) and key_pressed != ESC:
        # read() captures a frame from the camera, if available
        has_frame, cam_img = cam.read()
        # note for the nitpicky: we could use a guard if instead, but
        # this will make more sense later, especially with the keypress
        if has_frame:
            # show the captured frame
            cv2.imshow(WINDOW_TITLE, cam_img)
        # check for pressed keys
        key_pressed = cv2.waitKey(1)

    # release resources
    cv2.destroyAllWindows()
    cam.release()

if __name__ == "__main__":
    print("Running...")
    main()
Repository code: main_01_previewcamera.py.
The code is pretty straightforward, but the highlights are:
- OpenCV’s VideoCapture class is how we open a video device to get its data;
- we need to configure it (buffer size, resolution) to get good results;
- the read() method captures a frame from the device, which we can then show in our preview window.
We can run the code with python3 main.py
and hopefully we’ll get a window showing the camera feed in real time:
Description: video demonstration of real-time camera feed.
A note on the resolution: I’m using 640x480 as I found it to be a good balance between quality and performance. However, it would be possible to use higher resolutions, especially if the camera supports faster compressed codecs, and if we optimize the code for performance. Perhaps this would be a good post to write in the future.
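For reference, requesting a compressed codec usually means asking the capture for a FOURCC before setting the resolution. Something along these lines should work, but whether it actually helps depends on the camera and driver, so treat it as a sketch:

# sketch: request the MJPG codec before asking for a higher resolution
import cv2

cam = cv2.VideoCapture("/dev/video2")
# the driver may silently ignore this if the format is unsupported
cam.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
cam.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)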
Detecting faces in real time
To detect faces, we’re gonna use OpenCV’s CascadeClassifier
class. It works by applying a classifier in stages, quickly rejecting cases that are not of interest (in our case, regions of the image without faces), and focusing on the ones that are of interest.
In particular, we’re gonna use it with what are called Haar features, which are basically patterns of pixel intensities. These patterns, in the aggregate, allow us to detect all kinds of objects. If you want a more in-depth understanding of how that works, you can check out this Official OpenCV Tutorial. There is also a great series of lectures by Shree K. Nayar from Columbia University, where he talks about face detection with Haar features, starting with this video.
As I said, Haar cascade classifiers can be used to detect all kinds of things, but they need to be trained for each case. Luckily for us, OpenCV releases a number of pre-trained classifiers, including ones for faces. You can find them in their repository here.
To make things easier, I’ve put a copy of the one we’ll use in my repository: haarcascade_frontalface_default.xml. This is the default for detecting frontal faces, but there are alternative ones. If you want, you can play around with them, or even with detecting other objects.
Now that we have that in place, let’s simplify things by abstracting the camera opening code into a subclass of VideoCapture
. Plus, it will be useful later for handling multiple cameras:
# facedetection/camera.py
import cv2

class Camera(cv2.VideoCapture):
    def __init__(self, device, resolution):
        super().__init__(device)
        if not self.isOpened():
            raise SystemExit(f"Unable to open {device}")
        width, height = resolution
        self.set(cv2.CAP_PROP_BUFFERSIZE, 1)
        self.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        self.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
Repository code: facedetection/camera.py
Now we update our main code:
# main.py
# ... other imports omitted ...
from facedetection.camera import Camera
# ... rest of code omitted ...

def main():
    cam = Camera(device="/dev/video2", resolution=(640, 480))
    # instantiate our frontal face detector
    detector = cv2.CascadeClassifier(
        'facedetection/haarcascade_frontalface_default.xml')

    key_pressed = 0
    while not window_closed(WINDOW_TITLE) and key_pressed != ESC:
        has_frame, cam_img = cam.read()
        if has_frame:
            # try to detect faces at different scales, see explanation below
            results = detector.detectMultiScale(cam_img)
            # for each result, draw a rectangle around it
            for dims in results:
                x, y, w, h = dims
                cv2.rectangle(cam_img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.imshow(WINDOW_TITLE, cam_img)
        key_pressed = cv2.waitKey(1)

    cv2.destroyAllWindows()
    cam.release()

# ... rest of code omitted ...
Repository code: main_02_facedetection.py.
The face detection itself is done with the detector.detectMultiScale(cam_img)
call. This method automatically tries to detect faces at multiple scales, as the size of the face relative to the size of the image changes depending on how far the person is from the camera.
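As a side note, detectMultiScale accepts a few optional parameters that control this multi-scale search. We’ll set scaleFactor and minNeighbors explicitly later; minSize is just shown here as an illustration:

results = detector.detectMultiScale(
    cam_img,
    scaleFactor=1.1,   # how much the image is shrunk at each scale step
    minNeighbors=3,    # how many overlapping detections are needed to keep a face
    minSize=(30, 30),  # ignore candidate regions smaller than this
)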
We can run the code again with python3 main.py
, and we’ll see the face detection working:
Description: video demonstration of real-time face detection.
Making it faster
As fast as the Haar features algorithm is, detecting faces in real time can consume a lot of resources. So I decided to test two possible optimizations: reducing the image resolution and changing the image to grayscale.
The idea is that by reducing the amount of information to process (either in resolution or in colors) we could improve performance. The question is whether the extra time necessary to make the transformations (resolution or color) is worth the gain in time detecting faces.
I tried all 4 combinations (2 resolutions times 2 color options) to see if there was a difference. I’m not gonna go into the benchmark code, but you can see it in the repository if you’re interested: main_03_benchmark.py.
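The gist of it is just timing each detector variant over the same frames. A minimal sketch of the idea (the repository version is more complete, and it assumes each variant exposes the same find_faces method shown below) would be something like:

# benchmark sketch: time how long a detector variant takes over the same frames
import time

def benchmark(detector, frames):
    start = time.perf_counter()
    for frame in frames:
        # copy, so the rectangles drawn by one detector don't affect the next
        detector.find_faces(frame.copy())
    return time.perf_counter() - start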
The results on my computer were these:
Running...
Testing FaceDetector...
Testing FaceDetectorReduced...
Testing FaceDetectorGray...
Testing FaceDetectorReducedGray...
Done.
Results:
0.4417630029929569 seconds - FaceDetectorReduced
0.504424922997714 seconds - FaceDetectorReducedGray
3.499028164005722 seconds - FaceDetectorGray
3.7348614289949182 seconds - FaceDetector
As you can see, reducing the resolution improved the performance significantly. However, it doesn’t seem worth it to also change the color to grayscale (although it did improve a little by itself). For the resolution reduction, I used a scale factor of 0.25
(25% of the original resolution), but we could also use a fixed resolution if we wanted.
Note: after writing this I did some digging, and I found out that OpenCV already converts to grayscale if you pass an image with more than 1 color channel (relevant code here). However, in my benchmark, just converting to grayscale (without reducing resolution) seems to always outperform not doing it (even running the whole thing multiple times). But I have no idea why on earth my conversion to grayscale would be faster than the native one by OpenCV. Just consistent random noise? Unlikely. If you have a clue, let me know.
From that, I wrote this optimized face detection class that I’ll use from now on:
# facedetection/facedetection.py
import cv2

BLUE = (255, 0, 0)
GREEN = (0, 255, 0)
RED = (0, 0, 255)

class FaceDetector:
    def __init__(self, resize_factor=0.25):
        self.resize_factor = resize_factor
        self.frontal = cv2.CascadeClassifier(
            "facedetection/haarcascade_frontalface_default.xml"
        )

    def _draw_rects(self, img, rects, color):
        for dims in rects:
            dims = (dims / self.resize_factor).round().astype(int)
            x, y, w, h = dims
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)

    def _detect(self, img):
        # Reduce image resolution
        reduced = cv2.resize(
            img,
            None,
            fx=self.resize_factor,
            fy=self.resize_factor,
            interpolation=cv2.INTER_NEAREST,
        )
        result = self.frontal.detectMultiScale3(
            reduced, scaleFactor=1.1, minNeighbors=3, outputRejectLevels=True
        )
        return result

    def find_faces(self, img, color=GREEN):
        result = self._detect(img)
        self._draw_rects(img, result[0], color)
        return result
Repository code: facedetection/facedetection.py.
Switching cameras
The next step is to handle multiple cameras at the same time, and to be able to switch between them.
Let’s begin by creating a class that manages several cameras at once:
# facedetection/multicamera.py
from .camera import Camera

class MultiCamera:
    def __init__(self, devices, resolution):
        self.devices = devices
        self.resolution = resolution
        self.cams = [
            Camera(device, self.resolution) for device in self.devices
        ]

    def __len__(self):
        return len(self.cams)

    def read(self, index):
        return self.cams[index].read()

    def flush(self):
        # Grab frame without processing it, just to empty buffer
        for cam in self.cams:
            cam.grab()

    def release(self):
        for cam in self.cams:
            cam.release()
Repository code: facedetection/multicamera.py.
In addition, we can create a class that works as if it were a single device, but allows switching between the available cameras:
# facedetection/cameraswitcher.py
from .multicamera import MultiCamera

class CameraSwitcher:
    def __init__(self, multicam: MultiCamera):
        self.multicam = multicam
        self.previous = 0
        self.current = 0

    def select(self, index):
        if index == self.current or index >= len(self.multicam):
            return False
        self.previous = self.current
        self.current = index
        return True

    def read(self):
        return self.multicam.read(self.current)
Repository code: facedetection/cameraswitcher.py.
And now we update our main code to use everything we’ve done so far:
# main.py
# ... code omitted ...
from facedetection.cameraswitcher import CameraSwitcher
from facedetection.facedetection import FaceDetector
from facedetection.multicamera import MultiCamera
# ... code omitted ...

# Codes for the number keys 1-9
NUM_KEYS = {ord(str(i)) for i in range(1, 10)}

def main():
    # Open both our cameras. Initially, the first one will be the active one
    multicam = MultiCamera(
        devices=["/dev/video2", "/dev/video0"], resolution=(640, 480)
    )
    camswitcher = CameraSwitcher(multicam)
    detector = FaceDetector()

    key_pressed = 0
    while not window_closed(WINDOW_TITLE) and key_pressed != ESC:
        has_frame, cam_img = camswitcher.read()
        if not has_frame:
            continue
        # Detect faces with our custom detector
        detector.find_faces(cam_img)
        cv2.imshow(WINDOW_TITLE, cam_img)
        key_pressed = cv2.waitKey(1)
        if key_pressed in NUM_KEYS:
            # Select camera according to the pressed key,
            # 1 being the first one (index 0)
            num = int(chr(key_pressed))
            if camswitcher.select(num - 1):
                print(f"Selected camera {num}")
            else:
                print(f"Can't select camera {num}")

    cv2.destroyAllWindows()
    multicam.release()

# ... code omitted ...
Repository code: main_04_cameraswitch.py.
Notice that in this code we mapped the number keys 1-9 to the cameras in the order we opened them. I used 1 for the first one (the camera at position 0), not only because it’s more intuitive as a UI, but also because the number keys at the top of the keyboard start at 1.
Now we can switch between cameras by pressing the number keys (in the video I’m pressing 1 and 2):
Description: video demonstration of camera switching by number key press.
A bit of flair: fade effect
Switching between cameras is currently very sudden; it would be nice to have a smoother transition. So let’s create a class with a fade effect (explanations in the comments):
# facedetection/fadecameraswitcher.py
import time

from .cameraswitcher import CameraSwitcher

class FadeCameraSwitcher(CameraSwitcher):
    def __init__(self, *args, fade_delay=1, **kwargs):
        super().__init__(*args, **kwargs)
        # How long should the fade effect take
        self.fade_delay = fade_delay
        # The time when the last camera selection happened
        self.select_time = 0
        # The current opacity of the selected camera in relation to the
        # previous one. 1.0 means we start fully with the current camera
        self.current_opacity = 1.0

    def is_changing(self):
        # if opacity is not 1.0, then we're in the middle of a camera change
        return self.current_opacity != 1.0

    def select(self, index):
        # Don't switch camera if we're already
        # in the middle of a change
        if self.is_changing():
            return False
        ret = super().select(index)
        # if we were able to select a different camera
        if ret:
            # opacity 0.0 means the image is fully from the
            # previous camera and nothing yet from the new one
            self.current_opacity = 0.0
            # keep track of when the change happened, to be
            # able to time the fade correctly
            self.select_time = time.time()
        return ret

    def read(self):
        # read from the current camera
        curr_has_frame, curr_img = self.multicam.read(self.current)
        # if we're not fading or no available frame, just return
        if not self.is_changing() or not curr_has_frame:
            return curr_has_frame, curr_img
        # read from the previous camera
        prev_has_frame, prev_img = self.multicam.read(self.previous)
        # if no available frame from previous camera,
        # just return the image from the current camera
        if not prev_has_frame:
            return curr_has_frame, curr_img
        # return a blend between the previous and current cameras
        return True, self._blend(prev_img, curr_img)

    def _blend(self, img_from, img_to):
        # how long has passed since we started fading
        elapsed_time = time.time() - self.select_time
        # opacity is equivalent to a % of fade delay that already elapsed
        self.current_opacity = min(elapsed_time / self.fade_delay, 1.0)
        # mix a % of the current image with the previous one
        blend = img_to * self.current_opacity
        blend += img_from * (1 - self.current_opacity)
        # convert to original type, as it might've changed during calculations
        return blend.astype(img_to.dtype)
Repository code: facedetection/fadecameraswitcher.py.
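As an aside, the manual blend in _blend could also be written with OpenCV’s addWeighted, which computes the same weighted sum and handles the type conversion for us. This isn’t what the class above does, just an equivalent alternative (it would need cv2 imported in that file):

def _blend(self, img_from, img_to):
    elapsed_time = time.time() - self.select_time
    self.current_opacity = min(elapsed_time / self.fade_delay, 1.0)
    # dst = img_to * alpha + img_from * beta + gamma
    return cv2.addWeighted(
        img_to, self.current_opacity,
        img_from, 1 - self.current_opacity,
        0,
    )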
In our main code, the only thing we need to change is the camera switcher:
# main.py
# ... code omitted ...
from facedetection.fadecameraswitcher import FadeCameraSwitcher
# ... code omitted ...

def main():
    # ... code omitted ...
    # Change the camera switcher
    camswitcher = FadeCameraSwitcher(multicam)
    # ... code omitted ...
Repository code: main_05_fadecameraswitch.py.
And voilà:
Description: video demonstration of fading effect when switching cameras.
Switching cameras automatically with face detection
Finally, why we’re here: to automatically switch cameras to the one we’re facing. Let’s create a new class, responsible for that:
# facedetection/autocameraswitcher.py
import time

import numpy as np

from .facedetection import FaceDetector
from .fadecameraswitcher import FadeCameraSwitcher

class AutoCameraSwitcher(FadeCameraSwitcher):
    def __init__(self, *args, check_delay=0.2, **kwargs):
        super().__init__(*args, **kwargs)
        # delay for checking for faces, as it would be
        # wasteful to check every frame
        self.check_delay = check_delay
        # when was the last check
        self.last_check = 0
        self.detector = FaceDetector()

    def read(self):
        # if we're not already changing cameras and enough
        # time has passed, check and select facing camera
        if (
            not self.is_changing()
            and time.time() - self.last_check >= self.check_delay
        ):
            self.last_check = time.time()
            self._select_facing_cam()
        return super().read()

    def _select_facing_cam(self):
        n_cams = len(self.multicam)
        # create a numpy array with the confidence of detecting
        # a face in each of the cameras (as multiple can be detected)
        detections = np.zeros(n_cams)
        # for each camera...
        for index in range(n_cams):
            has_frame, img = self.multicam.read(index)
            if not has_frame:
                continue
            # detect faces and get the confidence of each
            _, _, confidences = self.detector.find_faces(img)
            # if there's at least one face detected
            if len(confidences) > 0:
                # get the biggest confidence among faces on this camera
                detections[index] = confidences.max()
        # print(detections)
        max_confidence = detections.max()
        # if any camera detected a face with some confidence
        if max_confidence > 0:
            # select the camera with the biggest confidence
            face_index = detections.argmax()
            self.select(face_index)
Repository code: facedetection/autocameraswitcher.py.
Our code handles face detection across multiple cameras by choosing the camera whose detection has the highest confidence. If that camera happens to be the one already in use, the select() method will simply ignore our request and do nothing, keeping the current camera as it is.
Also notice that when selecting the facing camera we need an image from each camera to detect faces, but we also need the images from the previous and current cameras for the fading effect. So there are several places reading from the cameras, and we don’t want to re-query a camera for a new frame each time, as that takes more time than reusing the same image.
There are multiple ways of solving this, but an easy one is to simply include a kind of frame cache for the camera, so that consecutive reads just use the cached image.
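I won’t go through that class in detail here, but a minimal sketch of the idea looks like this (the version in the repository, facedetection/multicameracached.py, may differ in its details):

# facedetection/multicameracached.py (minimal sketch; see the repository for the actual version)
from .multicamera import MultiCamera

class MultiCameraCached(MultiCamera):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # one cache slot per camera
        self.cache = [None] * len(self.cams)

    def read(self, index):
        # only query the device if we don't have a cached frame for it yet
        if self.cache[index] is None:
            self.cache[index] = super().read(index)
        return self.cache[index]

    def clear_cache(self):
        self.cache = [None] * len(self.cams)

With that in place, our main code becomes: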
# main.py
from facedetection.autocameraswitcher import AutoCameraSwitcher
from facedetection.multicameracached import MultiCameraCached
# ... code omitted ...

def main():
    # use the cached camera
    multicam = MultiCameraCached(
        devices=["/dev/video2", "/dev/video0"],
        resolution=(640, 480)
    )
    # the AutoCameraSwitcher already has a face detector,
    # so we don't need one here anymore
    camswitcher = AutoCameraSwitcher(multicam)

    key_pressed = 0
    while not window_closed(WINDOW_TITLE) and key_pressed != ESC:
        # clear the cache before starting the read process
        multicam.clear_cache()
        # read, which now includes detecting faces and selecting camera
        has_frame, cam_img = camswitcher.read()
        if not has_frame:
            continue
        cv2.imshow(WINDOW_TITLE, cam_img)
        # ... rest of loop code (keypress) omitted ...

# ... code omitted ...
Repository code: main_06_autocameraswitch.py.
An interesting question here (related more to programming in general than to our specific problem) is: why clear the cache in our main loop, instead of doing it within the camera switcher’s read() method, which is where we read frames and detect faces?
Well, the main reason is that AutoCameraSwitcher’s constructor comes from CameraSwitcher, which expects a MultiCamera parameter. A MultiCamera is not necessarily a cached one, so making AutoCameraSwitcher aware of the cache (to be able to clear it) would increase coupling and require changes to the constructor.
In any case, as I said before, these classes were structured mostly for instructional value, not for performance, so the code would probably be very different if that were the focus.
Anyway, the code works as expected, now without needing to press anything:
Description: video demonstration of auto camera switching based on face detection.
A bit more optimization: taking the frame rate into account
Right now, our main loop is polling for frames continuously, which means it will call the relevant methods as fast as it can. Even though there are some checks within our classes (to delay face detection, for example), we’re not only over-polling the camera, but making method calls unnecessarily, especially considering that dynamic languages such as Python can have a significant overhead for method calls.
To be honest, this might be a little bit of premature optimization, as I’ve done no real benchmarking or profiling to see if this is really a problem. And, as Donald Knuth famously said (here), “we should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
Perhaps this is in the remaining 3%, perhaps it is in fact premature optimization, or perhaps it’s just another interesting issue to consider. Regardless, let’s see what we can do about it.
Option 1: event or callback based polling
The best-case scenario would be some kind of event- or callback-based code, where OpenCV would let us know whenever there’s a new frame, instead of us having to poll the camera repeatedly. As far as I can tell, this is not natively supported by OpenCV, so we would need additional tools and libraries, making the whole setup more complicated.
Option 2: multiple threads or processes
Another option would be to split our code into two threads: one responsible for polling the frames, and the other processing the images whenever they are available. There are two problems with this approach. The first is that Python is not particularly good at multithreading, at least up to version 3.12. Starting with Python 3.13, you can optionally change the default configuration of the language to better support multithreading (details in the official docs). A similar option would be to use multiple processes instead of threads, with basically the same goal.
The second problem is that, even though this could improve performance by parallelizing some of the work, the core issue is not solved: we’d still be over-polling the cameras and calling methods unnecessarily, just in a different thread/process. In case you’re interested in this approach, you can read about it in this article: Faster video file FPS with cv2.VideoCapture and OpenCV.
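Just to make the idea concrete, here’s a rough sketch of that split for a single camera (only an illustration of the approach, not something from the repository): a reader thread keeps grabbing frames, only the most recent one is kept, and the main loop processes whatever is latest.

# threadedcamera.py - illustrative sketch only, not in the repository
import threading
import cv2

class ThreadedCamera:
    def __init__(self, device):
        self.cam = cv2.VideoCapture(device)
        self.lock = threading.Lock()
        self.latest = None
        self.running = True
        self.thread = threading.Thread(target=self._reader, daemon=True)
        self.thread.start()

    def _reader(self):
        # keep polling in the background, keeping only the newest frame
        while self.running:
            has_frame, img = self.cam.read()
            if has_frame:
                with self.lock:
                    self.latest = img

    def read(self):
        # return the most recent frame, if any
        with self.lock:
            return self.latest is not None, self.latest

    def release(self):
        self.running = False
        self.thread.join()
        self.cam.release()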
Option 3: manually pacing the frame rate
A third option is to manually take some kind of frame rate into account, to avoid doing too much work unnecessarily. This is not a perfect solution, but on average it might at least save us some CPU cycles per loop iteration.
# main.py
# ... code omitted ...

# processing frame rate, might be different from camera's frame rate
FRAME_RATE = 120
# time delay between frames
FRAME_DELAY = 1.0 / FRAME_RATE

def main():
    multicam = MultiCameraCached(
        devices=["/dev/video2", "/dev/video0"], resolution=(640, 480)
    )
    camswitcher = AutoCameraSwitcher(multicam)

    # keep track of the last time we actually processed anything
    last_time = 0
    key_pressed = 0
    while not window_closed(WINDOW_TITLE) and key_pressed != ESC:
        if time.time() - last_time >= FRAME_DELAY:
            # if enough time has passed, process frame normally
            new_last_time = time.time()
            multicam.clear_cache()
            has_frame, cam_img = camswitcher.read()
            if not has_frame:
                continue
            # the point in which we update the time affects image sync/lag
            last_time = new_last_time
            cv2.imshow(WINDOW_TITLE, cam_img)
        else:
            # otherwise, just empty camera's buffer (see explanation below)
            multicam.flush()
        # ... code omitted ...

# ... code omitted ...
Repository code: main_07_framerate.py.
The basic idea here is to define a fixed frame rate at which we actually try to process frames from the cameras. This frame rate might be different from the camera’s, and in fact different cameras might be working at different rates. It is just a way of pacing ourselves when processing frames.
One important thing to notice is that, if we’re not actually processing a frame, we need to flush the cameras’ buffers, to avoid creating a lag in the image. This is done by our MultiCamera.flush()
method, which basically calls OpenCV’s VideoCapture.grab()
.
The grab() method is what read() uses internally to fetch a frame from the camera. The difference is that grab() doesn’t decode the frame data, so it’s much faster.
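In other words, read() is essentially grab() followed by retrieve(): grab() fetches the next frame from the device, and retrieve() decodes it into an image we can use. Skipping the decoding is where the saving comes from:

# conceptually, cam.read() boils down to something like:
if cam.grab():                        # fetch the next frame (cheap)
    has_frame, img = cam.retrieve()   # decode it into an image (the expensive part)
else:
    has_frame, img = False, None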
I found out, with a little bit of tinkering, that a frame rate of 120fps (frames per second) is a good value, even though the camera itself works at 30fps. With values lower than 120, there is some lag and/or stuttering. My guess is that the loop iterations are not necessarily in sync with the camera’s cycle, so there is a gap between when the camera captures a frame and when we read it.
The result is visually indistinguishable from the previous version:
Description: video demonstration of framerate performance.
But did we actually improve anything?
Short answer: it doesn’t seem so.
After finishing it, I did a bit of benchmarking comparing the last two versions, and the change doesn’t seem to have any significant impact on CPU usage. In fact, if anything, the average CPU usage seems to have gone up a little bit.
Well, at least it was fun to try.
Next steps
In this post, we implemented the feature of detecting the camera we’re facing, and displaying its video.
However, the original problem that sparked this project was camera switching during meetings. So for this to actually solve the problem, we would need a way to use our program output as a camera within Zoom, for example.
That is possible to do, and in fact I have done it based on some of the ideas in the post: Virtual Camera for OpenCV using V4L2Loopback. In that post the author uses mainly C++, but the fundamentals are applicable here.
The basic idea is to use the v4l2loopback
kernel module to create a virtual device and write our video data to it in real time. That virtual device, in turn, can be used as a regular camera in Zoom, while our software routes to it the video from whichever camera we’re facing. There are some complications with format conversion, compatibility, and performance, but it works.
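As a rough idea of what the Python side could look like (this is an assumption on my part, not taken from that post: I’m using the pyvirtualcam library here, reading from a single real camera for simplicity, and the exact details may vary), after loading the module with something like sudo modprobe v4l2loopback:

# virtualcam_sketch.py - hypothetical sketch using pyvirtualcam and v4l2loopback
import cv2
import pyvirtualcam

cam = cv2.VideoCapture("/dev/video2")
with pyvirtualcam.Camera(width=640, height=480, fps=30) as virtual:
    while True:
        has_frame, img = cam.read()
        if not has_frame:
            continue
        # make sure the frame matches the virtual device's declared size
        img = cv2.resize(img, (640, 480))
        # pyvirtualcam expects RGB frames, while OpenCV gives us BGR
        virtual.send(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        virtual.sleep_until_next_frame()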
Perhaps at some point I’ll write another post detailing that process. I would also like to try some performance improvements using multithreading with Python 3.13, or perhaps using other tools to add event-based processing of frames. If anyone would be interested in seeing something like that, let me know.