Software Gripes
It's no secret that software sucks.
There's lots of amazing software, but the joyous experiences are a rarity compared to the onslaught of mediocre-to-actively-harmful shit we deal with because of bad software.
The apps we use every day have features we like taken away from us.
Our productivity software makes us its bitch while we wait for a loading spinner on a network request that's fetching the same data we loaded 3 seconds ago on a different page because the app couldn't be arsed to cache things. Naturally, the request takes 5+ seconds because fuck you.
We get a loading spinner if we are lucky.
Often, shit is just happening with zero indication of what's going on. It's not rare for me to press enter and, 60 seconds later, wish I'd had the browser network tools open beforehand so I could inspect what the fuck is happening, because it's a 50/50 chance whether the shitass server isn't responding or a response came back and the fuckass website hit a scripting error handling it.
Games crashing while playing with friends and not being able to reconnect to the lobby despite the fact that reconnection logic is already implemented for ranked game modes.
Video games which have the option for secondary input mappings for some actions but not others. Not being able to bind mouse buttons to some actions.
Ready-up systems where you can't un-ready.
Hide-HUD modes where you are unable to perform some actions because apparently the functionality is tied to the visibility of the UI for some reason.
Driver issues where sharing your screen or high network traffic causes stutters.
I've adopted the habit of Ctrl+A, Ctrl+C to make a backup in my clipboard before submitting things, because who knows whether it's going to get blackholed instead of working properly.
GitHub hasn't had a green status month since this website started keeping track in 2022.
DaVinci Resolve still doesn't support .mkv files in the year of our Lord, 2026.
Thankfully, AI is here to save us from ourselves.
A tireless slop companion paired with the indomitable human spirit can bring us closer to a state where shit just works.
Naturally, the path to doing so is by rewriting everything in Rust.
You want to build an on-device voice transcription tool you can share with your mom? Point the slop cannon at the problem and you get a single binary using native APIs for drawing to the screen and doing ML inference, instead of sautéing your CPU cycles with Electron and PyTorch+CPU because even uv can't figure out how to consistently rehydrate a pyproject with CUDA support.
Python ML projects can be rewritten in Rust to sidestep entire categories of bullshit and I think that's beautiful.
Now, I find myself working on a terminal emulator/DAW/profiler/file explorer while the deadline for an anniversary gift for my parents looms.
In the next 10 days, my goal is to prepare some scrapbook pages. As a subgoal of this project, I need to collect and curate the photos of Mom&Dad that will be used.
I have a backup of our family NAS in a proprietary format which I was able to reverse engineer with AI. Now I have an ls/cd/copy REPL I can use to automate extracting small slices of the terabytes of backed-up content, whereas the vendor's app would limit me to extracting single files or entire directories with no in-between.
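To give a feel for the shape of that REPL, here's a minimal sketch. The actual backup format and its reverse-engineered decoder aren't shown; the nested dict `TREE` and its paths are made-up stand-ins for whatever index the parser produces.

```python
# Illustrative only: an ls/cd loop over a fake in-memory tree. `TREE`
# is a placeholder for the index the real backup parser would build.
TREE = {
    "photos": {"2019": {"bday.jpg": b"..."}, "2021": {}},
    "videos": {"trip.mp4": b"..."},
}

def resolve(cwd: list[str]):
    """Walk the tree from the root to the current directory."""
    node = TREE
    for part in cwd:
        node = node[part]
    return node

def run(cmd: str, cwd: list[str]) -> list[str]:
    """Apply one command and return the new working directory."""
    if cmd == "ls":
        print("  ".join(sorted(resolve(cwd))))
    elif cmd.startswith("cd "):
        target = cmd[3:]
        cwd = cwd[:-1] if target == ".." else cwd + [target]
        resolve(cwd)  # raises KeyError if the path is bogus
    return cwd

cwd: list[str] = []
cwd = run("cd photos", cwd)
run("ls", cwd)  # prints: 2019  2021
```

A real `copy` command would decompress on the way out, but the navigation skeleton is the part that makes slicing scriptable.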
Part of the way I build software now involves using voice2text to dictate what I want to happen, which is faster than typing by hand but sometimes comes at the cost of fidelity, since I have more precision over the output text with a keyboard than with my voice.
I want to rewrite my little Python voice2text app to use the new shit I've made with Rust recently, but I got distracted building some bullshit when really I could just have the AI update the Python app to add the sound cues I want while also adjusting the behaviour so it doesn't typewriter what I say into the wrong window.
One of the software-making skills I haven't properly grasped yet is the design phase. When it's possible to go from zero to working for any idea, project management 🤢 skills become important to ensure I'm prioritizing the things that will help me meet my goals in time.
To collect the media of Mom&Dad that I want to use, I need to instruct the computer on how to filter from everything down to {photos with Mom} ∩ {photos with Dad}.
The NAS has photos and videos, and there's a bunch of existing libraries/architectures for doing person recognition, but it's also kind of a search and scheduling problem.
Because the backup is in a compressed format, any media previews or inference passes have to come after a mildly costly decompress step. If I write the decompressed artifacts to disk so I can hand off work to other processes for the filtering, that's a higher-latency solution than keeping things in-memory and either doing the inference in the same process or using shared memory to reduce copying.
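The in-memory path is easy to sketch. Here zlib stands in for whatever codec the backup actually uses, and `looks_like_momdad` is a dumb placeholder for whichever recognizer ends up doing the real work; the point is just that decompressed bytes flow straight into inference without touching disk.

```python
import zlib

def looks_like_momdad(image_bytes: bytes) -> bool:
    # Placeholder heuristic; a real recognizer would go here.
    return b"momdad" in image_bytes

def scan(compressed_blobs):
    """Decompress each blob in memory and yield the hits."""
    for blob in compressed_blobs:
        image = zlib.decompress(blob)  # never written to disk
        if looks_like_momdad(image):
            yield image

blobs = [zlib.compress(b"beach momdad 2019"), zlib.compress(b"cat.jpg")]
hits = list(scan(blobs))
print(len(hits))  # prints: 1
```

Swapping the generator out for a shared-memory handoff would keep the same shape while letting a separate inference process do the filtering.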
There are many heuristics that can help us decide whether to keep scanning sequentially or jump around while scanning for Mom&Dad media. If we have identified a good candidate image from an event like a birthday party, then we can assign a DEI metric that prioritizes scanning other days, so we get a variety of photos instead of spending time validating 1000 images from one day.
We have in our calendar important days like anniversaries that we can use to prioritize search as well.
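Those two heuristics, a diversity penalty for days we already have keeps from and a boost for calendar-important days, can be combined into one toy scoring function. The weights here are invented purely for illustration.

```python
# Toy scan scheduler: pick which day to scan next. Penalize days that
# already produced keeps (variety) and boost calendar-important days.
# All weights are made up for the sake of the sketch.
def next_day(candidates, keeps_per_day, important_days):
    def score(day):
        s = 1.0
        s -= 0.5 * keeps_per_day.get(day, 0)  # diversity penalty
        if day in important_days:
            s += 2.0                           # anniversary/birthday boost
        return s
    return max(candidates, key=score)

day = next_day(
    ["2019-06-01", "2019-06-02", "2020-09-15"],
    keeps_per_day={"2019-06-01": 3},
    important_days={"2020-09-15"},
)
print(day)  # prints: 2020-09-15
```

Tuning those weights is exactly the kind of thing that's cheap to iterate on once the scan loop exists.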
Videos can also be analyzed, but they have a higher decompress cost since the files are larger. Since we own the whole pipeline, we could decompress sparse chunks, find I-frames, and run our image logic on those.
If we consider the most involved scenario of me looking individually at each file on the NAS until I pick 100 to keep, then that is a top-K problem and a reranking problem. I have my own mental model of the file hierarchy on the NAS, and my ability to navigate things is currently limited by the vendor's crappy software and the current state of my REPL.
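The top-K half of that is a standard min-heap pattern: keep the 100 best-scoring files seen so far, with the worst of the current picks always cheap to evict. The scores here are arbitrary floats from whatever ranker ends up existing; reranking would then reorder this shortlist.

```python
import heapq

def top_k(scored_files, k=100):
    """Keep the k best (score, path) pairs seen so far."""
    heap = []  # min-heap: the worst current pick sits at heap[0]
    for score, path in scored_files:
        if len(heap) < k:
            heapq.heappush(heap, (score, path))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, path))  # evict the worst
    return sorted(heap, reverse=True)  # best first

picks = top_k([(0.2, "a.jpg"), (0.9, "b.jpg"), (0.5, "c.jpg")], k=2)
print([p for _, p in picks])  # prints: ['b.jpg', 'c.jpg']
```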
I could build a file browser interface that lets me preview the images and videos, with the asynchronous wrinkle that a preview can only be rendered after the selected file has been decompressed.
I could add a tagging mechanism that lets me attach metadata to the virtual paths to indicate which photos I consider candidates for the final result. Rather than an arbitrary key-value tag system, a fixed tag vocabulary like keep, reviewed, discard could be simpler.
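The fixed-vocabulary version is tiny, which is the appeal. A sketch, with the `Tag` enum and the path names invented for illustration:

```python
from enum import Enum

# Closed tag vocabulary instead of free-form key/value pairs: typos
# become errors instead of silently forking the tag namespace.
class Tag(Enum):
    KEEP = "keep"
    REVIEWED = "reviewed"
    DISCARD = "discard"

tags: dict[str, set[Tag]] = {}

def tag(path: str, t: Tag) -> None:
    tags.setdefault(path, set()).add(t)

tag("photos/2019/bday.jpg", Tag.REVIEWED)
tag("photos/2019/bday.jpg", Tag.KEEP)
print(sorted(t.value for t in tags["photos/2019/bday.jpg"]))  # prints: ['keep', 'reviewed']
```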
Three windows.
One for file navigation: a primitive table view with a .. row for upwards navigation.
One for the media preview of the selected item: mouse-based zoom and pan mechanisms.
One for actions like mark-keep and mark-discard.
We can design it as an event sourcing system. Instead of a mapping from file path -> set of tags {keep, reviewed, discard}, we can emit events like viewed-at, keep-clicked-at, discard-clicked-at, decompressed-to-at, which would let us perform more detailed analysis: knowing which images we have looked at the most, knowing whether we already decompressed a file, or finding controversial images where we have alternated between keeping and discarding.
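Concretely, the tags fall out of a fold over the append-only log. A sketch using the event names above, with timestamps elided and "controversial" measured as the number of times the keep/discard decision flipped:

```python
# Event-sourcing sketch: the log is the source of truth; current state
# and analytics are both derived by folding over it.
events = [
    ("viewed-at", "bday.jpg"),
    ("keep-clicked-at", "bday.jpg"),
    ("discard-clicked-at", "bday.jpg"),
    ("keep-clicked-at", "bday.jpg"),
]

def current_decision(log, path):
    """Last keep/discard click wins."""
    decision = None
    for kind, p in log:
        if p == path and kind in ("keep-clicked-at", "discard-clicked-at"):
            decision = kind
    return decision

def flip_flops(log, path):
    """How many times we changed our mind about a file."""
    clicks = [k for k, p in log if p == path and k != "viewed-at"]
    return sum(a != b for a, b in zip(clicks, clicks[1:]))

print(current_decision(events, "bday.jpg"))  # prints: keep-clicked-at
print(flip_flops(events, "bday.jpg"))        # prints: 2
```

The nice property is that new questions (most-viewed, already-decompressed) don't require schema changes, just new folds over the same log.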
Event sourcing is something I haven't explored much but am increasingly eager to try, hence why I've been spending time building some kind of timeline-DAW instead of working more on file browsers.
It seems to me that time is the one data type that unifies all data sources, so having robust mechanisms for interacting with time-aspected data is super important, but developing visual timeline editors will have to take a back seat while I do this other stuff, I suppose.
We learn the general mannerisms of the user interfaces we are exposed to. Historical tools for presenting information, like the DOM, come with implications that make things like multi-window applications a less trodden path.
So if I have a 3-window app where there's the file explorer, the preview image, and the action buttons, what if each button was its own window, letting the buttons be individually repositionable?
We can do snazzy things like draw one big window across all three monitors, controlling the hit-testing and transparency so that we have one OS window while we virtualize our own window management system.
Decomposing the problem into the core elements lets us imagine how it can be reshaped into completely new interfaces.
Given that agents are the new hotness, we could drop the file explorer window and instead have an agent be in control of which file is currently being previewed. Instead of buttons I click to present my feedback, I could hook the voice transcription up so that we have a feedback loop of the agent presenting a file to me, me making a comment on it, and the agent can manage the list of top-100 results.
One window for the preview; do we want a window to audit the voice transcription for any mistakes I should clarify? Do we want the agent to respond in text, or use a text2speech program?
I have a GLaDOS speech synthesis module that is pretty low latency which could be fun. We could play back the synthesized voice at an accelerated rate to speed up the feedback loop.
There are so many building blocks and opportunities to do fun stuff, I am very excited by all of this.
Enunciating the problem and my proposed solutions is half the battle; then it's a matter of guiding the AI every couple of minutes while I watch YouTube as it does the work.
Having full control of the OS with DirectX support means we can go crazy with the ideas using shaders and shit too which is cool.
Of the hard problems in computer science, my next step falls under “naming things”. I think it's time I started a new repo to build this out.
gh repo create TeamDman/Annicuration is probably a good way to start.
...Right after going to bed, waking up, wage slaving, and taking a nap/doomscrolling to reset a bit of the dread of it all. You know, the /ˈjuː.ʒ/.