This is the kind of AI stuff that really annoys me. Looking at one of the mutation examples, I didn't see anything that wouldn't normally be tested by a typical mutation tool. You took a simple, idempotent process and you got an LLM to do it slower, less accurately, and using more resources.
If you wanted to marry the two in a new and possibly useful fashion, I would say use an LLM to analyze the results of a standard mutation test and give guidance on what issues should be acted upon first. An off-by-one calculation could mean somebody loses a million dollars or it could mean a button is grayed out. Standard mutation tools don't give you that context.
I understand your concerns. The examples we provided are indeed trivial, but they are just the starting point. Our goal is to leverage LLMs to generate mutants that closely resemble real-world bugs with better context. While traditional mutation tools are excellent, we believe LLMs can bring an additional layer of sophistication and versatility.
As you rightly pointed out, standard mutation tools often lack the context to prioritize issues effectively. We’re currently working on using LLMs to analyze the output of survived mutants to provide better guidance on which issues should be addressed first. This way, an off-by-one error that could potentially cause significant problems is highlighted more prominently during the code review PR process.
As someone who has used mutation testing, I’ve always wondered about the sheer amount of useless mutants being generated. Going through all these mutants manually to improve test cases is quite cumbersome. If we can reduce the number of mutants generated, produce higher quality mutants, and analyze them automatically to highlight weaknesses in the tests during PRs, wouldn’t that be cool? We’re aiming to achieve just that.
Moreover, this approach can theoretically work for any programming language or testing framework, making it a versatile solution across different development environments.
We’re also developing a QA system to more accurately define and identify “higher quality mutants,” as discussed in the research paper here. Our aim is to enhance the overall mutation testing process, making it more efficient and insightful.
Hey, all in all, we want mutation testing to be adopted and widely spread. We really do appreciate the feedback. I hope you try it out as you sound like you know a thing or two about mutation testing.
You're four forks deep now
Slic3r to PrusaSlicer to Bambu Studio to Orca. It also borrowed a lot of ideas from SuperSlicer. Since it's open source and has been gaining some momentum, it seems to have a decent number of contributors.
Why Orca?
all the features you know and love from things up the tree
a revamped UI
built-in tuning tests (temp tower, extrusion multiplier, volumetric flow, pressure advance, etc.)
The UI of Prusa slicer is hot garbage though. I started with prusa slicer and moved to orca after a few months. Orca is a much nicer experience, and the built-in test-models (temp towers etc.) are nice.
I find the location and grouping of parameters more intuitive in Orca. I always had to look through several tabs to find the parameter I wanted to adjust when I was using Prusa; it was never where I thought it should be.
So LLM-based mutation testing can create mutations that traditional mutation tools don't, namely mutations that depend more on a contextual understanding of the code, which LLMs excel at. We do preprocessing on our side: we generate a minimal AST of all covered files and pass it to the LLM to give it a rich contextual understanding of the codebase, allowing us to generate good mutations. We also use LiteLLM, so it works with open source models too.
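To give a rough idea of what a "minimal AST" could look like, here is a small sketch using Python's standard `ast` module. It is not the tool's actual preprocessing; the function name `minimal_outline` and the sample source are made up for illustration. It reduces a file to just its class and function signatures, which is the sort of compact context one might hand to an LLM.

```python
import ast


def minimal_outline(source: str) -> list[str]:
    """Return one line per class/function, indented by nesting depth.

    Hypothetical helper: a stand-in for whatever "minimal AST" the real
    tool builds, not a copy of it.
    """
    outline: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in child.args.args)
                outline.append(f"{'  ' * depth}def {child.name}({args})")
                visit(child, depth + 1)
            elif isinstance(child, ast.ClassDef):
                outline.append(f"{'  ' * depth}class {child.name}")
                visit(child, depth + 1)
            else:
                # Keep walking so definitions nested in if/try blocks are found.
                visit(child, depth)

    visit(ast.parse(source), 0)
    return outline


# Made-up sample file for illustration.
SAMPLE = '''
class Account:
    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount

def helper(x):
    return x + 1
'''

for line in minimal_outline(SAMPLE):
    print(line)
```

The outline drops function bodies entirely, so it stays small even for large files; a real pipeline would presumably keep more detail for the code being mutated.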
Safari support means there's benefit to web server support. Server support means there's benefit to browser support in other browsers. Apple can kick start the network effects necessary to get this standard adopted.
WebP and HEIC are fine for web, but JPEG XL is special in that it is actually useful for print-based and other ultra-high-resolution workflows, while also having the best path forward for migration from JPEG.
Ooh, well that's wonderful. It's like some grassroots thing. The inventors of the thing refuse to support it, but the people are adopting it on their own. ✊ I'm happy to hear these "news" (to me)! ❤️
Regarding mutation testing, you don't write any "tests for your test". Rather, a mutation testing tool automatically modifies ("mutates") your production code to see if the modification will be caught by any of your tests.
That way you can see how well your tests are written and how well-tested parts of your application are in general. It's extremely useful.
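A toy sketch of that idea in Python, in case it helps. Everything here (`can_withdraw`, the two tests, the single `<=` to `<` mutation) is invented for illustration; real mutation tools apply many operators and run your actual test suite.

```python
import ast

# Made-up production code to be mutated.
SOURCE = """
def can_withdraw(balance, amount):
    return amount <= balance
"""


class LtEToLt(ast.NodeTransformer):
    """One mutation operator: replace every `<=` with `<`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Lt() if isinstance(op, ast.LtE) else op for op in node.ops]
        return node


def load(tree):
    """Compile a module AST and return the resulting namespace."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace


def weak_test(ns):
    # Only checks a value far away from the boundary.
    return ns["can_withdraw"](100, 50) is True


def strong_test(ns):
    # Also checks the boundary case amount == balance.
    return weak_test(ns) and ns["can_withdraw"](100, 100) is True


original = load(ast.parse(SOURCE))
mutant = load(ast.fix_missing_locations(LtEToLt().visit(ast.parse(SOURCE))))


def mutant_killed(test):
    """A mutant is killed if the test passes on the original but fails on the mutant."""
    return test(original) and not test(mutant)


print("weak test kills mutant:", mutant_killed(weak_test))
print("strong test kills mutant:", mutant_killed(strong_test))
```

The weak test lets the mutant survive because it never exercises the boundary, which is exactly the gap a mutation report points you to.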
You have written tests for your code and now feel safe because your code is tested. But test quality is really hard to measure. The idea seems to be to introduce "vulnerabilities" (whatever that means...) and see if your tests catch them. If they do that's supposed to show that the tests are good and vice versa.
What issues have you had? I've been using Orca for about a year without any issues at all. I'm running Mint, and both the stable and beta branches have been without issues for me.
AppImage doesn't start because it relies on a system package that doesn't exist anymore, dialogs with grey text on grey backgrounds in dark mode, STL repair not included...
Flatpak is in the works, and I hope that helps, but honestly I get better prints out of PrusaSlicer for some reason, so I'm not holding my breath or anything.
The image just isn't being built correctly, which is more a problem with AppImages, but the fact it's still broken... Linux is clearly a neglected platform for them.
All the problems I listed have bug reports; just nothing's happening to fix them.
Libwebkit isn't actually Chromium; Chromium uses Blink, which is a fork of part of WebKit. Understandable confusion, though, because WebKit grew out of KDE's KHTML, was forked by Apple for Safari, and was then used by Chrome variants for a long time.
The rest of this comment is going to necessarily be nerdy Linux internals. Sorry.
Unfortunately, I'm pretty sure Chromium includes it inside its binary and doesn't provide or use any WebKit libraries.
Orca uses it internally for its browser, so it won't start unless it has access to the library. When you build a Linux app, it includes the name of the library, which includes the ABI (basically the version). Newer Linux releases include a different version.
AppImage is one of the ways you get around this distro problem, by including the versions of libraries. That's why they're so big. There are problems with that, like how big the apps are and stale bundled libraries with security issues, but I digress.
Orca hasn't bundled WebKit in the AppImage, and because of another problem/feature of AppImage it falls back on the OS library. Since new distros have dropped the older obsolete library version, Orca can't start.
That's a lot but I hope it explains the problem better.
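If a concrete picture helps, here is a simplified Python model of that lookup. The library names (`libexamplekit-...`) are hypothetical placeholders, not the real package names, and the real dynamic linker does a lot more than this; the sketch only shows why an exact name that embeds the ABI version stops resolving once a distro ships a different version.

```python
import re


def split_soname(soname: str) -> tuple[str, str]:
    """Split a name like 'libfoo-1.0.so.3' into ('foo-1.0', '3')."""
    match = re.fullmatch(r"lib(?P<name>.+?)\.so\.(?P<abi>\d+)", soname)
    if match is None:
        raise ValueError(f"not a versioned shared library name: {soname!r}")
    return match.group("name"), match.group("abi")


def missing_libraries(needed: list[str], available: set[str]) -> list[str]:
    """Return the needed names that the system cannot supply.

    Simplifying assumption: a library resolves only on an exact name match.
    """
    return [name for name in needed if name not in available]


# Hypothetical names: what the app was built against...
needed = ["libexamplekit-4.0.so.37"]

# ...versus what an older and a newer distro provide.
old_distro = {"libexamplekit-4.0.so.37"}
new_distro = {"libexamplekit-4.1.so.0"}

print(split_soname(needed[0]))
print("old distro missing:", missing_libraries(needed, old_distro))
print("new distro missing:", missing_libraries(needed, new_distro))
```

Bundling the library inside the AppImage would amount to adding the needed name to the available set regardless of the distro, at the cost of a bigger download.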
I would like to help, but my personal computer doesn't currently have enough memory to compile Orca, so it's back to just watching and warning people that it's a coming problem for them too.