It hardens a skill through judge-panel refinement rounds; it’s a quality gate that runs after authoring, not an authoring tool.
MisterBiggs 5 hours ago [-]
This is pretty neat. I suspect that eventually every skill will have some sort of validation/verification loop like this.
bob1029 6 hours ago [-]
I've been able to avoid this kind of markdown library architecture with very chatty tool feedback. Interaction with a responsive environment is much better than static chunks of "skill" text. For example, imagine a domain constraint:
"You must use tool ABC before calling tool XYZ"
This can either be in some static prompt scheme somewhere, or it can be the live result of a tool call.
If you make everything tool calling and environmental, you effectively have a lazily evaluated & dynamic prompt scheme.
I like to think of this as context for the context. The better you map the environment and descriptions of it to the agent, the less top-down prompting is required.
If you set up the harness correctly, you can run circles around a lot of what passes as AI innovation with powershell in a while loop. Adding static markdown document soup on top of this would only reduce performance in the general case.
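A minimal sketch of the "constraint as live tool result" idea above, in Python. The `Environment` class and the `tool_abc`/`tool_xyz` methods are hypothetical names invented for illustration, not anything from an actual harness; the point is only that the ordering rule is enforced and explained by the tool response rather than stated up front in a prompt.

```python
class Environment:
    """Hypothetical harness: the rule lives in the tool, not in static prompt text."""

    def __init__(self) -> None:
        self.abc_done = False

    def tool_abc(self) -> str:
        # Record that the prerequisite ran and tell the agent what is now allowed.
        self.abc_done = True
        return "ABC ok. You may now call XYZ."

    def tool_xyz(self) -> str:
        # The constraint only shows up when the agent actually trips over it.
        if not self.abc_done:
            return "Error: you must use tool ABC before calling tool XYZ."
        return "XYZ ok."


env = Environment()
first_attempt = env.tool_xyz()   # called too early, gets corrective feedback
env.tool_abc()
second_attempt = env.tool_xyz()  # now succeeds
```

The agent only pays tokens for the rule when it is relevant, which is one reading of "lazily evaluated" here.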
_boffin_ 2 hours ago [-]
Can you go into more detail about your setup and use cases?
MisterBiggs 5 hours ago [-]
Yup! I feel pretty strongly that every little nitpick and instruction you pass into your model is murdering your output. Having a hook that executes on tool calls is significantly better than telling your agent to follow your repo's specific format/lint/style/test constraints.
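A rough sketch of what a hook on tool calls could look like, assuming a harness that lets you wrap a tool in plain Python. `call_tool_with_hook`, `write_file_tool`, and `max_line_length_hook` are made-up names for illustration, and the line-length check merely stands in for a real formatter or linter.

```python
from typing import Callable, Optional


def max_line_length_hook(text: str, limit: int = 40) -> Optional[str]:
    """Stand-in for a repo lint rule: return feedback, or None if the text passes."""
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > limit:
            return f"line {number} exceeds {limit} characters; please wrap it"
    return None


def write_file_tool(content: str) -> str:
    """Hypothetical tool; a real one would write to disk."""
    return f"wrote {len(content)} characters"


def call_tool_with_hook(
    tool: Callable[[str], str],
    hook: Callable[[str], Optional[str]],
    payload: str,
) -> str:
    """Run the tool, then append hook feedback to the result the agent sees."""
    result = tool(payload)
    feedback = hook(payload)
    if feedback is None:
        return result
    return f"{result}\n[hook] {feedback}"
```

The style rule never appears in the prompt; the agent only hears about it when a tool call violates it.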
basedrum 18 minutes ago [-]
Could you publish your gitlab skill to give an example?
nilirl 3 hours ago [-]
I read this post thinking "Finally! Finally someone will explain to me what I've been missing because 'skills' just seem to be re-usable text that help make prompting faster."
Nope. Still the same.
morelandjs 58 minutes ago [-]
Agree on article frustrations. Perhaps a better explanation: skills are just disk-cached prompts conditioned on verified success. The "conditioned on verified success" part might seem inconsequential, but it’s the whole thing that gives skills their value. It also matters that their loading can be scoped to a certain calling context.
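One way to read "disk-cached prompts conditioned on verified success" as code, as a sketch only. The JSON file layout, the `scope` field, and both function names are assumptions made up for this example; real skill formats differ.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def save_skill_if_verified(
    cache_dir: Path,
    name: str,
    prompt: str,
    scope: str,
    verify: Callable[[str], bool],
) -> bool:
    """Write the prompt to disk only if the (hypothetical) verifier accepts it."""
    if not verify(prompt):
        return False
    record = {"prompt": prompt, "scope": scope}
    (cache_dir / f"{name}.json").write_text(json.dumps(record))
    return True


def load_skill(cache_dir: Path, name: str, calling_context: str) -> Optional[str]:
    """Return the cached prompt only when the calling context matches its scope."""
    path = cache_dir / f"{name}.json"
    if not path.exists():
        return None
    record = json.loads(path.read_text())
    return record["prompt"] if record["scope"] == calling_context else None
```

The `verify` gate is what separates this from a plain snippet folder: nothing lands in the cache unless it has already been shown to work.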
nilirl 44 minutes ago [-]
> conditioned on verified success
Thank you! That made it clear to me why it's a useful caching technique.
noodletheworld 2 hours ago [-]
Agree; posts like this frustrate me.
Tldr: you're doing it wrong but I will not show you how to do it right. I also did not run the bench using my approach but it definitely “vibes better” to me, and I reject your actual research paper.
Come on, show us some actual skills.
That one you use all the time looks a hell of a lot like “I want a deterministic shell script for something, so here's a skill saying ‘run the shell script’”.
Is that what you do? How much time do you spend on them? How do you stop the agent from making a bunch of very similar skills? How do you deal with the explosion of the total number of skills impacting your token use? Do you use skills from github, or is that bad practice? Why?
So many unanswered questions; so little content. :/
oniony 5 hours ago [-]
You're probably using adverbs wrongly.
whattheheckheck 8 hours ago [-]
What if I want a way to open up a latent space prompt without having to type it all out every time?
MisterBiggs 5 hours ago [-]
Skills for repetition are totally valid. Having a version control skill that explains that I use Gitea works great. My point is that asking for a skill that tells us if our program will get stuck before taking on a halting problem won't get you any further than just starting the task with xhigh thinking.
theowaway213456 10 hours ago [-]
TL;DR don't have your agent write skills using only its latent knowledge, otherwise you may as well not use a skill in the first place and let it summon that latent knowledge on the fly.
Not sure if this take is correct though. I suspect self-generated skills help the agent avoid having to "decompress" its latent knowledge, which might save tokens? idk, I am not an expert
solarkraft 6 hours ago [-]
It seems so obvious: How would it know better than it already does?
Yet I’ve seen people succeed with “write me a prompt” prompts. The model makes something up, and often it makes sense.
They are like plans in that way: It’s not exactly novel knowledge, but it at least encodes it somewhere to make the process verifiable beforehand and a bit more repeatable.
I wouldn’t be surprised if it improves performance a little, just like thinking blocks do (every model reasons now).
bigcat12345678 10 hours ago [-]
I now have rules against letting the agent write any docs or processes. Pretty much anything LLM auto-generated has zero reuse value.
imhoguy 2 hours ago [-]
Autogenerated content is good scaffolding, but then I have a rule where if I mark a heading with "(by-human)", the section shouldn't be changed by the LLM without permission.
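A rule like this could also be checked mechanically rather than trusted to the model. A small sketch, assuming Markdown-style headings that start with `#`; the function names are invented for illustration.

```python
def human_sections(markdown: str) -> dict[str, str]:
    """Map each heading containing "(by-human)" to the body text beneath it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("#"):
            # A new heading ends the previous section; track it only if protected.
            current = line.strip() if "(by-human)" in line else None
            if current is not None:
                sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {heading: "\n".join(body) for heading, body in sections.items()}


def changed_human_sections(before: str, after: str) -> list[str]:
    """List protected headings whose body was edited or removed."""
    old, new = human_sections(before), human_sections(after)
    return [heading for heading, body in old.items() if new.get(heading) != body]
```

Run on the file before and after an agent edit, a non-empty result means a protected section was touched and the change should be rejected or sent back for permission.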
cassianoleal 4 hours ago [-]
Skills can transfer one session's latent knowledge to all other sessions.
E.g., ask the agent to write a skill, then get it to prompt a subagent to use the skill, then iterate until it verifies the task was completed correctly.
https://github.com/bjcoombs/ai-native-toolkit/blob/main/skil...
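The write, test with a subagent, iterate loop described above might be sketched like this. It is not the linked toolkit's implementation; `run_subagent`, `verify`, and `revise` are placeholder callables you would back with real agent calls.

```python
from typing import Callable, Optional, Tuple


def refine_skill(
    draft: str,
    run_subagent: Callable[[str], str],
    verify: Callable[[str], bool],
    revise: Callable[[str, str], str],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Return (skill, rounds_used), or (None, max_rounds) if never verified."""
    skill = draft
    for round_number in range(1, max_rounds + 1):
        output = run_subagent(skill)    # fresh session that sees only the skill text
        if verify(output):              # did the subagent complete the task correctly?
            return skill, round_number
        skill = revise(skill, output)   # feed the failure back into the skill
    return None, max_rounds
```

The cap on rounds matters: without it, a skill that can never pass verification would loop forever.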