
ace-step

Skill from agentspace-so/runcomfy-agent-skills

What it does

Generates, inpaints, and outpaints music using StepFun-AI's open-weights ACE Step model via the RunComfy CLI, with tag-driven composition, multilingual lyrics, and four endpoints priced at $0.0002-$0.0003 per second of audio.
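As a sketch of what the two text inputs for tag-driven composition might look like, the Python below joins style tags into one comma-separated string and renders lyrics with bracketed section markers. The helper names (`build_tags`, `build_lyrics`), the comma-separated tag format, and the `[verse]`/`[chorus]` marker syntax are assumptions for illustration, not the skill's documented input format; check the skill's SKILL.md for the actual fields.

```python
from __future__ import annotations

# Hypothetical helpers: the tag and lyric formats below are assumptions,
# not a documented RunComfy or ACE Step input schema.


def build_tags(*tags: str) -> str:
    """Join genre, mood, instrument, and style tags into one comma-separated string."""
    seen: list[str] = []
    for tag in tags:
        cleaned = tag.strip()
        # Skip blanks and exact duplicates while preserving the caller's order.
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ", ".join(seen)


def build_lyrics(sections: list[tuple[str, str]]) -> str:
    """Render (section, text) pairs as bracketed section markers followed by lines."""
    blocks = [f"[{name.strip().lower()}]\n{text.strip()}" for name, text in sections]
    # A blank line separates sections so the structure stays readable.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_tags("synth-pop", "upbeat", "female vocals", "120 BPM", "upbeat"))
    # prints: synth-pop, upbeat, female vocals, 120 BPM
    print(build_lyrics([("Verse", "Neon on the water"), ("chorus", "We run all night")]))
```

Because lyrics are plain Unicode strings, the same helpers work for non-English lyrics; how well a given language is sung depends on the model (ACE Step 1.5 is the version listed with 50+ language support).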

Overview

The ace-step skill lets Claude Code generate, inpaint, and outpaint music with StepFun-AI's open-weights ACE Step model via the RunComfy CLI. It exposes four endpoints: ACE Step text-to-audio and ACE Step 1.5 text-to-audio, which generate music from tags and lyrics; ACE Step audio-inpaint, which regenerates a time range inside an existing track; and ACE Step audio-outpaint, which extends a track before its start or after its end. Output is stereo audio of up to 4 minutes per call, priced at $0.0002-$0.0003 per second (approximately 27x cheaper than ElevenLabs Music).
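The pricing is simple per-second arithmetic. The Python sketch below estimates the cost of one call from the rates in this listing, assuming the base model bills $0.0002/s and ACE Step 1.5 bills $0.0003/s, and that the 4-minute cap applies per call. The model labels and the `estimate_cost` function are illustrative names, not part of the RunComfy CLI, and actual billing rules (rounding, minimum durations) may differ.

```python
# Rates quoted in this listing (USD per second of generated audio).
# The keys are illustrative labels, not RunComfy endpoint IDs.
RATE_PER_SECOND = {
    "ace-step": 0.0002,      # base text-to-audio model
    "ace-step-1.5": 0.0003,  # ACE Step 1.5
}
MAX_SECONDS_PER_CALL = 4 * 60  # "up to 4 minutes per call"


def estimate_cost(model: str, seconds: float) -> float:
    """Return the estimated USD cost of one call, assuming linear per-second billing."""
    if model not in RATE_PER_SECOND:
        raise KeyError(f"unknown model label: {model!r}")
    if not 0 < seconds <= MAX_SECONDS_PER_CALL:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS_PER_CALL}] seconds")
    return RATE_PER_SECOND[model] * seconds


if __name__ == "__main__":
    for label in RATE_PER_SECOND:
        # A full-length 4-minute track on each model.
        print(f"{label}: ${estimate_cost(label, MAX_SECONDS_PER_CALL):.3f}")
    # prints:
    # ace-step: $0.048
    # ace-step-1.5: $0.072
```

At these rates a full 4-minute track costs about $0.048 on the base model and $0.072 on ACE Step 1.5.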

Key Features

Tag-driven composition - Generate music from genre, mood, instrument, and style tags combined with structured lyrics that use section markers, producing coherent tracks from text input.
Audio inpainting - Regenerate a specific time range inside an existing track with ACE Step audio-inpaint, fixing a bad chorus or replacing a section without re-generating the entire song.
Audio outpainting - Extend an existing track before its start or after its end with ACE Step audio-outpaint, lengthening a 30-second draft into a 2-minute cut or adding an intro or outro.
50+ language vocals with ACE Step 1.5 - The 1.5 version supports vocals in 50+ languages with refined structured-lyric handling, at a slightly higher cost ($0.0003/s vs $0.0002/s for the base model).
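Both edit endpoints operate on time ranges of an existing track. The Python sketch below shows the bookkeeping a caller might do before invoking them: validating an inpaint window and splitting an outpaint extension into an optional intro plus an extended ending. The function and field names are hypothetical, and the check that the extended track stays within the 4-minute per-call limit is an assumption drawn from the general output cap, not a documented rule for audio-outpaint.

```python
from dataclasses import dataclass

# Assumption: the 4-minute output cap quoted for ACE Step also bounds an outpainted track.
MAX_SECONDS_PER_CALL = 4 * 60


@dataclass(frozen=True)
class InpaintWindow:
    """Hypothetical time range (seconds) to regenerate inside a track."""
    start_s: float
    end_s: float


@dataclass(frozen=True)
class OutpaintPlan:
    """Hypothetical split of new audio added before and after a track."""
    extend_before_s: float
    extend_after_s: float
    total_s: float


def inpaint_window(track_len_s: float, start_s: float, end_s: float) -> InpaintWindow:
    """Check that start_s..end_s is a non-empty range inside the track."""
    if not 0 <= start_s < end_s <= track_len_s:
        raise ValueError(
            f"need 0 <= start < end <= {track_len_s}s, got {start_s}-{end_s}s"
        )
    return InpaintWindow(start_s, end_s)


def outpaint_plan(track_len_s: float, target_len_s: float, intro_s: float = 0.0) -> OutpaintPlan:
    """Split the extra duration into an optional intro; the remainder extends the ending."""
    extra = target_len_s - track_len_s
    if extra <= 0:
        raise ValueError("target length must exceed the current track length")
    if target_len_s > MAX_SECONDS_PER_CALL:
        raise ValueError(f"target exceeds the assumed {MAX_SECONDS_PER_CALL}s cap")
    if not 0 <= intro_s <= extra:
        raise ValueError("intro must fit inside the extra duration")
    return OutpaintPlan(intro_s, extra - intro_s, target_len_s)


if __name__ == "__main__":
    # Replace a weak chorus between 0:45 and 1:15 of a 3-minute track.
    print(inpaint_window(180, 45, 75))
    # Grow a 30-second draft into a 2-minute cut with a 10-second intro.
    print(outpaint_plan(30, 120, intro_s=10))
```

Keeping the intro and the ending as separate durations matches the two outpaint use cases described here: lengthening a draft and adding an intro or outro.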

Who is this for?

Independent musicians and producers who need affordable AI music generation with edit capabilities (inpaint/outpaint) for iterating on compositions
Game developers and content creators building music libraries at scale who need low-cost generation ($0.0002/s) with tag-driven style control
Developers building music generation pipelines who need CLI-accessible text-to-audio, audio inpainting, and audio outpainting in a single tool

Installation

Vibe Index install (installs to .claude/skills/)

npx vibeindex add agentspace-so/runcomfy-agent-skills --skill ace-step

skills.sh install (⚠ installs to .agents/skills/)

npx skills add agentspace-so/runcomfy-agent-skills --skill ace-step

Manual install: copy the SKILL.md content and save it to the path below

~/.claude/skills/ace-step/SKILL.md

284,736 installs

Added May 15, 2026

More from this repository (10)

nano-banana-2 (Skill)

A RunComfy skill that generates images using Google Nano Banana 2, the flash-tier text-to-image model in the Gemini family. Optimized for rapid iteration, social thumbnails, and in-image typography with configurable resolution tiers and safety tolerance.

image-edit (Skill)

A smart intent-routing skill for image editing on RunComfy that selects the best model based on the editing task. Routes to Nano Banana Edit for batch edits up to 20 images, GPT Image 2 for multilingual text rewrite, Flux Kontext Pro for single-shot precise edits, or Z-Image Turbo for mask-driven inpainting.

kling-3-0 (Skill)

Provides Kling 3.0 video generation on RunComfy, covering all six endpoints across three quality tiers (Standard, Pro, 4K) and two modes (text-to-video, image-to-video) for Kuaishou's third-generation cinematic video model with native synchronized audio.

nano-banana-edit (Skill)

Edit images with Google Nano Banana 2 on RunComfy, supporting batch edits of up to 20 images per call with strong identity preservation. Features localized edits using spatial language, background swaps, and configurable resolution up to 4K.

wan-2-7 (Skill)

Generate text-to-video with Wan-AI's Wan 2.7 on RunComfy, featuring multi-reference conditioning and audio-driven lip-sync via custom audio tracks. Supports prompt expansion, negative prompts, and up to 1080p resolution through the RunComfy CLI.

gpt-image-edit (Skill)

Edit images with OpenAI GPT Image 2 on RunComfy, excelling at multilingual in-image text editing across any script (Latin, kana, CJK, Cyrillic, Arabic) and multi-reference composition with up to 10 input images. Ideal for identity-preserving edits and layout-precise repositioning.

happyhorse-1-0 (Skill)

Generate text-to-video with HappyHorse 1.0 on RunComfy, currently ranked #1 on Artificial Analysis Video Arena. Supports native 1080p with in-pass synchronized audio, multi-shot character consistency, and 6-language prompt support via the RunComfy CLI.

seedance-v2 (Skill)

Generate cinematic short-form video with ByteDance Seedance 2.0 Pro on RunComfy, supporting multi-modal references including up to 9 images, 3 videos, and 3 audio tracks. Features native lip-synced audio generation and is ideal for brand-consistent multi-language narratives.

flux-2-klein (Skill)

A RunComfy skill for generating images with Black Forest Labs' Flux 2 Klein, the distilled low-latency variant of Flux 2. Supports 9B and 4B model variants with sub-second inference for real-time art direction, rapid concepting, and multi-reference brand styling.

runcomfy-cli (Skill)

The foundation skill for the RunComfy platform, providing a single CLI to install, authenticate, and invoke hundreds of model endpoints including image generation, video, face-swap, lip-sync, and LoRA training.