
lwsinclair/codebadger-toolkit

 
 


🕷️ joern-mcp

A Model Context Protocol (MCP) server that provides AI assistants with static code analysis capabilities using Joern's Code Property Graph (CPG) technology.

Features

  • Multi-Language Support: Java, C/C++, JavaScript, Python, Go, Kotlin, C#, Ghidra, Jimple, PHP, Ruby, Swift
  • Docker Isolation: Each analysis session runs in a secure container
  • GitHub Integration: Analyze repositories directly from GitHub URLs
  • Session-Based: Persistent CPG sessions with automatic cleanup
  • Redis-Backed: Fast caching and session management
  • Async Queries: Non-blocking CPG generation and query execution

Quick Start

Prerequisites

  • Python 3.8+
  • Docker
  • Redis
  • Git

Installation

  1. Clone and install dependencies:
git clone https://github.com/Lekssays/joern-mcp.git
cd joern-mcp
pip install -r requirements.txt
  2. Setup (builds Joern image and starts Redis):
./setup.sh
  3. Configure (optional):
cp config.example.yaml config.yaml
# Edit config.yaml as needed
  4. Run the server:
python main.py
# Server will be available at http://localhost:4242

Integration with GitHub Copilot

The server uses Streamable HTTP transport for network accessibility and supports multiple concurrent clients.

Add to your VS Code settings.json:

{
  "github.copilot.advanced": {
    "mcp": {
      "servers": {
        "joern-mcp": {
          "url": "http://localhost:4242/mcp"
        }
      }
    }
  }
}

Make sure the server is running before using it with Copilot:

python main.py

Available Tools

Core Tools

  • create_cpg_session: Initialize analysis session from local path or GitHub URL
  • run_cpgql_query: Execute synchronous CPGQL queries with JSON output
  • run_cpgql_query_async: Execute asynchronous queries with status tracking
  • get_query_status: Check status of asynchronously running queries
  • get_query_result: Retrieve results from completed queries
  • cleanup_queries: Clean up old completed query results
  • get_session_status: Check session state and metadata
  • list_sessions: View active sessions with filtering
  • close_session: Clean up session resources
  • cleanup_all_sessions: Clean up multiple sessions and containers
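
The three async tools above are meant to be used together: start a query, poll its status, then fetch the result. A minimal sketch of that loop follows. The `call_tool` callable is a hypothetical stand-in for whatever MCP client you use, and the `query_id` field and the `completed` / `failed` status values are assumptions, not confirmed response shapes.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: takes a tool name and arguments, returns the tool's result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_query_and_wait(
    call_tool: ToolCaller,
    session_id: str,
    query: str,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start an async CPGQL query and poll until it finishes or times out."""
    started = call_tool(
        "run_cpgql_query_async", {"session_id": session_id, "query": query}
    )
    query_id = started["query_id"]  # assumed field name
    deadline = time.monotonic() + timeout

    while True:
        status = call_tool("get_query_status", {"query_id": query_id})["status"]
        if status == "completed":  # assumed status value
            return call_tool("get_query_result", {"query_id": query_id})
        if status == "failed":  # assumed status value
            raise RuntimeError(f"query {query_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_id} still '{status}' after {timeout}s")
        sleep(poll_interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.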

Code Browsing Tools

  • get_codebase_summary: Get high-level overview of codebase (file count, method count, language)
  • list_files: List all source files with optional regex filtering
  • list_methods: Discover all methods/functions with filtering by name, file, or external status
  • get_method_source: Retrieve actual source code for specific methods
  • list_calls: Find function call relationships and dependencies
  • get_call_graph: Build call graphs (outgoing callees or incoming callers) with configurable depth
  • list_parameters: Get detailed parameter information for methods
  • find_literals: Search for hardcoded values (strings, numbers, API keys, etc)
  • get_code_snippet: Retrieve code snippets from files with line range

Security Analysis Tools

  • find_taint_sources: Locate likely external input points (taint sources)
  • find_taint_sinks: Locate dangerous sinks where tainted data could cause vulnerabilities
  • find_taint_flows: Find dataflow paths from sources to sinks using Joern dataflow primitives
  • find_argument_flows: Find flows where the exact same expression is passed to both source and sink calls
  • check_method_reachability: Check if one method can reach another through the call graph
  • list_taint_paths: List detailed taint flow paths from sources to sinks
  • get_program_slice: Build a program slice from a specific line or call

Example Usage

# Create session from GitHub
{
  "tool": "create_cpg_session",
  "arguments": {
    "source_type": "github",
    "source_path": "https://github.com/user/repo",
    "language": "java"
  }
}

# Get codebase overview
{
  "tool": "get_codebase_summary",
  "arguments": {
    "session_id": "abc-123-def"
  }
}

# List all methods in the codebase
{
  "tool": "list_methods",
  "arguments": {
    "session_id": "abc-123-def",
    "include_external": false,
    "limit": 50
  }
}

# Get source code for a specific method
{
  "tool": "get_method_source",
  "arguments": {
    "session_id": "abc-123-def",
    "method_name": "authenticate"
  }
}

# Find what methods call a specific function
{
  "tool": "get_call_graph",
  "arguments": {
    "session_id": "abc-123-def",
    "method_name": "execute_query",
    "depth": 2,
    "direction": "incoming"
  }
}

# Search for hardcoded secrets
{
  "tool": "find_literals",
  "arguments": {
    "session_id": "abc-123-def",
    "pattern": "(?i).*(password|secret|api_key).*",
    "limit": 20
  }
}

# Get code snippet from a file
{
  "tool": "get_code_snippet",
  "arguments": {
    "session_id": "abc-123-def",
    "filename": "src/main.c",
    "start_line": 10,
    "end_line": 25
  }
}

# Run custom CPGQL query
{
  "tool": "run_cpgql_query",
  "arguments": {
    "session_id": "abc-123-def",
    "query": "cpg.method.name.l"
  }
}

# Find potential security vulnerabilities
{
  "tool": "find_taint_sources",
  "arguments": {
    "session_id": "abc-123-def",
    "language": "c"
  }
}

# Check for data flows from sources to sinks
{
  "tool": "find_taint_flows",
  "arguments": {
    "session_id": "abc-123-def",
    "source_patterns": ["getenv", "fgets"],
    "sink_patterns": ["system", "sprintf"]
  }
}

# Find argument flows between function calls
{
  "tool": "find_argument_flows",
  "arguments": {
    "session_id": "abc-123-def",
    "source_name": "validate_input",
    "sink_name": "process_data",
    "arg_index": 0
  }
}

# Get detailed taint paths
{
  "tool": "list_taint_paths",
  "arguments": {
    "session_id": "abc-123-def",
    "source_pattern": "getenv",
    "sink_pattern": "system",
    "max_paths": 5
  }
}

# Build program slice for security analysis
{
  "tool": "get_program_slice",
  "arguments": {
    "session_id": "abc-123-def",
    "filename": "main.c",
    "line_number": 42,
    "call_name": "memcpy"
  }
}
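
The examples above use a shorthand `tool` / `arguments` shape. On the wire, MCP clients send tool invocations as JSON-RPC 2.0 `tools/call` requests, with the tool name under `params.name`. A small sketch of that translation, in case you are scripting against the endpoint by hand (the helper name `to_jsonrpc_request` is made up for illustration, and a real client also performs the MCP initialization handshake first):

```python
import json
from typing import Any, Dict


def to_jsonrpc_request(tool_call: Dict[str, Any], request_id: int) -> Dict[str, Any]:
    """Wrap a {"tool": ..., "arguments": ...} example in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_call["tool"],
            "arguments": tool_call.get("arguments", {}),
        },
    }


if __name__ == "__main__":
    example = {
        "tool": "get_codebase_summary",
        "arguments": {"session_id": "abc-123-def"},
    }
    print(json.dumps(to_jsonrpc_request(example, request_id=1), indent=2))
```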

Security Analysis Capabilities

The security analysis tools cover three kinds of analysis:

Taint Analysis:

  • Source identification: find_taint_sources locates external input points
  • Sink identification: find_taint_sinks finds dangerous operations
  • Flow analysis: find_taint_flows traces data from sources to sinks
  • Argument flow analysis: find_argument_flows finds exact expression reuse between calls
  • Path enumeration: list_taint_paths provides detailed propagation chains

Program Slicing:

  • Backward slicing: get_program_slice shows all code affecting a specific operation
  • Data dependencies: Variable assignments and data flow tracking
  • Control dependencies: Conditional statements affecting execution

Reachability Analysis:

  • Method connectivity: check_method_reachability verifies call graph connections
  • Impact analysis: Understand potential execution paths

Configuration

Key settings in config.yaml:

server:
  host: 0.0.0.0
  port: 4242
  log_level: INFO

redis:
  host: localhost
  port: 6379

sessions:
  ttl: 3600                # Session timeout (seconds)
  max_concurrent: 50       # Max concurrent sessions

cpg:
  generation_timeout: 600  # CPG generation timeout (seconds)
  supported_languages: [java, c, cpp, javascript, python, go, kotlin, csharp, ghidra, jimple, php, ruby, swift]

Environment variables override config file settings (e.g., MCP_HOST, REDIS_HOST, SESSION_TTL).
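
As a rough illustration of that precedence, the sketch below layers environment variables over values loaded from config.yaml. The variable names come from the line above; which config key each one maps to, and the helper name `apply_env_overrides`, are assumptions made for this example.

```python
import copy
from typing import Any, Callable, Dict, Mapping, Tuple

# Assumed mapping from environment variable to (section, key) and a type cast.
ENV_OVERRIDES: Dict[str, Tuple[Tuple[str, str], Callable[[str], Any]]] = {
    "MCP_HOST": (("server", "host"), str),
    "REDIS_HOST": (("redis", "host"), str),
    "SESSION_TTL": (("sessions", "ttl"), int),
}


def apply_env_overrides(
    config: Mapping[str, Any], environ: Mapping[str, str]
) -> Dict[str, Any]:
    """Return a copy of config in which any set environment variable wins."""
    merged = copy.deepcopy(dict(config))
    for var, ((section, key), cast) in ENV_OVERRIDES.items():
        if var in environ:
            merged.setdefault(section, {})[key] = cast(environ[var])
    return merged
```

In practice you would pass `os.environ` as `environ`; taking it as a parameter keeps the function free of global state.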

Example CPGQL Queries

Find all methods:

cpg.method.name.l

Find hardcoded secrets:

cpg.literal.code("(?i).*(password|secret|api_key).*").l

Find SQL injection risks:

cpg.call.name(".*execute.*").where(_.argument.isLiteral.code(".*SELECT.*")).l

Find complex methods:

cpg.method.filter(_.cyclomaticComplexity > 10).l
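
Regex filters such as the secrets pattern above are easier to tune if you try them on a few sample literals first. The sketch below does that locally in Python, assuming the filter must match the whole literal (which is why the pattern is wrapped in `.*`). Joern runs on the JVM, so its regex dialect differs from Python's in places; treat this as a quick sanity check rather than an exact emulation.

```python
import re
from typing import Iterable, List

SECRET_PATTERN = r"(?i).*(password|secret|api_key).*"


def matching_literals(pattern: str, literals: Iterable[str]) -> List[str]:
    """Return the literals that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [lit for lit in literals if compiled.fullmatch(lit)]


if __name__ == "__main__":
    # String literals are shown with their surrounding quotes, as in source code.
    samples = ['"DB_PASSWORD=hunter2"', '"hello world"', '"api_key"']
    print(matching_literals(SECRET_PATTERN, samples))
```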

Architecture

  • FastMCP Server: Built on FastMCP 2.12.4 framework with Streamable HTTP transport
  • HTTP Transport: Network-accessible API supporting multiple concurrent clients
  • Docker Containers: One isolated Joern container per session
  • Redis: Session state and query result caching
  • Async Processing: Non-blocking CPG generation
  • CPG Caching: Reuse CPGs for identical source/language combinations
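
One plausible way to implement the CPG caching described above is to derive a stable key from the source and language, so that identical requests map to the same cached CPG. The sketch below is an assumption about how such a key could be built, not the project's actual scheme; `cpg_cache_key` and its normalization rules are hypothetical.

```python
import hashlib


def cpg_cache_key(source_type: str, source_path: str, language: str) -> str:
    """Derive a short, deterministic key for a source/language combination."""
    normalized = "\n".join(
        [
            source_type.strip().lower(),
            source_path.strip().rstrip("/"),  # treat a trailing slash as insignificant
            language.strip().lower(),
        ]
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
```

A key like this ignores changes to the source itself, so a real cache would also need some notion of freshness, such as a commit hash or a TTL.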

Development

Project Structure

joern-mcp/
├── src/
│   ├── services/       # Session, Docker, Git, CPG, Query services
│   ├── tools/          # MCP tool definitions
│   ├── utils/          # Redis, logging, validators
│   └── models.py       # Data models
├── playground/         # Test codebases and CPGs
├── main.py            # Server entry point
├── config.yaml        # Configuration
└── requirements.txt   # Dependencies

Running Tests

# Install dev dependencies
pip install -r requirements.txt

# Run tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

Code Quality

# Format
black src/ tests/
isort src/ tests/

# Lint
flake8 src/ tests/
mypy src/

Troubleshooting

Setup issues:

# Re-run setup to rebuild and restart services
./setup.sh

Docker issues:

# Verify Docker is running
docker ps

# Check Joern image
docker images | grep joern

# Check Redis container
docker ps | grep joern-redis

Redis connection issues:

# Test Redis connection
docker exec joern-redis redis-cli ping

# Check Redis logs
docker logs joern-redis

# Restart Redis
docker restart joern-redis

Server connectivity:

# Test server is running
curl http://localhost:4242/health

# Run the server in the foreground and watch its output for errors
python main.py

Loading large projects (adjust the Joern memory settings if CPG generation runs out of memory):

joern:
  binary_path: ${JOERN_BINARY_PATH:joern}
  memory_limit: ${JOERN_MEMORY_LIMIT:16g}
  java_opts: ${JOERN_JAVA_OPTS:-Xmx16G -Xms8G -XX:+UseG1GC -Dfile.encoding=UTF-8}
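
The values above use a `${NAME:default}` placeholder form. A minimal sketch of how such placeholders could be resolved is shown below. It assumes the separator is a single colon, so in the `java_opts` line the leading `-` belongs to the default `-Xmx16G ...` rather than to a shell-style `:-` operator. The function name `resolve_placeholders` is hypothetical and the project's real loader may behave differently.

```python
import re
from typing import Mapping

# ${NAME:default} where the default may be empty but cannot contain "}".
_PLACEHOLDER = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):(?P<default>[^}]*)\}")


def resolve_placeholders(value: str, environ: Mapping[str, str]) -> str:
    """Replace each ${NAME:default} with environ[NAME], or the default if unset."""
    return _PLACEHOLDER.sub(
        lambda m: environ.get(m.group("name"), m.group("default")), value
    )
```

With an empty environment, `${JOERN_MEMORY_LIMIT:16g}` resolves to `16g`; setting `JOERN_MEMORY_LIMIT=32g` would yield `32g` instead.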

Debug logging:

export MCP_LOG_LEVEL=DEBUG
python main.py

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

  • Getting started with development setup
  • Code style and quality guidelines
  • Testing requirements and best practices
  • Submitting changes through pull requests
  • Reporting issues and feature requests
  • Documentation standards

Quick start for contributors:

git clone https://github.com/YOUR_USERNAME/joern-mcp.git
cd joern-mcp
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./setup.sh

# Create feature branch
git checkout -b feature/your-feature

# Make changes and run tests
pytest && black . && flake8

# Submit pull request

See CONTRIBUTING.md for detailed guidelines.

Acknowledgments


Built with ❤️ in Doha 🇶🇦
