Comparing changes

base repository: CodeSoul-co/THETA
base: main
head repository: CodeSoul-co/THETA
compare: dev
  • 1 commit
  • 10 files changed
  • 1 contributor

Commits on Apr 11, 2026

  1. feat: add plug-and-play scraper module with X (Twitter) support

    Adds src/scrapers/ as a fully decoupled data ingestion layer that
    fetches posts from social platforms and normalises them into
    THETA-compatible CSV files (data/{dataset}/{dataset}_cleaned.csv).
    Zero changes to src/models/ — the module hooks in via the existing
    find_data_file() discovery path in prepare_data.py.
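The normalisation step described above can be sketched roughly as follows. This is a minimal illustration, not the code in adapter.py: the function name `write_cleaned_csv`, the `data_root` parameter, and the column names `text` and `timestamp` are assumptions made for the example, since the commit message does not spell out the THETA CSV schema. Only the output path pattern data/{dataset}/{dataset}_cleaned.csv comes from the commit.

```python
import csv
from pathlib import Path
from typing import Iterable, Mapping, Sequence


def write_cleaned_csv(
    records: Iterable[Mapping[str, object]],
    dataset: str,
    data_root: str = "data",
    columns: Sequence[str] = ("text", "timestamp"),  # hypothetical schema
) -> Path:
    """Write raw records to <data_root>/<dataset>/<dataset>_cleaned.csv."""
    out_dir = Path(data_root) / dataset
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{dataset}_cleaned.csv"

    with out_path.open("w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys that are not in `columns`.
        writer = csv.DictWriter(fh, fieldnames=list(columns), extrasaction="ignore")
        writer.writeheader()
        for rec in records:
            # Collapse all whitespace runs (including newlines) to single spaces.
            text = " ".join(str(rec.get("text", "")).split())
            if not text:
                continue  # skip records with no usable text
            row = {col: rec.get(col, "") for col in columns}
            row["text"] = text
            writer.writerow(row)
    return out_path
```

Because the file lands at the path pattern named in the commit, a discovery function such as find_data_file() could pick it up without any change on the model side, which is the decoupling the commit describes.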
    
    Structure:
    - base.py         AbstractScraper protocol for all platforms
    - registry.py     Platform registry (mirrors model/registry.py pattern)
    - adapter.py      ThetaAdapter: normalises raw records → THETA CSV schema
    - cli.py          Argparse entry point (python src/scrapers/cli.py)
    - platforms/x.py  XScraper via tweepy + Twitter API v2
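The base-class-plus-registry layout listed above might look roughly like the sketch below. It assumes an `abc`-based interface with a single `fetch` method; the names `fetch`, `register`, `get_scraper`, and the toy `FakeScraper` are hypothetical and are not taken from the repository. The real XScraper talks to the Twitter API v2 through tweepy, which is deliberately left out here so the example runs without credentials.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Type


class AbstractScraper(ABC):
    """Hypothetical platform interface; the real base.py may differ."""

    platform: str = ""  # registry key, set by each subclass

    @abstractmethod
    def fetch(self, query: str, limit: int) -> Iterator[dict]:
        """Yield up to `limit` raw records matching `query`."""


_REGISTRY: Dict[str, Type[AbstractScraper]] = {}


def register(cls: Type[AbstractScraper]) -> Type[AbstractScraper]:
    """Class decorator that adds a scraper to the registry."""
    if not cls.platform:
        raise ValueError(f"{cls.__name__} must define a platform name")
    _REGISTRY[cls.platform] = cls
    return cls


def get_scraper(platform: str) -> AbstractScraper:
    """Instantiate the scraper registered under `platform`."""
    try:
        return _REGISTRY[platform]()
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown platform {platform!r}; known: {known}") from None


@register
class FakeScraper(AbstractScraper):
    """Stand-in platform used only to exercise the registry."""

    platform = "fake"

    def fetch(self, query: str, limit: int) -> Iterator[dict]:
        for i in range(limit):
            yield {"id": str(i), "text": f"{query} post {i}"}
```

With this shape, adding a platform means writing one subclass and registering it; callers such as a CLI only ever go through `get_scraper`.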
    
    Also adds scripts/scrape.sh (bash wrapper) and X credential stubs
    to .env.example. Future platforms extend AbstractScraper and register
    in registry.py without touching any other file.
    erwinmsmith committed Apr 11, 2026
    48a68b1