OmniPath Subset Download

omnipath-subset-download/SKILL.md

Resolve entities and ontology terms, validate filters through the API, preview slices, and export parquet subsets.

---
name: omnipath-subset-download
description: Resolve biological entities and ontology terms, discover valid graph filter values from the OmniPath API service, preview slices, and download parquet subsets.
---

# OmniPath Subset Download

Turn biological subset requests into validated OmniPath parquet exports.

## Rule

Do not guess IDs or filter values. Resolve terms/entities and discover valid filters via the API before exporting.

## Setup

```bash
API_BASE=${API_BASE:-https://dev.omnipathdb.org/api}
curl -fsS "$API_BASE/health"
```

## Workflow

1. **Parse intent**: identify entities, ontology concepts, species/taxonomy, relation constraints, sources, and desired output table(s). If the request is ambiguous, ask the user, e.g. whether to include ontology descendants.

2. **Resolve ontology concepts**:
   - Search names: `POST /terms/search`
   - Lookup IDs: `POST /terms`
   - Optional hierarchy: `GET /{ontology_id}/term/{term_id}/ancestors|descendants`
   - Use returned term IDs, e.g. `GO:0006915`. Include descendants only if requested/confirmed.

3. **Resolve entities**:
   - `POST /entities/resolve` with symbols, UniProt IDs, or public IDs.
   - Use numeric `entity_pk` values in filters.
   - If multiple matches are plausible, ask the user.

4. **Discover valid filters**:
   - Call `POST /entities/scoped-facets` and `POST /relations/scoped-facets`. Both return valid facet values **with counts** for the current scope; with an empty body they act as global discovery.
   - Use `POST /ontology/scoped-search` to find valid annotation/ontology terms globally or within selected entities.
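
Global facet discovery can be sketched with the standard library alone. The snippet builds the two requests without sending them. The helper names `build_post` and `post_json` are hypothetical, and treating an "empty body" as the JSON object `{}` is an assumption, not something the API is confirmed to require.

```python
import json
import os
import urllib.request

# Same default as the Setup section; override with the API_BASE env var.
API_BASE = os.environ.get("API_BASE", "https://dev.omnipathdb.org/api")


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an API path (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE.rstrip('/')}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response; raises on HTTP errors."""
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout) as resp:
        return json.load(resp)


# Assumption: an "empty body" means the empty JSON object {}.
global_entity_facets = build_post("/entities/scoped-facets", {})
global_relation_facets = build_post("/relations/scoped-facets", {})
print(global_entity_facets.get_method(), global_entity_facets.full_url)
```

Calling `post_json("/entities/scoped-facets", {})` would perform the actual request; the response layout is not specified here, so inspect it before relying on any field.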

5. **Build filters**:

   Entity filters:
   - `entity_pks`: numeric internal entity IDs from `/entities/resolve`.
   - `entity_types`: entity classes, e.g. protein, complex, small molecule; discover via `/entities/scoped-facets`.
   - `taxonomy_ids`: species/taxon IDs, e.g. `9606` for human.
   - `sources`: resources contributing entity records; discover via scoped facets.

   ```json
   {"entity_pks":[123],"entity_types":["MI:0326:Protein"],"taxonomy_ids":["9606"],"sources":["uniprot"]}
   ```

   Relation filters:
   - `entity_pks`: keep relations where subject or object is one of these entities.
   - `relation_categories`: one of `interaction`, `membership`, `annotation`.
   - `predicates`: relation verbs, e.g. `positively_regulates`; discover via `/relations/scoped-facets`.
   - `participant_types`: subject/object entity classes involved in relations.
   - `sources`: resources contributing relation records.
   - `annotation_terms`: ontology term IDs annotating relations, e.g. GO/HP/MI/KW terms.

   ```json
   {"entity_pks":[123],"relation_categories":["interaction"],"predicates":["positively_regulates"],"participant_types":["MI:0326:Protein"],"sources":["signor"],"annotation_terms":["GO:0006915"]}
   ```

   Annotation filters:
   - `prefixes`: ontology prefixes, e.g. `GO`, `HP`, `MI`, `KW`.
   - `entity_pks`: annotation terms attached to these entities.

   ```json
   {"prefixes":["GO"],"entity_pks":[123]}
   ```

   Avoid deprecated `search_*` fields. Normally do **not** use `annotation_scopes`; `annotation_terms` is enough.
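
A local sanity check can catch the mistakes named above before a payload reaches the API. This is a sketch only: `check_relation_filters` is a hypothetical helper, `search_text` is a made-up example of a `search_*` field, and the check does not replace facet discovery for predicates, sources, or participant types.

```python
# The three relation categories listed in the relation filter description.
ALLOWED_RELATION_CATEGORIES = {"interaction", "membership", "annotation"}


def check_relation_filters(filters: dict) -> list[str]:
    """Return a list of problems found in a relation filter payload."""
    problems = []
    # Deprecated search_* fields should not appear in new payloads.
    for key in filters:
        if key.startswith("search_"):
            problems.append(f"deprecated field: {key}")
    # annotation_terms is normally enough, so flag annotation_scopes.
    if "annotation_scopes" in filters:
        problems.append("annotation_scopes is normally unnecessary")
    # relation_categories must come from the fixed set above.
    unknown = set(filters.get("relation_categories", [])) - ALLOWED_RELATION_CATEGORIES
    for value in sorted(unknown):
        problems.append(f"unknown relation category: {value}")
    return problems


good = {
    "entity_pks": [123],
    "relation_categories": ["interaction"],
    "annotation_terms": ["GO:0006915"],
}
# "search_text" is a hypothetical field name used only to trigger the check.
bad = {"relation_categories": ["binding"], "search_text": "apoptosis"}

good_problems = check_relation_filters(good)
bad_problems = check_relation_filters(bad)
print(good_problems)
print(bad_problems)
```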

6. **Preview before export**:
   - `POST /entities/slice`
   - `POST /relations/slice`
   - Use `limit: 5` first. Check the HTTP status, `total`, and whether the returned rows look plausible.
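
The preview step can be sketched as two small helpers. Both names are hypothetical. The sketch assumes that `limit` sits at the top level of the request body next to the filter fields and that `total` is a top-level integer in the response; verify both against a real response.

```python
def preview_payload(filters: dict, limit: int = 5) -> dict:
    """Copy the filters and add a small row limit for previewing."""
    # Assumption: `limit` is a top-level field alongside the filters.
    return {**filters, "limit": limit}


def preview_problems(status: int, body: dict) -> list[str]:
    """Collect reasons why a preview response should block the export."""
    problems = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    # Assumption: the response reports the match count as top-level `total`.
    total = body.get("total")
    if total is None:
        problems.append("response has no `total` field")
    elif total == 0:
        problems.append("filters match zero rows")
    return problems


filters = {"entity_pks": [123], "relation_categories": ["interaction"]}
payload = preview_payload(filters)
print(payload)
print(preview_problems(200, {"total": 0}))
```

An empty problem list only means the slice is non-empty; row plausibility still needs a human look at the returned rows.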

7. **Export parquet**:
   - `POST /exports/entities/parquet`
   - `POST /exports/relations/parquet`
   - `POST /exports/annotations/parquet`

   Save the exact JSON payload next to the parquet for reproducibility.
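
One way to keep the payload next to the parquet is a sidecar file that shares the parquet's base name. The helper name `write_payload_sidecar` and the `.payload.json` suffix are illustrative choices, not an OmniPath convention.

```python
import json
import tempfile
from pathlib import Path


def write_payload_sidecar(parquet_path: Path, payload: dict) -> Path:
    """Write the export payload as JSON beside the parquet file."""
    # subset.parquet -> subset.payload.json (suffix is an arbitrary choice).
    sidecar = parquet_path.with_suffix(".payload.json")
    sidecar.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return sidecar


# Demonstrate the round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "subset.parquet"
    payload = {"entity_pks": [123], "taxonomy_ids": ["9606"]}
    sidecar = write_payload_sidecar(out, payload)
    sidecar_name = sidecar.name
    round_trip = json.loads(sidecar.read_text(encoding="utf-8"))

print(sidecar_name)
```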

8. **Validate parquet**:

   ```python
   import polars as pl
   df = pl.read_parquet("subset.parquet")
   print(df.shape)
   print(df.head())
   ```

   If available, report `X-Export-Row-Count`, `X-Export-Duration-Ms`, and output path.
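
Reporting the export headers can be sketched as below. `export_report` is a hypothetical helper, the header values in the example are made up, and the sketch assumes both headers carry plain integers.

```python
def export_report(headers: dict, output_path: str) -> dict:
    """Summarize export headers; missing or non-integer values become None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str):
        value = lowered.get(name.lower())
        if value is None:
            return None
        text = str(value).strip()
        return int(text) if text.isdigit() else None

    return {
        "rows": as_int("X-Export-Row-Count"),
        "duration_ms": as_int("X-Export-Duration-Ms"),
        "path": output_path,
    }


# Illustrative header values, not real API output.
example_headers = {"x-export-row-count": "1200", "X-Export-Duration-Ms": "845"}
report = export_report(example_headers, "subset.parquet")
print(report)
```

Comparing `report["rows"]` with `df.shape[0]` from the polars check is a cheap way to confirm the file was written completely.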

OmniPath Resource Zips

omnipath-resource-zips/SKILL.md

Download and reuse complete per-resource zip archives, unpack parquet files, and join graph/evidence tables locally.

---
name: omnipath-resource-zips
description: Download and reuse complete OmniPath per-resource zip archives, unpack their parquet files locally, and analyze resource-specific graph data with entity/relation/evidence joins. Use when the user asks to work with individual OmniPath resources directly rather than filtered API subsets.
---

# OmniPath Resource Zips

Use this skill when the user wants complete data from one or more individual OmniPath resources, e.g. SIGNOR, Reactome, IntAct, UniProt, CORUM.

## Rule

Prefer resource zips for full-resource, offline, repeatable analysis. Prefer subset exports when the user wants a small filtered slice.

## Setup

Use the API service only.

```bash
API_BASE=${API_BASE:-https://dev.omnipathdb.org/api}
DATA_DIR=${DATA_DIR:-omnipath-data}
mkdir -p "$DATA_DIR"
curl -fsS "$API_BASE/health"
```

## Workflow

1. **List resources**:

   ```bash
   curl -fsS "$API_BASE/resources"
   ```

   Use the returned `resource_id` values exactly.

2. **Download resource zip(s)**:

   Single resource:
   ```bash
   curl -fL "$API_BASE/resources/{resource_id}/download" \
     -o "$DATA_DIR/{resource_id}.zip"
   ```

   Multiple resources as one bundle:
   ```bash
   curl -fL "$API_BASE/resources/download" \
     -H 'Content-Type: application/json' \
     -o "$DATA_DIR/resources_bundle.zip" \
     -d '{"resource_ids":["signor","reactome"]}'
   ```

3. **Reuse downloads**:
   - Save all zips under `$DATA_DIR` (default `omnipath-data/`).
   - If a zip already exists and the user did not request refresh, reuse it.
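
The reuse rule can be sketched as a small check run before each download. `needs_download` is a hypothetical helper. It also treats a file that is not a valid zip as missing, which goes slightly beyond the rule above and is an added assumption (it guards against a saved error page or truncated download).

```python
import tempfile
import zipfile
from pathlib import Path


def needs_download(zip_path: Path, refresh: bool = False) -> bool:
    """Return True when the zip should be (re)downloaded."""
    if refresh:
        return True
    # A missing or corrupt file counts as "not downloaded yet".
    return not (zip_path.is_file() and zipfile.is_zipfile(zip_path))


# Demonstrate the three cases in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "signor.zip"
    missing = needs_download(target)
    # Create a small valid zip to stand in for a finished download.
    with zipfile.ZipFile(target, "w") as zf:
        zf.writestr("entity.parquet", b"")
    cached = needs_download(target)
    forced = needs_download(target, refresh=True)

print(missing, cached, forced)
```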

4. **Unpack/read parquet locally**:

   ```bash
   unzip -n "$DATA_DIR/signor.zip" -d "$DATA_DIR/signor"
   find "$DATA_DIR/signor" -name '*.parquet'
   ```

5. **Understand graph files**:
   - `entity.parquet`: nodes/entities.
   - `entity_relation.parquet`: edges/relations.
   - `entity_relation_evidence.parquet`: provenance/evidence for relations.

6. **Join safely**:
   - Within one resource, join relations to entities by local `entity_pk`.
   - Across different resource zips, do **not** join by `entity_pk`; those IDs are resource-local.
   - Cross-resource joins should use stable identifiers such as `canonical_identifier` plus `canonical_identifier_type`.

7. **Analyze with local graph joins**:

   ```python
   import polars as pl

   # Directory produced by the unzip step for a single resource.
   root = "omnipath-data/signor"

   # Lazy scans: nothing is read from disk until .collect().
   entities = pl.scan_parquet(f"{root}/**/entity.parquet")
   relations = pl.scan_parquet(f"{root}/**/entity_relation.parquet")
   # Provenance table; scanned for later lookups, not joined below.
   evidence = pl.scan_parquet(f"{root}/**/entity_relation_evidence.parquet")

   # Attach subject and object identifiers via the resource-local entity_pk.
   graph = (
       relations
       .join(entities.select([
           pl.col("entity_pk").alias("subject_entity_pk"),
           pl.col("canonical_identifier").alias("subject_id"),
       ]), on="subject_entity_pk", how="left")
       .join(entities.select([
           pl.col("entity_pk").alias("object_entity_pk"),
           pl.col("canonical_identifier").alias("object_id"),
       ]), on="object_entity_pk", how="left")
   )

   # Materialize only the first rows for inspection.
   print(graph.limit(5).collect())
   ```