[BUG] Pagination appears broken for threat-intelligence-indicators/list (Sentinel) #29403

scottzach1 · 2024-06-11T22:00:49Z

API Spec link

https://learn.microsoft.com/en-us/rest/api/securityinsights/threat-intelligence-indicators/list?view=rest-securityinsights-2024-03-01&tabs=HTTP

API Spec version

2024-03-01

Describe the bug

The $top field is declared as optional, but nowhere does the document state the default value is 100.

When I query the REST endpoint without $top, only the first 100 indicators are returned; the page at the subsequent
nextLink contains no indicators.

{"value": []}

This should be documented in the REST API specs or the behavior should be updated so pagination is not broken when
$top is not specified.

Expected behavior

When $top is not specified pagination should continue until all indicators are returned.

Actual behavior

Only the first 100 indicators are returned; the page at the subsequent nextLink contains no indicators.

Reproduction Steps

# reproduce.py

import os

from typing import Iterator

from azure.identity import ClientSecretCredential
from requests import Session


def fetch(s: Session, *, top: int | None) -> Iterator[dict]:
    subscription_id = os.getenv("SENTINEL_SUBSCRIPTION_ID")
    resource_group_name = os.getenv("SENTINEL_RESOURCE_GROUP_NAME")
    workspace_name = os.getenv("SENTINEL_WORKSPACE_NAME")
    
    r = s.get(
        f"https://management.azure.com/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}"
        f"/providers/Microsoft.OperationalInsights/workspaces/{workspace_name}/providers/Microsoft.SecurityInsights"
        f"/threatIntelligence/main/indicators",
        params={"api-version": "2024-03-01", "$top": top},
    )
    r.raise_for_status()
    j = r.json()

    yield from j["value"]

    while next_link := j.get("nextLink", None):
        r = s.get(next_link)
        r.raise_for_status()
        j = r.json()
        yield from j["value"]


def ilen(iterable: Iterator[dict]) -> int:
    return sum(1 for _ in iterable)


def main():
    secret = ClientSecretCredential(
        tenant_id=os.getenv("SENTINEL_TENANT_ID"),
        client_id=os.getenv("SENTINEL_CLIENT_ID"),
        client_secret=os.getenv("SENTINEL_CLIENT_SECRET"),
    )
    token = secret.get_token("https://management.azure.com/.default").token

    s = Session()
    # Update rather than replace, so the session keeps its default headers.
    s.headers.update({
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    })

    print(f"- {ilen(fetch(s, top=None))=}")
    print(f"- {ilen(fetch(s, top=100))=}")
    print(f"- {ilen(fetch(s, top=200))=}")
    print(f"- {ilen(fetch(s, top=1000))=}")


if __name__ == '__main__':
    main()

// shell commands

foo@bar:~$ pip install azure-identity requests
...
foo@bar:~$ python reproduce.py
- ilen(fetch(s, top=None))=100
- ilen(fetch(s, top=100))=100
- ilen(fetch(s, top=200))=200
- ilen(fetch(s, top=1000))=1000

Environment

Operating System: Linux arch 6.9.3-arch1-1
Python Version: Python 3.10.14 (venv)


scottzach1 · 2024-06-11T22:32:21Z

Changes made to this endpoint may have also broken the Python Azure-SDK package downstream:

Cannot paginate threat intelligence indicators (Sentinel) azure-sdk-for-python#36021

v-jiaodi · 2024-06-12T02:48:22Z

@xuhumsft Please help take a look, thanks.

scottzach1 · 2024-06-21T02:52:43Z

Update: the behavior of this endpoint appears to have changed significantly for us, but pagination is still just as broken.

Today I found the time to test the /threatIntelligence/main/indicators endpoint further. From what I can tell
pagination is clearly broken.

Unless I am using the endpoint incorrectly, every page appears to contain every single indicator. The generated
nextLink increments the $skip value, but the endpoint ignores it completely. The paging never ends: the endpoint
keeps incrementing $skip even after it exceeds the defined $top.
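
As a client-side stopgap for the never-ending paging described above, here is a minimal sketch of a guarded pagination loop. It is not a fix for the endpoint. It assumes each indicator's "name" is a unique ID (as in the script below), and `paginate_guarded`, `get_page`, and `fake_get_page` are hypothetical names introduced only for this illustration. The loop stops as soon as a page contributes no unseen indicators.

```python
from typing import Callable, Iterator, Optional


def paginate_guarded(
    first_url: str,
    get_page: Callable[[str], dict],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield each indicator once; stop when a page adds nothing new.

    `get_page` is a hypothetical callable that takes a URL and returns the
    decoded JSON body, e.g. {"value": [...], "nextLink": "..."}.
    """
    seen = set()
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        page = get_page(url)
        pages += 1
        fresh = 0
        for item in page.get("value", []):
            name = item["name"]  # assumed to be a unique indicator ID
            if name in seen:
                continue
            seen.add(name)
            fresh += 1
            yield item
        if fresh == 0:
            # The server repeated itself (or returned an empty page): give up.
            break
        url = page.get("nextLink")


# Stand-in for the misbehaving endpoint: every page repeats the same items
# and always advertises another nextLink.
def fake_get_page(url: str) -> dict:
    return {
        "value": [{"name": "a"}, {"name": "b"}, {"name": "c"}],
        "nextLink": url + "&more",
    }


names = [i["name"] for i in paginate_guarded("https://example.invalid/indicators?x=1", fake_get_page)]
print(names)
```

This only prevents the infinite loop and the duplicate entries. It cannot recover indicators the endpoint never returns, so it does not help once the first response is too large to be served.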

Although this is manageable for small environments, it becomes a very serious concern when the number of indicators
exceeds ~6000 (depending on indicator pattern length), as we can no longer paginate through all the indicators in the
platform.

This is a significant problem for us as we are relying on this endpoint to protect our customers. Unfortunately, it
appears that the Threat Intelligence Data Connector + Graph API GET /security/tiIndicators is the
only reliable way to ingest indicators into customer environments. Yet somehow this approach is both beta (Graph) and
deprecated (Connector).

I'll show the behavior I'm experiencing below:

Script

For this example I have modified reproduce.py to help log duplicates. Let's name it reproduce_pages.py.

# reproduce_pages.py
import os
import sys
import urllib.parse
from collections import Counter

import requests
from azure.identity import ClientSecretCredential
from requests import Session


def pagination_example() -> None:
    secret = ClientSecretCredential(
        tenant_id=os.getenv("SENTINEL_TENANT_ID"),
        client_id=os.getenv("SENTINEL_CLIENT_ID"),
        client_secret=os.getenv("SENTINEL_CLIENT_SECRET"),
    )

    s = Session()
    # Update rather than replace, so the session keeps its default headers.
    s.headers.update({
        "Accept": "application/json",
        "Authorization": "Bearer " + secret.get_token("https://management.azure.com/.default").token,
    })

    subscription_id = os.getenv("SENTINEL_SUBSCRIPTION_ID")
    resource_group_name = os.getenv("SENTINEL_RESOURCE_GROUP_NAME")
    workspace_name = os.getenv("SENTINEL_WORKSPACE_NAME")

    params = {"api-version": "2024-03-01", "$top": 8_000}
    next_link = (
        f"https://management.azure.com/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}"
        f"/providers/Microsoft.OperationalInsights/workspaces/{workspace_name}/providers/Microsoft.SecurityInsights"
        f"/threatIntelligence/main/indicators?{urllib.parse.urlencode(params)}"
    )

    counter = Counter()
    page = 1

    while next_link:
        r = s.get(next_link)
        print(f"# PAGE {page} - [{r.status_code}] GET {next_link}")
        j = r.json()
        try:
            r.raise_for_status()
        except requests.HTTPError:
            print(j, file=sys.stderr)
            raise

        indicators = j["value"]
        for i, indicator in enumerate(indicators, start=1):
            indicator_id = indicator["name"]
            counter[indicator_id] += 1

            # Only log the first and last 2 indicators (for brevity).
            # Compare by position rather than dict equality: cheaper, and
            # unaffected by duplicate entries within a page.
            if i <= 2 or i > len(indicators) - 2:
                print(f"[{i}] {indicator_id} (count={counter[indicator_id]})")
            elif i == 3:
                print("...")

        page += 1
        next_link = j.get("nextLink", None)


if __name__ == "__main__":
    pagination_example()

Sentinel Instance (Small)

When I run it against a Sentinel instance with 3262 indicators, we can see that every page contains every single
indicator. The nextLink then changes the $skip parameter, but as the responses show, it is clearly not respected.

foo@bar:~$ python reproduce_pages.py
# PAGE 1 - [200] GET https://management.azure.com/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&%24top=8000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=1)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=1)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=1)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=1)
# PAGE 2 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=1000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=2)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=2)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=2)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=2)
# PAGE 3 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=2000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=3)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=3)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=3)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=3)
# PAGE 4 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=3000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=4)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=4)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=4)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=4)
# PAGE 5 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=4000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=5)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=5)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=5)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=5)
# PAGE 6 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=5000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=6)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=6)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=6)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=6)
# PAGE 7 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=6000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=7)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=7)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=7)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=7)
# PAGE 8 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=7000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=8)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=8)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=8)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=8)
# PAGE 9 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=8000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=9)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=9)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=9)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=9)
# PAGE 10 - [200] GET https://management.azure.com:443/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&$top=8000&$skip=9000
[1] f313e338-7556-1044-b097-c24e2d2a9f9d (count=10)
[2] 228c1439-2ea6-0988-784d-108bed178331 (count=10)
...
[3261] 1780bb5e-6eb3-2fda-084a-7edd31f7e7cb (count=10)
[3262] 5d9bc8e3-060b-42c0-ecc6-2b04d02ed399 (count=10)
Traceback (most recent call last):
<SIGKILL> ... manually killed script because this script will run forever!

(note in page 10 that $skip is greater than $top)
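
To make that observation checkable in a script, here is a small sketch that extracts $skip and $top from a nextLink and flags the case where $skip is greater than $top. It uses only the standard library. `skip_and_top` and `skip_exceeds_top` are hypothetical helper names, and the URL path is shortened to a placeholder. Whether $skip > $top is actually invalid depends on the service's semantics, which the docs do not spell out; this only detects the condition noted above.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def skip_and_top(next_link: str) -> Tuple[Optional[int], Optional[int]]:
    """Return ($skip, $top) parsed from a nextLink, or None where absent."""
    query = parse_qs(urlsplit(next_link).query)

    def as_int(key: str) -> Optional[int]:
        values = query.get(key)
        return int(values[0]) if values else None

    return as_int("$skip"), as_int("$top")


def skip_exceeds_top(next_link: str) -> bool:
    """True when both parameters are present and $skip > $top."""
    skip, top = skip_and_top(next_link)
    return skip is not None and top is not None and skip > top


# Query string copied from the page 10 request above; the path is a placeholder.
link = (
    "https://management.azure.com:443/placeholder/indicators"
    "?api-version=2024-03-01&$top=8000&$skip=9000"
)
print(skip_and_top(link), skip_exceeds_top(link))
```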

Sentinel Instance (Medium)

The problem gets more damning when I run the same script in a Sentinel environment with 6919 indicators. Because
the first response contains every indicator, we quickly hit a hard limit: the request fails with a 400 because the
response is too large for the endpoint.

foo@bar:~$ python reproduce_pages.py
# PAGE 1 - [400] GET https://management.azure.com/subscriptions/{SENTINEL_SUBSCRIPTION_ID}/resourceGroups/{SENTINEL_RESOURCE_GROUP_NAME}/providers/Microsoft.OperationalInsights/workspaces/{SENTINEL_WORKSPACE_NAME}/providers/Microsoft.SecurityInsights/threatIntelligence/main/indicators?api-version=2024-03-01&%24top=8000
{'error': {'code': 'BadRequest', 'message': 'Response too large. Please try a lower page size.'}}
Traceback (most recent call last):

This leaves us unable to paginate at all, because the endpoint completely ignores $skip, as shown in the
Sentinel Instance (Small) section above.

Any advice would be greatly appreciated.

v-jiaodi · 2024-06-21T03:27:40Z

@ityankel Can you help take a look?

scottzach1 · 2024-06-27T20:46:12Z

Hi team, has there been any follow-up on this issue? Is there a better place where I should escalate this instead?

scottzach1 added the bug This issue requires a change to an existing behavior in the product in order to be resolved. label Jun 11, 2024

microsoft-github-policy-service bot added question The issue doesn't require a change to the product in order to be resolved. Most issues start as that customer-reported Issues that are reported by GitHub users external to the Azure organization. labels Jun 11, 2024

scottzach1 changed the title ~~[BUG]~~ [BUG] $top default=100 is not documented for threat-intelligence-indicators/list Jun 11, 2024

v-jiaodi assigned xuhumsft Jun 12, 2024

v-jiaodi added the Mgmt This issue points to a problem in the management-plane of the library. label Jun 12, 2024

scottzach1 changed the title ~~[BUG] $top default=100 is not documented for threat-intelligence-indicators/list~~ [BUG] Pagination appears broken for threat-intelligence-indicators/list (Sentinel) Jun 21, 2024

scottzach1 mentioned this issue Jun 25, 2024

Cannot paginate threat intelligence indicators (Sentinel) Azure/azure-sdk-for-python#36021

Closed
