What kind of visual content the pipeline detected or was told to expect.
Used to route images to the most appropriate processing tier — e.g. handwritten content triggers TrOCR, complex layouts trigger Florence-2.
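
A minimal sketch of how such routing could look, assuming the content type is modelled as an enum. The names `ContentType`, `ROUTES`, `DEFAULT_TIER`, and `route` are hypothetical and not taken from the pipeline; only the two mappings (handwritten to TrOCR, complex layouts to Florence-2) come from the description above, and the fallback tier is an assumption.

```python
from enum import Enum


class ContentType(Enum):
    """Hypothetical content categories (illustrative, not the pipeline's real values)."""
    HANDWRITTEN = "handwritten"
    COMPLEX_LAYOUT = "complex_layout"
    UNKNOWN = "unknown"


# Tier labels are plain illustrative strings, not real model identifiers.
ROUTES = {
    ContentType.HANDWRITTEN: "trocr",         # handwritten content -> TrOCR
    ContentType.COMPLEX_LAYOUT: "florence-2",  # complex layouts -> Florence-2
}

# Assumed fallback for content types without a dedicated tier.
DEFAULT_TIER = "default"


def route(content_type: ContentType) -> str:
    """Return the processing tier label for a detected or expected content type."""
    return ROUTES.get(content_type, DEFAULT_TIER)
```

Keeping the mapping in a dictionary means a new content type only needs one new entry, and anything unmapped falls through to the assumed default tier rather than raising an error.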