LLM inference
The UI Agent currently uses an LLM to process the User Prompt and the structured Input Data relevant to that prompt.
The LLM selects the best UI component to visualize the data and configures it by choosing which values from the data should be shown.
For now, every piece of Input Data is processed independently through a separate LLM inference.
In the future, we expect the conversation history to be processed as well, to improve UI consistency across the conversation.
All Input Data pieces might also be processed at once, for example to select which piece should be shown at a given conversation step.
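A minimal sketch of this flow, assuming a simple JSON-Pointer-style addressing scheme; the field names (`component`, `fields`, `pointer`) and the data shapes are illustrative, not the agent's actual schema:

```python
import json

# Example Input Data the agent would pass to the LLM alongside the User Prompt.
input_data = {
    "order": {"id": "A-17", "total": 42.5, "items": [{"name": "Mug", "qty": 2}]}
}

# Hypothetical structured output the LLM is asked to produce: a component
# type plus pointers to the values in the Input Data that should be shown.
llm_output = json.loads("""
{
  "component": "table",
  "fields": [
    {"label": "Item", "pointer": "/order/items/0/name"},
    {"label": "Quantity", "pointer": "/order/items/0/qty"}
  ]
}
""")

def resolve(pointer: str, data):
    """Resolve a JSON-Pointer-like path against the Input Data."""
    value = data
    for part in pointer.lstrip("/").split("/"):
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value

# The UI layer would render the selected component from the resolved values.
rows = {f["label"]: resolve(f["pointer"], input_data) for f in llm_output["fields"]}
print(rows)  # {'Item': 'Mug', 'Quantity': 2}
```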
To instruct the LLM to produce the structured output expected by the agent, we use system prompt engineering. The LLM system prompt used by the agent can be customized/fine-tuned in the agent's configuration, both globally and for specific use cases (Input Data types).
We also provide an evaluation tool and basic datasets to measure the agent's performance and to support customization/fine-tuning of the agent's system prompt.
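Such an evaluation can be as simple as comparing the component the agent picked against an expected one per dataset case. This is a sketch under assumptions: the dataset format, field names, and the stand-in agent below are illustrative, not the actual tool's interface:

```python
# Hypothetical evaluation cases: each pairs an Input Data type with the
# UI component we expect the agent to select for it.
dataset = [
    {"input_type": "order", "expected_component": "table"},
    {"input_type": "profile", "expected_component": "card"},
    {"input_type": "photo", "expected_component": "image"},
]

def fake_agent(case):
    # Stand-in for the real LLM-backed agent; always answers "table".
    return "table"

errors = sum(1 for case in dataset if fake_agent(case) != case["expected_component"])
error_rate = errors / len(dataset)
print(f"errors: {errors}/{len(dataset)}, rate: {error_rate:.0%}")  # errors: 2/3, rate: 67%
```

A real run would replace `fake_agent` with an actual LLM call and also score the generated component configuration, not just its type.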
Which LLM to use?
The LLM call itself is abstracted in the UI Agent, so you can use a wide variety of LLM hostings, runtime environments, and providers.
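Conceptually, the abstraction boils down to a single completion call any provider can implement. The interface below is a hypothetical sketch of that idea (the names `LLMClient`, `complete`, and the prompts are assumptions, not the agent's real API):

```python
from typing import Protocol

class LLMClient(Protocol):
    """Any hosting/runtime/provider can be plugged in by implementing this call."""
    def complete(self, system_prompt: str, user_prompt: str) -> str: ...

class EchoClient:
    """Toy client used here in place of a real hosted LLM."""
    def complete(self, system_prompt: str, user_prompt: str) -> str:
        return '{"component": "card"}'

def select_component(llm: LLMClient, user_prompt: str, input_data_json: str) -> str:
    # Stand-in for the agent's real (customizable) system prompt.
    system_prompt = "Select a UI component for the data and return JSON."
    return llm.complete(system_prompt, f"{user_prompt}\n{input_data_json}")

print(select_component(EchoClient(), "show my profile", '{"name": "Ada"}'))
# {"component": "card"}
```

Swapping providers then means swapping the `LLMClient` implementation, with no change to the agent logic.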
Foundation LLMs
Generally, even very small LLMs, like the 3B Llama 3.2 or the 2B Granite 3.2, are pretty good at this task. They mostly struggle to
generate pointers to values in the Input Data for some data shapes.
Slightly larger models, like the 8B Granite 3.2 or Google's Gemini Flash and Gemini Flash Lite, perform very well in most evaluations.
Improvements from even larger LLMs are not significant, while you pay the full price of a large LLM: slower inference and a more expensive runtime. Larger LLMs also seem to put more values into generic components (more columns in a table, more Facts in a card).
Fine-tuned LLM
LLM fine-tuning may be beneficial for the UI Agent, as UI component type selection and configuration is a relatively narrow AI task. Fine-tuning can improve results from smaller LLMs, which means faster and cheaper processing.
We provide basic support for fine-tuning in the llm_finetuning directory.
It contains a Google Colab notebook that fine-tunes a 3B or
8B model using the LoRA approach and the Unsloth library, and exports the model to a quantized GGUF file runnable with Ollama.
Fine-tuning on the T4 GPU available on Google Colab for free takes only a few minutes.
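Once the notebook has exported the GGUF file, it can be loaded into Ollama with a minimal Modelfile; the filename below is illustrative, not the notebook's actual output name:

```
# Modelfile for the fine-tuned, quantized model (path is a placeholder)
FROM ./ui-agent-llama3.2-3b-q4_k_m.gguf
```

The model can then be registered with `ollama create ui-agent-llm -f Modelfile` and used like any other local Ollama model.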
An experimental fine-tuning dataset is provided in the training_data directory,
with training data covering distinct aspects of the UI Agent functionality, called "skills".
We achieved good results (measured by the provided evaluations) on the Llama 3.2 3B model, with 27% fewer errors than the base model,
but fine-tuning results on the Llama 3.1 8B model are not as good (the generated image component configuration is broken).
There is large room for improving this training set.
Alternatively, we could build a training dataset mimicking the real LLM calls performed by the UI Agent, containing the exact user prompts and the exact expected responses.
More work is definitely necessary in this area.