Are Generative AI Agents Effective Personalized Financial Advisors?

ArXiv ID: 2504.05862

Authors: Unknown

Abstract

Large language model-based agents are becoming increasingly popular as a low-cost mechanism for providing personalized, conversational advice, and they have demonstrated impressive capabilities in relatively simple scenarios, such as movie recommendations. But how do these agents perform in complex, high-stakes domains, where domain expertise is essential and mistakes carry substantial risk? This paper investigates the effectiveness of LLM-advisors in the finance domain, focusing on three distinct challenges: (1) eliciting user preferences when users themselves may be unsure of their needs, (2) providing personalized guidance for diverse investment preferences, and (3) leveraging advisor personality to build relationships and foster trust. Via a lab-based user study with 64 participants, we show that LLM-advisors often match human advisor performance when eliciting preferences, although they can struggle to resolve conflicting user needs. When providing personalized advice, the LLM was able to positively influence user behavior, but it demonstrated clear failure modes. Our results show that accurate preference elicitation is key: without it, the LLM-advisor has little impact, or can even steer the investor toward unsuitable assets. More worryingly, users appear insensitive to the quality of the advice they receive; worse, satisfaction and advice quality can be inversely related. Indeed, users reported a preference for, greater satisfaction with, and stronger emotional trust in LLMs adopting an extroverted persona, even though those agents provided worse advice.
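The persona manipulation described in the abstract is typically implemented as a system-prompt swap on an otherwise identical agent. Below is a minimal sketch assuming an OpenAI-style chat API; the persona wording, model name, and API choice are illustrative assumptions, since the paper's exact prompts and model are not reproduced here.

```python
# Hypothetical sketch: persona-conditioned LLM advisor via system prompt.
# The persona texts and model name below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

PERSONAS = {
    "extroverted": (
        "You are an enthusiastic, talkative financial advisor. Be warm, "
        "use expressive language, and proactively build rapport."
    ),
    "introverted": (
        "You are a reserved, measured financial advisor. Be concise, "
        "factual, and answer only what is asked."
    ),
}

def advise(persona: str, history: list[dict]) -> str:
    """Return one advisor turn, conditioned on the chosen persona."""
    messages = [{"role": "system", "content": PERSONAS[persona]}] + history
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed; the study's underlying model is not stated here
        messages=messages,
    )
    return response.choices[0].message.content

print(advise("extroverted",
             [{"role": "user", "content": "I want to start investing $5,000."}]))
```

Because the study found that an extroverted persona raised satisfaction and emotional trust even as advice quality fell, persona and advice quality should be measured as separate axes when evaluating such an agent.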

Keywords: Large Language Models (LLMs), Preference Elicitation, Personalized Advice, User Trust, Persona Engineering, Multi-Asset

Complexity vs Empirical Score

  • Math Complexity: 2.5/10
  • Empirical Rigor: 6.0/10
  • Quadrant: Street Traders
  • Why: The paper focuses on experimental methodology and user behavior analysis rather than advanced mathematical modeling or quantitative trading strategies. Empirical rigor is solid with a structured lab-based user study involving 64 participants, comparing personalized vs. non-personalized advisors and assessing outcomes, though it lacks real-world backtesting or financial data processing.

Methodology Overview

```mermaid
flowchart TD
    A["Research Goal:<br>How effective are LLM-advisors in high-stakes finance?"] --> B["Methodology:<br>Lab-based user study (n=64 participants)"]

    B --> C["Input 1:<br>Preference Elicitation<br>via conversational prompts"]
    B --> D["Input 2:<br>Personalized Advice<br>for diverse investment needs"]
    B --> E["Input 3:<br>Persona Engineering<br>(e.g., extroverted vs. neutral)"]

    C & D & E --> F["LLM Computational Process:<br>Conversational Agent +<br>Persona-based reasoning"]

    F --> G["Key Findings/Outcomes:"]
    G --> H["1. Preference elicitation matches humans<br>but struggles with conflicts"]
    G --> I["2. Can influence user behavior,<br>but prone to failure modes"]
    G --> J["3. Users prefer extroverted personas<br>despite receiving worse advice"]
    G --> K["4. Accurate elicitation is critical;<br>bad inputs lead to poor outcomes"]
```