ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities

Chanjin Zheng*
Shanghai Institute of Artificial Intelligence for Education
East China Normal University, Shanghai, China
Faculty of Education
East China Normal University, Shanghai, China
chjzheng@dep.ecnu.edu.cn
Zengyi Yu*
Faculty of Education
East China Normal University, Shanghai, China
College of Education
Zhejiang University of Technology, Hangzhou, China
202105720431@zjut.edu.cn
Yilin Jiang*
College of Education
Zhejiang University of Technology, Hangzhou, China
zjut_jiangyilin@163.com
Mingzi Zhang
Faculty of Education
East China Normal University, Shanghai, China
College of Education
Zhejiang Normal University, Jinhua, China
windyday@zjnu.edu.cn
Xunuo Lu
School of Economy
Zhejiang University of Technology, Hangzhou, China
13968860822@163.com
Jing Jin
School of Education
Zhejiang Normal University, Jinhua, China
Tianchang Guanchao Primary School
Hangzhou, China
383230730@qq.com
Liteng Gao
School of Artificial Intelligence Science and Technology
University of Shanghai for Science and Technology, Shanghai, China
2335060610@st.usst.edu.cn
Conference: CHI 2025
* Indicates Equal Contribution Corresponding Author

ArtMentor System Interface and Operation Process.

Abstract

Multimodal Large Language Models (MLLMs) face challenges in artwork evaluation, including subjective human assessments, limitations of result-oriented methods, and lack of modularity. In this paper, we propose that the design and analysis of HCI spaces, using process-oriented data, can more effectively evaluate MLLM capabilities and drive improvements. Applying this methodology, we introduce ArtMentor, a space that combines a dataset and three systems to enhance MLLM evaluations. ArtMentor documents 380 sessions with five art teachers, assessing artworks across nine critical dimensions. The modular system features entity recognition, review generation, and suggestion generation agents, enabling iterative upgrades. Process-based results analysis integrates machine learning and natural language processing to ensure reliable evaluations. Finally, we highlight the MLLMs' tendency to focus on details at the expense of the bigger picture, and the superior performance of review generation compared to suggestion generation. We encourage further collaboration to cost-effectively enhance MLLM capabilities. Our contributions are available at https://artmentor.github.io.

HCI File Structure Analysis

Entities Folder

This folder contains 20 JSON files, each recording the entities recognized in one image together with the user's additions and removals.


{
  "original": ["Face", "Black hair", "Open mouth", "Green shirt", "Blue shorts", "Black shoes", "Monkey", "Cat", "Dog", "Bird", "Insect", "Exclamation mark", "Yellow platform", "Books"],
  "added": ["Yellow balances", "schoolbag"],
  "removed": ["Yellow platform"],
  "style": {
    "original": ["Style: Cartoon"],
    "added": [],
    "removed": []
  }
}
        

Field Explanations:

  • original: Elements recognized in the original image
  • added: New elements added by the user
  • removed: Elements removed by the user
  • style: Style-related information, including original style, added styles, and removed styles
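
One way to work with an entity record is to reconstruct the list the user ended up with. The sketch below assumes that the final list equals original plus added minus removed; the page does not state this rule, and the helper name final_entities is hypothetical. The sample record is the one shown above.

```python
def final_entities(record: dict) -> list[str]:
    """Return original + added entities, minus removed ones (assumed rule)."""
    removed = set(record.get("removed", []))
    merged = record.get("original", []) + record.get("added", [])
    seen = set()
    result = []
    for name in merged:
        # Keep first occurrences only, and skip anything the user removed.
        if name not in removed and name not in seen:
            seen.add(name)
            result.append(name)
    return result


# Sample record copied from the example in this section.
sample = {
    "original": ["Face", "Black hair", "Open mouth", "Green shirt",
                 "Blue shorts", "Black shoes", "Monkey", "Cat", "Dog",
                 "Bird", "Insect", "Exclamation mark", "Yellow platform",
                 "Books"],
    "added": ["Yellow balances", "schoolbag"],
    "removed": ["Yellow platform"],
    "style": {"original": ["Style: Cartoon"], "added": [], "removed": []},
}

entities = final_entities(sample)
# The nested "style" object has the same three keys, so the helper applies to it too.
styles = final_entities(sample["style"])
print(len(entities), styles)
```

Because the style object mirrors the top-level layout, the same function covers both levels without special cases.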

score_Review Folder

This folder contains 180 files, each representing scores and reviews for a photo across 9 dimensions.


[
  {
    "round": 1,
    "data": {
      "scores": {
        "original": 0,
        "current": 0,
        "initGPTscore": null
      },
      "Reviews": {
        "original": "",
        "current": "",
        "added": "",
        "removed": ""
      }
    }
  },
  {
    "round": 2,
    "data": {
      "scores": {
        "original": 4,
        "current": 4,
        "initGPTscore": 4
      },
      "Reviews": {
        "original": "The artwork effectively uses contrasting colors to enhance visual interest...",
        "current": "The artwork effectively uses contrasting colors to enhance visual interest...",
        "added": "",
        "removed": ""
      }
    }
  }
]
        

Field Explanations:

  • round: Scoring round
  • scores: Contains original score, current score, and initial GPT score
  • Reviews: Contains original review, current review, added review, and removed review
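
A per-round summary can make the process data easier to scan. The sketch below is an illustration only: it assumes that current minus original reflects the user's score adjustment and that a null initGPTscore means no model score was recorded for that round. Neither interpretation is stated on this page, and summarize_rounds is a hypothetical helper. The sample reuses the two rounds shown above, including the truncated review text.

```python
def summarize_rounds(rounds: list[dict]) -> list[dict]:
    """Summarize each scoring round of one score_Review file."""
    summary = []
    for entry in rounds:
        scores = entry["data"]["scores"]
        reviews = entry["data"]["Reviews"]
        summary.append({
            "round": entry["round"],
            # Assumed meaning: how far the score moved from its starting value.
            "score_delta": scores["current"] - scores["original"],
            "review_edited": reviews["current"] != reviews["original"],
            "has_model_score": scores["initGPTscore"] is not None,
        })
    return summary


review_text = ("The artwork effectively uses contrasting colors "
               "to enhance visual interest...")
sample_rounds = [
    {"round": 1, "data": {
        "scores": {"original": 0, "current": 0, "initGPTscore": None},
        "Reviews": {"original": "", "current": "", "added": "", "removed": ""}}},
    {"round": 2, "data": {
        "scores": {"original": 4, "current": 4, "initGPTscore": 4},
        "Reviews": {"original": review_text, "current": review_text,
                    "added": "", "removed": ""}}},
]

round_summary = summarize_rounds(sample_rounds)
for row in round_summary:
    print(row)
```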

suggestion Folder

This folder contains 180 files, each representing suggestions for a photo across 9 dimensions.


[
  {
    "round": 1,
    "data": {
      "suggestions": {
        "original": "",
        "current": "",
        "added": "",
        "removed": ""
      }
    }
  },
  {
    "round": 2,
    "data": {
      "suggestions": {
        "original": "To improve the color contrast in the artwork, consider using more vibrant and varied background colors...",
        "current": "To improve the color contrast in the artwork, consider using more vibrant and varied background colors...",
        "added": "",
        "removed": ""
      }
    }
  }
]
        

Field Explanations:

  • round: Suggestion round
  • suggestions: Contains original suggestion, current suggestion, added suggestion, and removed suggestion
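
To gauge how much a suggestion was rewritten, one option is a character-level similarity between the original and current fields. The sketch below uses difflib.SequenceMatcher from the Python standard library; this is an illustrative choice, not the analysis method described by the authors, and suggestion_similarity is a hypothetical helper.

```python
from difflib import SequenceMatcher


def suggestion_similarity(entry: dict) -> float:
    """Similarity in [0, 1] between the original and current suggestion."""
    fields = entry["data"]["suggestions"]
    original, current = fields["original"], fields["current"]
    if not original and not current:
        # Nothing was generated or edited in this round; treat as unchanged.
        return 1.0
    return SequenceMatcher(None, original, current).ratio()


text = ("To improve the color contrast in the artwork, consider using "
        "more vibrant and varied background colors...")
unchanged = {"round": 2, "data": {"suggestions": {
    "original": text, "current": text, "added": "", "removed": ""}}}
# Hypothetical edited round, constructed only to show a value below 1.0.
edited = {"round": 3, "data": {"suggestions": {
    "original": text, "current": text + " Try warmer tones.",
    "added": " Try warmer tones.", "removed": ""}}}

sim_unchanged = suggestion_similarity(unchanged)
sim_edited = suggestion_similarity(edited)
print(sim_unchanged, round(sim_edited, 3))
```

A ratio of 1.0 means the user kept the suggestion as generated; lower values indicate heavier editing.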

BibTeX


@article{zheng2025artmentor,
  title={ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities},
  author={Zheng, Chanjin and Yu, Zengyi and Jiang, Yilin and Zhang, Mingzi and Lu, Xunuo and Jin, Jing and Gao, Liteng},
  journal={arXiv preprint arXiv:2502.13832},
  year={2025}
}