Analyzing the history of CVPR

As the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025 approaches, let’s take a look at the history of the conference and its workshops from 2017 to 2024. The goal of this analysis is to provide insight into how topics and trends in artificial intelligence research have evolved over the years. Take the results with a grain of salt: some information that may be relevant to the analyses is discarded during the cleaning process, and part of the analysis is based on keywords, which requires assumptions about how authors use them (e.g. it’s pretty unlikely that a paper about image data would have the keyword audio in its title or abstract). This is not a perfect solution, so treat this post as a set of insights into the history of the conference rather than a definitive analysis.
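To make the keyword assumption concrete, here is a minimal sketch of how keyword-based tagging of titles and abstracts could work. The keyword lists and the `tag_modalities` helper are hypothetical and only illustrate the idea; they are not the vocabulary or code used for this analysis.

```python
import re

# Hypothetical keyword lists, for illustration only.
MODALITY_KEYWORDS = {
    "audio": ["audio", "speech", "sound"],
    "video": ["video"],
    "point cloud": ["point cloud"],
}


def tag_modalities(title: str, abstract: str) -> set[str]:
    """Return the modalities whose keywords occur as whole words in the text."""
    text = f"{title} {abstract}".lower()
    found = set()
    for modality, keywords in MODALITY_KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries stop "audio" from matching inside "audiovisual";
            # the optional "s" lets plurals such as "videos" match.
            pattern = rf"\b{re.escape(keyword)}s?\b"
            if re.search(pattern, text):
                found.add(modality)
                break
    return found


if __name__ == "__main__":
    print(tag_modalities("Learning from Videos", "We use audio cues."))
```

A paper is counted once per modality no matter how often a keyword appears, which is one reason the resulting numbers are approximate.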

Note that some of the graphs show percentages of the total number of papers published in each year rather than raw counts. Since a different number of papers is published each year, raw counts can’t really be compared from one year to the next. The goal of these graphs is to show the distribution of papers published during the period and any changes in the focus of the academic community. You can also interact with the visualizations here: zoom in on specific parts, enable or disable lines by clicking on their names in the legend, and hover over the points to see more information.
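The per-year normalization can be sketched as follows. This is a minimal example under stated assumptions: the `yearly_percentages` function and the toy records are made up for illustration and are not the data behind the graphs.

```python
from collections import Counter


def yearly_percentages(papers, label):
    """Return {year: percentage of that year's papers carrying `label`}.

    `papers` is an iterable of (year, labels) pairs, where `labels` is a set.
    """
    totals = Counter()
    hits = Counter()
    for year, labels in papers:
        totals[year] += 1
        if label in labels:
            hits[year] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


# Toy records, invented for illustration only.
toy_papers = [
    (2017, {"image"}),
    (2017, {"video"}),
    (2024, {"image", "text"}),
    (2024, {"text"}),
    (2024, {"video"}),
    (2024, {"image"}),
]

if __name__ == "__main__":
    print(yearly_percentages(toy_papers, "image"))
```

In the toy data the raw count of image papers doubles between the two years, yet the share stays the same because the total doubles too. That difference is what the percentage-based graphs are meant to capture.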

Overall Statistics

Here you can see the number of published papers. The number has grown every year except 2023, when it dipped below the 2022 figure. There were more than three times as many papers published in 2024 as in 2017 (3,488 versus 1,064).

{"data": [{"hovertemplate": "year=%{x}<br>papers=%{y}<extra></extra>", "legendgroup": "", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "", "orientation": "v", "showlegend": false, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "i2", "bdata": "KAQuBVQHwQd+CEgKNgmgDQ=="}, "yaxis": "y", "type": "scatter"}], "layout": {"xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "papers"}}, "legend": {"tracegroupgap": 0}}}
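The figure data above stores its values in `bdata` fields as base64 strings. As a rough sketch, assuming the `i2` dtype means little-endian signed 16-bit integers, the paper counts can be decoded and the growth figure checked like this (the `decode_i2` helper is a name introduced here for illustration):

```python
import base64
import struct


def decode_i2(bdata: str) -> list[int]:
    """Decode a base64 string of little-endian signed 16-bit integers."""
    raw = base64.b64decode(bdata)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


# Strings copied from the "x" and "y" fields of the figure above.
years = decode_i2("4QfiB+MH5AflB+YH5wfoBw==")
papers = decode_i2("KAQuBVQHwQd+CEgKNgmgDQ==")

counts = dict(zip(years, papers))
growth = counts[2024] / counts[2017]

if __name__ == "__main__":
    for year, count in counts.items():
        print(year, count)
    print(f"2024 vs 2017: {growth:.2f}x")
```

Under that assumption the decoded series runs from 2017 to 2024, with the 2023 count lower than the 2022 count and a 2024-to-2017 ratio above three.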

Regarding the modalities used in the papers, images are still the most common, but the use of text and of multiple modalities has increased significantly. The use of optical flow, graphs, and depth information has decreased in recent years, while the use of particles has remained relatively stable.

{"data": [{"customdata": [["audio"], ["audio"], ["audio"], ["audio"], ["audio"], ["audio"], ["audio"], ["audio"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "audio", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "audio", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kN5T8dk/Uck/XsP30vhdy8DuE/g0lGn20u6D/N5tSIhffrP/WdjfrORvU/WASE4EIk+z+P78JB9PgAQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["depth"], ["depth"], ["depth"], ["depth"], ["depth"], ["depth"], ["depth"], ["depth"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "depth", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "depth", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "zB4we8DsE0Ak1egj1egTQER0/3wvhRhAQj1lr7AmFEDHDwNNB98TQMTkCmJyBRFAzGK7c/F4EUDBpv1kCWwPQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["graph"], ["graph"], ["graph"], ["graph"], ["graph"], ["graph"], ["graph"], ["graph"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "graph", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "graph", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "HX1z9M3RC0A+eSo+eSoOQHumE2xSRRFAg0lGn20uGEDucuibFmAQQL+zUd/ZqAtALkSykbMfCkCWwKb9ZAkDQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["image"], ["image"], ["image"], ["image"], ["image"], ["image"], ["image"], ["image"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences 
(%)=%{y:.3f}<extra></extra>", "legendgroup": "image", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "image", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "eQ3lNZQXQkBUyixUyixCQAEGofVGhkBAMA4SdHz7PkB3eggk2Ro8QPeySqCiNz1AeiSI92T9O0B8Fw6ixydAQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["mesh"], ["mesh"], ["mesh"], ["mesh"], ["mesh"], ["mesh"], ["mesh"], ["mesh"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "mesh", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "mesh", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP6D/bxovaxovqP0kPVM5u4fc/kOUMnMb8+D9dS0BRmUsBQCZXEJMriPk/XoOZB+cIBUCUJbBpP1kCQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["multi modal"], ["multi modal"], ["multi modal"], ["multi modal"], ["multi modal"], ["multi modal"], ["multi modal"], ["multi modal"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "multi modal", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "multi modal", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "FbFUxFIRC0A1SIM0SIMEQFJF/XDtmQZAw1Unjyo2DEDlDfuqVnANQCdeT8ocAxhA1CNBvCfrF0DtJ0tg034jQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["optical flow"], ["optical flow"], ["optical flow"], ["optical flow"], ["optical flow"], ["optical flow"], ["optical flow"], ["optical flow"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "optical flow", "line": {"dash": "solid"}, 
"marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "optical flow", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "yLgg44KMA0A+eSo+eSr+Pw7LXP52jv8/98VBgo5v/z+EcWIiEo33P1osjwhNtfc/JoMsSX2t8z8Vc6szUjHvPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["particle"], ["particle"], ["particle"], ["particle"], ["particle"], ["particle"], ["particle"], ["particle"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "particle", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "particle", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPyD8TYk4TYk6zPy5/O8fHSrs/AjGEv/Me0D8j1cmZzanRPylzDHDwc9M/mwIc7fRIwD84iB7fhYPQPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["path"], ["path"], ["path"], ["path"], ["path"], ["path"], ["path"], ["path"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "path", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "path", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "NOHPhD8T7j+Y+iGY+iH4P4RTS55mNABAg0lGn20u+D+cmIhE4wX5P1osjwhNtfc/QgNjKDJb9D9GKuZWZ6T0Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["point cloud"], ["point cloud"], ["point cloud"], ["point cloud"], ["point cloud"], ["point cloud"], ["point cloud"], ["point cloud"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "point cloud", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "point cloud", "orientation": "v", 
"showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "FbFUxFIR6z8D77MC77MCQFlpwzKXvwVAw1Unjyo2DED9NCHNJ+kOQCVQ0Vs6DQtAA2MoMlvUEkCpmFudkYoIQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["text"], ["text"], ["text"], ["text"], ["text"], ["text"], ["text"], ["text"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "text", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "text", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPGEB3FO12FO0WQD/ZqivwKBlAU4Bd658oFUBUIxbeb5sUQChljgEOfhZAQgNjKDJbFEAJbNpPlsAbQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["video"], ["video"], ["video"], ["video"], ["video"], ["video"], ["video"], ["video"]], "hovertemplate": "modality=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "video", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "video", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "NpTXUF5DLkCE5g2E5g0sQMnTjAYkxidAjT7bXDDJKEDsq1rBdfQqQL+zUd/ZqCtANeR/ySBLKkDYtJ8sgc0pQA=="}, "yaxis": "y", "type": "scatter"}], "layout": {"legend": {"title": {"text": "modality"}, "tracegroupgap": 0}, "xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}}}

It is quite common for papers to introduce new concepts, be it a new method, a new dataset, or a new architecture. The following graph shows the most common concepts introduced in the papers. Not surprisingly, algorithms are the most common concept; this category also covers new methods and approaches. Novel tasks have also been introduced over the years, which is highly correlated with the creation of novel datasets. The introduction of new architectures, including new models, modules, and networks, has also increased in the last year. The introduction of new losses and metrics has been quite stable over the years, with very few papers proposing them.

{"data": [{"customdata": [["algorithms"], ["algorithms"], ["algorithms"], ["algorithms"], ["algorithms"], ["algorithms"], ["algorithms"], ["algorithms"]], "hovertemplate": "concept=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "algorithms", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "algorithms", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "qYilIpaKAEA+eSo+eSr+P1x7phNsUgVA3Y20iNzS/T9UIxbeb5v0P1osjwhNtfc/GoUB+zTk/z+P78JB9PgQQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["architectures"], ["architectures"], ["architectures"], ["architectures"], ["architectures"], ["architectures"], ["architectures"], ["architectures"]], "hovertemplate": "concept=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "architectures", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "architectures", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kN5T9fX19fX1/vP4RTS55mNPA/0PHti4ME7T+XwbYIZe3wPylzDHDwc/M//gTLG4A27z8H0eO7cBD7Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["datasets"], ["datasets"], ["datasets"], ["datasets"], ["datasets"], ["datasets"], ["datasets"], ["datasets"]], "hovertemplate": "concept=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "datasets", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "datasets", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "5+ibo2+O9j/8rMD7rMD7PyE3r0N0//w/g0lGn20u+D87/O+7niLzP4rXkzLHAP8/44SUPMuI/j9s2k+WwKb/Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": 
[["losses"], ["losses"], ["losses"], ["losses"], ["losses"], ["losses"], ["losses"], ["losses"]], "hovertemplate": "concept=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "losses", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "losses", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPuD8dk/Uck/XcP30vhdy8DtE/aRG5pbuR1j+1v65mtH7aP76sEqjoLc0/WASE4EIkyz9LYNN+sgTGPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["metrics"], ["metrics"], ["metrics"], ["metrics"], ["metrics"], ["metrics"], ["metrics"], ["metrics"]], "hovertemplate": "concept=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "metrics", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "metrics", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAATYk4TYk6zPy5/O8fHSqs/nYHTmB/LuT/lDfuqVnDNPwAAAAAAAAAAeQPQ5pu2pT+61RmpmFu9Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["tasks"], ["tasks"], ["tasks"], ["tasks"], ["tasks"], ["tasks"], ["tasks"], ["tasks"]], "hovertemplate": "concept=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "tasks", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "tasks", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP2D8TYk4TYk7TPxTvIsAgtO4/0PHti4ME7T9sSjwAQRTmP1w6DXcvq/Q/XoOZB+cI9T9GKuZWZ6T0Pw=="}, "yaxis": "y", "type": "scatter"}], "layout": {"xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}, 
"legend": {"title": {"text": "concept"}, "tracegroupgap": 0}}}

Regarding the common tasks in the papers, we can see a steep increase in generation tasks, especially after 2022. This may be related to the advances in large language models such as InstructGPT and ChatGPT, which had appeared by the end of 2022, and to the release of the first collections of foundational language models such as LLaMA in early 2023. Classification, detection, estimation, and recognition have seen a decline in interest over the years, while prediction has only recently seen a decrease. Tasks such as segmentation have remained relatively stable. Reasoning tasks have also increased significantly in the last year, but still account for a small percentage of the total number of published papers (about 3.5% in 2024).

{"data": [{"customdata": [["captioning"], ["captioning"], ["captioning"], ["captioning"], ["captioning"], ["captioning"], ["captioning"], ["captioning"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "captioning", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "captioning", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "JUmSJEmS/D/yexnyexnyP3YLvxoT6fE/T9krrAn15D9UIxbeb5vkP76sEqjoLe0/mwIc7fRI8D/5LhxEj+/2Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["classification"], ["classification"], ["classification"], ["classification"], ["classification"], ["classification"], ["classification"], ["classification"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "classification", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "classification", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "zB4we8DsI0CmpaWlpaUhQPI0owGJcSJAlEcVG646IUCng+85dnYfQL+zUd/ZqBtAzYNzhLq/F0DMrc5IxVwWQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["clustering"], ["clustering"], ["clustering"], ["clustering"], ["clustering"], ["clustering"], ["clustering"], ["clustering"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "clustering", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "clustering", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "qYilIpaKAED8rMD7rMD7P4RTS55mNABAIrd0N9IiAkDRNy3AMI8AQF1Ii+URoQFA/gTLG4A2/z9VzK3OSMX4Pw=="}, "yaxis": "y", "type": "scatter"}, 
{"customdata": [["counting"], ["counting"], ["counting"], ["counting"], ["counting"], ["counting"], ["counting"], ["counting"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "counting", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "counting", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kN5T9WLrhVLrjlP1x7phNsUvU/0PHti4ME7T9sSjwAQRTmP44BDn5u4uU/6QOqY29t2D8Vc6szUjHfPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["detection"], ["detection"], ["detection"], ["detection"], ["detection"], ["detection"], ["detection"], ["detection"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "detection", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "detection", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "DDIuyLggKkCIh4eHh4cnQMOvxkR6oChAFWDX+idKKUCJhffbJuUlQNuvLqTF8iZAA2MoMlvUIkB/sgQ27WciQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["estimation"], ["estimation"], ["estimation"], ["estimation"], ["estimation"], ["estimation"], ["estimation"], ["estimation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "estimation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "estimation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "vzn65uibIkA/Uo0+Uo0iQBycWvI0ox1AEnR8++IgIUAfhHFiIhIdQCZXEJMriBlAPIRNAY52GkD7yRLYtJ8XQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["forecasting"], ["forecasting"], ["forecasting"], ["forecasting"], 
["forecasting"], ["forecasting"], ["forecasting"], ["forecasting"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "forecasting", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "forecasting", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP2D8TYk4TYk7TP0kPVM5u4dc/g0lGn20u6D87/O+7niLjP/BzE68nZe4/0wKJq16k4T8Cm/aTJbDpPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["generation"], ["generation"], ["generation"], ["generation"], ["generation"], ["generation"], ["generation"], ["generation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "generation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "generation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "+ubom6NvGECEv3CEv3AgQHumE2xSRSFAzgWTjD7bJEBNhbbHUBcnQPSPD4zsUChAtRpffs7rMECuzkjF3Mo2QA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["identification"], ["identification"], ["identification"], ["identification"], ["identification"], ["identification"], ["identification"], ["identification"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "identification", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "identification", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "wOwBswfMAkAMIFsMIFsMQDWjAYlxcApAQj1lr7AmBEA7/O+7niIDQPKBkR0KW/s/WASE4EIk6z9LYNN+sgT2Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["navigation"], ["navigation"], ["navigation"], ["navigation"], 
["navigation"], ["navigation"], ["navigation"], ["navigation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "navigation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "navigation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP6D93FO12FO32P0kPVM5u4fc/trlgktFn6z/HDwNNB9/zPylzDHDwc/M/7oK/ihNS8j+ix3fhIHr2Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["prediction"], ["prediction"], ["prediction"], ["prediction"], ["prediction"], ["prediction"], ["prediction"], ["prediction"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "prediction", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "prediction", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "tbrT6k6rIUAdbFgdbFghQOBRwiz2ySRA6+RRxYarJkA7/O+7niIjQEFMriAmVyZA76N3m9yYKEDDQfT4LpwjQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["reasoning"], ["reasoning"], ["reasoning"], ["reasoning"], ["reasoning"], ["reasoning"], ["reasoning"], ["reasoning"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "reasoning", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "reasoning", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kNBUAk1egj1egDQFJF/XDtmQZAfPviIEHHB0DWDv/7rqcIQMPWjPOPDwRAJoMsSX2tA0AJbNpPlsALQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["recognition"], ["recognition"], ["recognition"], ["recognition"], ["recognition"], ["recognition"], ["recognition"], 
["recognition"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "recognition", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "recognition", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kNJUCmpaWlpaUhQCbSA5WzWxxAumCS0WebG0D4XU+RqdAWQPSPD4zsUBhAQgNjKDJbFEBlCWzaTxYRQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["regression"], ["regression"], ["regression"], ["regression"], ["regression"], ["regression"], ["regression"], ["regression"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "regression", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "regression", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "LBWxVMRSDUAk1egj1egDQHDn+FhpwwJAKQXYtf6JAkAj1cmZzakBQPakzDHAwQNAQgNjKDJb9D+n/WQJbNr3Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["retrieval"], ["retrieval"], ["retrieval"], ["retrieval"], ["retrieval"], ["retrieval"], ["retrieval"], ["retrieval"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "retrieval", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "retrieval", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "DeU1lNdQCkBWLrhVLrgFQFJF/XDtmQZAL1M7NCvxAkAQhXWzekkIQFszzj8+MAZASsTocGjNCkDyXTiIHt8EQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["segmentation"], ["segmentation"], ["segmentation"], ["segmentation"], ["segmentation"], ["segmentation"], ["segmentation"], ["segmentation"]], "hovertemplate": 
"task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "segmentation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "segmentation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "MHvA7AGzHUDRleTQleQgQAOPEmaxTyBAgF3rnygFIECrl4Tzis4dQF5PyhwDHCBAmwIc7fRIIEDPSMXc6swgQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["tracking"], ["tracking"], ["tracking"], ["tracking"], ["tracking"], ["tracking"], ["tracking"], ["tracking"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "tracking", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "tracking", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "0IQ/E/5MFEBJXJdIXJcQQChbdQUeJQxAtrlgktFnC0DucuibFmAQQCVQ0Vs6DQtAQgNjKDJbBEAFNu0nS2AKQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["translation"], ["translation"], ["translation"], ["translation"], ["translation"], ["translation"], ["translation"], ["translation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "translation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "translation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "FbFUxFIR6z8TYk4TYk4DQFJF/XDtmQZAPO8BMYS/A0BUIxbeb5sEQJEWyyNCUwFAlYMGxlBk9j+ZW52RirnzPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["verification"], ["verification"], ["verification"], ["verification"], ["verification"], ["verification"], ["verification"], ["verification"]], "hovertemplate": 
"task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "verification", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "verification", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "FbFUxFIR6z8dk/Uck/XsPxTvIsAgtO4/g0lGn20u6D+EcWIiEo3XP/SPD4zsUNg/eQPQ5pu21T84iB7fhYPQPw=="}, "yaxis": "y", "type": "scatter"}], "layout": {"xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}, "legend": {"title": {"text": "task"}, "tracegroupgap": 0}}}

Let’s dive a little deeper into the tasks.

Algorithms focused on security and privacy have been around for a while, but the number of papers published on them has increased significantly in the last year. One example is spoofing detection, which is crucial for applications such as identity recognition, where attackers may try to use photos or videos to impersonate someone else. It seems to have gained urgency as deepfake technologies have become more prevalent.

{"data": [{"customdata": [["adversarial attack"], ["adversarial attack"], ["adversarial attack"], ["adversarial attack"], ["adversarial attack"], ["adversarial attack"], ["adversarial attack"], ["adversarial attack"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "adversarial attack", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "adversarial attack", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPuD8dk/Uck/XcP1ZX4FHCLPY/D81KvEztAEBUIxbeb5v0P8HIDoWtGfc/BITgQiQb+T/5LhxEj+/2Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["anomaly detection"], ["anomaly detection"], ["anomaly detection"], ["anomaly detection"], ["anomaly detection"], ["anomaly detection"], ["anomaly detection"], ["anomaly detection"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "anomaly detection", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "anomaly detection", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPyD8TYk4TYk7jP3Dn+Fhpw+I/aRG5pbuR5j8j1cmZzanhPylzDHDwc+M/mwIc7fRI4D8Vc6szUjHvPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["disambiguation"], ["disambiguation"], ["disambiguation"], ["disambiguation"], ["disambiguation"], ["disambiguation"], ["disambiguation"], ["disambiguation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "disambiguation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "disambiguation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": 
{"dtype": "f8", "bdata": "9oDZA2YPuD8AAAAAAAAAAC5/O8fHSrs/AAAAAAAAAACEcWIiEo2nPylzDHDwc6M/mwIc7fRIwD+UJbBpP1nCPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["face verification"], ["face verification"], ["face verification"], ["face verification"], ["face verification"], ["face verification"], ["face verification"], ["face verification"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "face verification", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "face verification", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP2D9WLrhVLrjlPy5/O8fHSts/NqGesldYwz+EcWIiEo23P76sEqjoLb0/eQPQ5pu2pT+61RmpmFudPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["fact checking"], ["fact checking"], ["fact checking"], ["fact checking"], ["fact checking"], ["fact checking"], ["fact checking"], ["fact checking"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "fact checking", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "fact checking", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAClzDHDwc6M/AAAAAAAAAAAAAAAAAAAAAA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["forensics"], ["forensics"], ["forensics"], ["forensics"], ["forensics"], ["forensics"], ["forensics"], ["forensics"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "forensics", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "forensics", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": 
"4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "FbFUxFIR6z+Y+iGY+iHYP2OfbNUVeOQ/T9krrAn15D87/O+7niLjP/SPD4zsUNg/WASE4EIkyz8Cm/aTJbDZPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["fraud detection"], ["fraud detection"], ["fraud detection"], ["fraud detection"], ["fraud detection"], ["fraud detection"], ["fraud detection"], ["fraud detection"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "fraud detection", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "fraud detection", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnYHTmB/LqT8AAAAAAAAAAClzDHDwc7M/eQPQ5pu2pT+61RmpmFu9Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["privacy"], ["privacy"], ["privacy"], ["privacy"], ["privacy"], ["privacy"], ["privacy"], ["privacy"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "privacy", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "privacy", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP6D8dk/Uck/XcPxTvIsAgtO4/0PHti4ME7T+XwbYIZe3wP8HIDoWtGfc/IAQXItnI+T+waT9ZApv6Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["safety"], ["safety"], ["safety"], ["safety"], ["safety"], ["safety"], ["safety"], ["safety"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "safety", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "safety", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": 
"2FBeQ3kN9T8TYk4TYk7zP3Dn+FhpwwJAaRG5pbuR9j+cmIhE4wX5P11Ii+URoQFAjwTxnqx//D/yXTiIHt8EQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["spamming"], ["spamming"], ["spamming"], ["spamming"], ["spamming"], ["spamming"], ["spamming"], ["spamming"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "spamming", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "spamming", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAClzDHDwc6M/AAAAAAAAAAC61RmpmFudPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["spoofing"], ["spoofing"], ["spoofing"], ["spoofing"], ["spoofing"], ["spoofing"], ["spoofing"], ["spoofing"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "spoofing", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "spoofing", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPuD8TYk4TYk7TP30vhdy8DuE/nYHTmB/L2T8j1cmZzanBP/SPD4zsUMg/eQPQ5pu2xT84iB7fhYPgPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["spotting"], ["spotting"], ["spotting"], ["spotting"], ["spotting"], ["spotting"], ["spotting"], ["spotting"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "spotting", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "spotting", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP2D8dk/Uck/XcP2OfbNUVeMQ/0PHti4ME3T8j1cmZzanRP/SPD4zsUMg/eQPQ5pu2xT+UJbBpP1nCPw=="}, "yaxis": "y", "type": 
"scatter"}], "layout": {"xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}, "legend": {"title": {"text": "task"}, "tracegroupgap": 0}}}

Explainability and interpretability have gained traction in the last few years. The number of papers on the topic rose noticeably around 2019, following a surge in conferences and workshops dedicated to model transparency, interpretability, and fairness, such as ACM FAccT and VISxAI. Explainability is crucial for building trust in AI systems and for ensuring that they make decisions based on valid reasoning. One of the areas that has seen the most attention in recent years is model grounding, the process of tying the model’s predictions to specific features in the input data. This is particularly important in applications such as image classification and question answering, where it is essential to understand which parts of an input (text, image) are driving the model’s predictions.
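To make the idea of tying a prediction to input features concrete, here is a minimal sketch of occlusion-based attribution: each feature is replaced by a baseline value in turn, and the resulting drop in the model’s score is taken as that feature’s contribution. The linear `toy_score` function and its weights are hypothetical stand-ins for a trained model, not something taken from the papers analyzed here.

```python
def occlusion_attribution(score_fn, features, baseline=0.0):
    """Return, per feature, how much the score drops when it is replaced by `baseline`."""
    full_score = score_fn(features)
    attributions = []
    for i in range(len(features)):
        occluded = list(features)
        occluded[i] = baseline  # hide one feature at a time
        attributions.append(full_score - score_fn(occluded))
    return attributions


# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = [0.5, -1.0, 2.0]


def toy_score(features):
    return sum(w * x for w, x in zip(WEIGHTS, features))


attributions = occlusion_attribution(toy_score, [1.0, 1.0, 1.0])
print(attributions)  # [0.5, -1.0, 2.0]
```

For a linear scorer the attributions simply recover the weights; with a real image or text model the same loop would mask patches or tokens instead of single numbers.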

{"data": [{"customdata": [["explainability"], ["explainability"], ["explainability"], ["explainability"], ["explainability"], ["explainability"], ["explainability"], ["explainability"]], "hovertemplate": "word=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "explainability", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "explainability", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPuD8TYk4TYk6zPy5/O8fHSts/AjGEv/Me0D8LrqN3/DDgP76sEqjoLd0/CgP2acj/0j+61RmpmFvdPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["grounding"], ["grounding"], ["grounding"], ["grounding"], ["grounding"], ["grounding"], ["grounding"], ["grounding"]], "hovertemplate": "word=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "grounding", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "grounding", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "uSDjgowL0j9WLrhVLrjlP0kPVM5u4ec/NqGesldY4z+1v65mtH7qP11Ii+URofE/lYMGxlBk9j9ZApv2kyX6Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["interpretability"], ["interpretability"], ["interpretability"], ["interpretability"], ["interpretability"], ["interpretability"], ["interpretability"], ["interpretability"]], "hovertemplate": "word=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "interpretability", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "interpretability", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": 
"uSDjgowL4j+Y+iGY+iHYP3Dn+Fhpw/I/KQXYtf6J8j+XwbYIZe3wPyM7FLZmnO8/t4JSzKn28D9C9PguHETzPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["traceability"], ["traceability"], ["traceability"], ["traceability"], ["traceability"], ["traceability"], ["traceability"], ["traceability"]], "hovertemplate": "word=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "traceability", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "traceability", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC61RmpmFudPw=="}, "yaxis": "y", "type": "scatter"}], "layout": {"legend": {"title": {"text": "word"}, "tracegroupgap": 0}, "xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}}}

Low-level visual tasks such as image denoising have received a lot of attention in recent years. This may be due to the increasing importance of image quality in computer vision applications, the development of new techniques to improve image quality, and the increased capacity of visual models to handle larger inputs. This category of tasks also includes deblurring, dehazing, demoiréing, deraining, and others. Image processing and image generation tasks have also increased significantly.
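For readers working with the raw chart payloads rather than the rendered figures: the `x` and `y` arrays in this post’s charts are stored as base64 strings in `bdata`, with the element type given by `dtype`. Below is a minimal sketch for decoding them. It assumes little-endian packing and that `i2` and `f8` denote 16-bit integers and 64-bit floats, an assumption that the decoded years bear out.

```python
import base64
import struct

# Assumed mapping from the "dtype" strings seen in the payloads to struct codes.
_STRUCT_CODES = {"i2": "h", "f8": "d"}


def decode_typed_array(spec):
    """Decode a {"dtype": ..., "bdata": ...} object into a list of Python numbers."""
    raw = base64.b64decode(spec["bdata"])
    code = _STRUCT_CODES[spec["dtype"]]
    count = len(raw) // struct.calcsize("<" + code)
    return list(struct.unpack(f"<{count}{code}", raw))


# The x-axis payload shared by every chart in this post.
years = decode_typed_array({"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="})
print(years)  # [2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
```

The same function applies to the `f8` arrays on the y axis, which hold the per-year occurrence percentages.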

{"data": [{"customdata": [["colorization"], ["colorization"], ["colorization"], ["colorization"], ["colorization"], ["colorization"], ["colorization"], ["colorization"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "colorization", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "colorization", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "NOHPhD8T3j8dk/Uck/XMPy5/O8fHSts/AjGEv/Me0D+EcWIiEo23P/SPD4zsUMg/eQPQ5pu2tT+UJbBpP1nCPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["denoising"], ["denoising"], ["denoising"], ["denoising"], ["denoising"], ["denoising"], ["denoising"], ["denoising"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "denoising", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "denoising", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "5+ibo2+OBkAdk/Uck/UMQB4lzGKfbA1A6il7hTWhDkAHXUtAUZkLQPSPD4zsUAhAPIRNAY52CkB1RirmVucVQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["editing"], ["editing"], ["editing"], ["editing"], ["editing"], ["editing"], ["editing"], ["editing"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "editing", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "editing", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP6D/bxovaxovqP3Dn+Fhpw+I/Qj1lr7Am9D/HDwNNB9/zP4vlEaGp9vs/zYNzhLq/B0CNVMytzkgQQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["image enhancement"], ["image enhancement"], 
["image enhancement"], ["image enhancement"], ["image enhancement"], ["image enhancement"], ["image enhancement"], ["image enhancement"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "image enhancement", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "image enhancement", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAACY+iGY+iHYPy5/O8fHSss/nYHTmB/L2T+1v65mtH7aP/SPD4zsUNg/CgP2acj/0j8Vc6szUjHfPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["image filling"], ["image filling"], ["image filling"], ["image filling"], ["image filling"], ["image filling"], ["image filling"], ["image filling"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "image filling", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "image filling", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "JUmSJEmS/D/8rMD7rMD7P4BBaL2RoQBAnYHTmB/LCUD4XU+RqdAGQMDBz028nghAQgNjKDJbBEC1nyyBTfsLQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["image generation"], ["image generation"], ["image generation"], ["image generation"], ["image generation"], ["image generation"], ["image generation"], ["image generation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "image generation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "image generation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": 
"2FBeQ3kN5T8dk/Uck/X8Py5/O8fHSvs/HGkRuaW7AUALrqN3/DAAQIrXkzLHAP8/lYMGxlBkBkC+CwfR47sOQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["image retrieval"], ["image retrieval"], ["image retrieval"], ["image retrieval"], ["image retrieval"], ["image retrieval"], ["image retrieval"], ["image retrieval"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "image retrieval", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "image retrieval", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "uSDjgowL8j9fX19fX1/vP2OfbNUVePQ/XHXyqGLD9T+XwbYIZe3wP8TkCmJyBeE/IAQXItnI6T+UJbBpP1niPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["image segmentation"], ["image segmentation"], ["image segmentation"], ["image segmentation"], ["image segmentation"], ["image segmentation"], ["image segmentation"], ["image segmentation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "image segmentation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "image segmentation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "NOHPhD8T7j81SIM0SIP0P1x7phNsUvU/trlgktFn6z+cmIhE4wXpP/erC2mxPPI/CgP2acj/8j9VzK3OSMX4Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["image to image"], ["image to image"], ["image to image"], ["image to image"], ["image to image"], ["image to image"], ["image to image"], ["image to image"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "image to image", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "image to image", "orientation": "v", 
"showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP2D93FO12FO32P4RTS55mNABAT9krrAn19D8QhXWzekn4P4vlEaGp9us/CgP2acj/4j+n/WQJbNrnPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["localization"], ["localization"], ["localization"], ["localization"], ["localization"], ["localization"], ["localization"], ["localization"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "localization", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "localization", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kNFUAD77MC77MSQChbdQUeJQxAaRG5pbuRBkBxIQ48vywOQL2l03D3sg5ASsTocGjNCkC3OiMVc6sMQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["matching"], ["matching"], ["matching"], ["matching"], ["matching"], ["matching"], ["matching"], ["matching"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "matching", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "matching", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "Cn8m/JnwGUCcm5ubm5sTQHJwasnTjBJASYvILd2NFEDbIpS1w/8WQBCM7FDYmhNAV+PLz3ndFEA/WQKb9pMSQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["odometry"], ["odometry"], ["odometry"], ["odometry"], ["odometry"], ["odometry"], ["odometry"], ["odometry"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "odometry", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "odometry", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": 
"4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPyD+Y+iGY+iHYP2OfbNUVeNQ/0PHti4ME3T9UIxbeb5vUP/SPD4zsUMg/WASE4EIkyz+UJbBpP1nCPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["quality assessment"], ["quality assessment"], ["quality assessment"], ["quality assessment"], ["quality assessment"], ["quality assessment"], ["quality assessment"], ["quality assessment"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "quality assessment", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "quality assessment", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPyD8TYk4TYk7TPy5/O8fHSss/nYHTmB/LyT9UIxbeb5vUP44BDn5u4tU/mwIc7fRI0D84iB7fhYPgPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["reconstruction"], ["reconstruction"], ["reconstruction"], ["reconstruction"], ["reconstruction"], ["reconstruction"], ["reconstruction"], ["reconstruction"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "reconstruction", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "reconstruction", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "vYbyGsprEkDv2p/u2p8WQGOfbNUVeBRAgKIUYNf6F0DN5tSIhfcbQI3zjw+M7BhA3XI9f4LlIUCUJbBpP9keQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["removal"], ["removal"], ["removal"], ["removal"], ["removal"], ["removal"], ["removal"], ["removal"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "removal", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "removal", "orientation": "v", "showlegend": 
true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "uSDjgowL8j9WLrhVLrj1P0kPVM5u4fc/T9krrAn19D/HDwNNB9/zPyM7FLZmnO8/xwReXRbb7T/mVmekYm7xPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["style transfer"], ["style transfer"], ["style transfer"], ["style transfer"], ["style transfer"], ["style transfer"], ["style transfer"], ["style transfer"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "style transfer", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "style transfer", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "uSDjgowL4j9fX19fX1/vPxTvIsAgtO4/HGkRuaW74T/HDwNNB9/zPylzDHDwc+M/xwReXRbb3T/mVmekYm7hPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["super resolution"], ["super resolution"], ["super resolution"], ["super resolution"], ["super resolution"], ["super resolution"], ["super resolution"], ["super resolution"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "super resolution", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "super resolution", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kNBUDrOSbrOSYLQEX9cO2ZTghAsGv9E6UAC0C0/HHkSr4QQFkeEZpqvwpAQgNjKDJbBECwaT9ZApsKQA=="}, "yaxis": "y", "type": "scatter"}], "layout": {"xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}, "legend": {"title": {"text": "task"}, "tracegroupgap": 0}}}

The number of papers on language tasks has also fluctuated over the past few years, particularly for those that focus on dialogue and conversation. By using a conversational interface, users can interact with AI systems in a more natural and intuitive way, leading to better user experiences and more effective communication. This has led to a surge in research on dialogue systems, including chatbots, virtual assistants, and other conversational agents. The development of large-scale language models has also played a significant role in this trend, as these models have demonstrated impressive capabilities in generating human-like text and understanding context.

{"data": [{"customdata": [["dialog"], ["dialog"], ["dialog"], ["dialog"], ["dialog"], ["dialog"], ["dialog"], ["dialog"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "dialog", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "dialog", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP2D/RleTQleTgPxTvIsAgtN4/nYHTmB/L2T/lDfuqVnDNPylzDHDwc8M/mwIc7fRIwD+dkYq51RnlPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["language translation"], ["language translation"], ["language translation"], ["language translation"], ["language translation"], ["language translation"], ["language translation"], ["language translation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "language translation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "language translation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAATYk4TYk6zPwAAAAAAAAAAnYHTmB/LqT+EcWIiEo2nPylzDHDwc7M/eQPQ5pu2pT+61RmpmFudPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["question answering"], ["question answering"], ["question answering"], ["question answering"], ["question answering"], ["question answering"], ["question answering"], ["question answering"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "question answering", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "question answering", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": 
"FbFUxFIR6z8TYk4TYk7zP3Dn+Fhpw+I/nYHTmB/L2T+EcWIiEo3XP/SPD4zsUNg/CgP2acj/4j9C9PguHETjPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["summarization"], ["summarization"], ["summarization"], ["summarization"], ["summarization"], ["summarization"], ["summarization"], ["summarization"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "summarization", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "summarization", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP6D+Y+iGY+iHYP2OfbNUVeMQ/nYHTmB/LuT+EcWIiEo23PylzDHDwc8M/eQPQ5pu2tT+UJbBpP1nCPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["text generation"], ["text generation"], ["text generation"], ["text generation"], ["text generation"], ["text generation"], ["text generation"], ["text generation"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "text generation", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "text generation", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACEcWIiEo2nPwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=="}, "yaxis": "y", "type": "scatter"}], "layout": {"legend": {"title": {"text": "task"}, "tracegroupgap": 0}, "xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}}}

Multimodal tasks are one of the current trends in artificial intelligence. These tasks combine different modalities, such as audio, text, and images, to improve model performance and to solve problems that require a deeper understanding of how the modalities relate to one another. The number of papers published on these tasks has increased significantly in recent years, with a particular focus on tasks such as image-text alignment, image synthesis, video synthesis, and visual question answering. This trend is likely to continue as researchers explore new ways to combine modalities and improve the performance of models.

{"data": [{"customdata": [["alignment"], ["alignment"], ["alignment"], ["alignment"], ["alignment"], ["alignment"], ["alignment"], ["alignment"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "alignment", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "alignment", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "5+ibo2+OBkAD77MC77MCQGCNifRA5QRAiZepHZqVCEAo6V5T4gEQQHYobM04/xJAV+PLz3ndFECuzkjF3OoZQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["audio synthesis"], ["audio synthesis"], ["audio synthesis"], ["audio synthesis"], ["audio synthesis"], ["audio synthesis"], ["audio synthesis"], ["audio synthesis"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "audio synthesis", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "audio synthesis", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAClzDHDwc7M/eQPQ5pu2tT+61RmpmFudPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["captioning"], ["captioning"], ["captioning"], ["captioning"], ["captioning"], ["captioning"], ["captioning"], ["captioning"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "captioning", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "captioning", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "JUmSJEmS/D/yexnyexnyP3YLvxoT6fE/T9krrAn15D9UIxbeb5vkP76sEqjoLe0/mwIc7fRI8D/5LhxEj+/2Pw=="}, "yaxis": "y", "type": "scatter"}, 
{"customdata": [["image synthesis"], ["image synthesis"], ["image synthesis"], ["image synthesis"], ["image synthesis"], ["image synthesis"], ["image synthesis"], ["image synthesis"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "image synthesis", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "image synthesis", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kN5T8dk/Uck/X8Py5/O8fHSvs/HGkRuaW7AUALrqN3/DAAQIrXkzLHAP8/lYMGxlBkBkC+CwfR47sOQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["referring expression comprehension"], ["referring expression comprehension"], ["referring expression comprehension"], ["referring expression comprehension"], ["referring expression comprehension"], ["referring expression comprehension"], ["referring expression comprehension"], ["referring expression comprehension"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "referring expression comprehension", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "referring expression comprehension", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPuD8TYk4TYk7DPy5/O8fHSrs/NqGesldYwz+EcWIiEo23PylzDHDwc6M/CgP2acj/0j+61RmpmFu9Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["video question answering"], ["video question answering"], ["video question answering"], ["video question answering"], ["video question answering"], ["video question answering"], ["video question answering"], ["video question answering"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "video question answering", "line": 
{"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "video question answering", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAATYk4TYk6zPy5/O8fHSqs/NqGesldYwz+EcWIiEo3HP/SPD4zsUMg/mwIc7fRI4D9LYNN+sgTWPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["video synthesis"], ["video synthesis"], ["video synthesis"], ["video synthesis"], ["video synthesis"], ["video synthesis"], ["video synthesis"], ["video synthesis"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "video synthesis", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "video synthesis", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "AAAAAAAAAAATYk4TYk7TP0kPVM5u4dc/NqGesldY0z8j1cmZzanRP/SPD4zsUNg/xwReXRbb3T+P78JB9PjwPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["visual grounding"], ["visual grounding"], ["visual grounding"], ["visual grounding"], ["visual grounding"], ["visual grounding"], ["visual grounding"], ["visual grounding"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "visual grounding", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "visual grounding", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPuD8dk/Uck/XMPy5/O8fHSrs/nYHTmB/LuT8j1cmZzanRPylzDHDwc9M/CgP2acj/0j9eOIge34XbPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": [["visual question answering"], ["visual question answering"], ["visual question answering"], ["visual question answering"], ["visual question answering"], ["visual question answering"], ["visual 
question answering"], ["visual question answering"]], "hovertemplate": "task=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "visual question answering", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "visual question answering", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "yLgg44KM8z8+eSo+eSr+P1x7phNsUvU/D81KvEzt8D8LrqN3/DDwP76sEqjoLe0/CgP2acj/8j9QlsCm/WT3Pw=="}, "yaxis": "y", "type": "scatter"}], "layout": {"xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}, "legend": {"title": {"text": "task"}, "tracegroupgap": 0}}}

Here we analyze how often selected keywords appear in the LLM papers. More specifically:

  • Chain-of-Thought, Tree-of-Thought, and other “-of-Thought” variations - prompting techniques that help the model break down complex tasks into smaller, more manageable steps, allowing it to reason through the problem more effectively;
  • Agent - refers to the use of LLMs as agents that can perform tasks autonomously, often in conjunction with other tools or systems;
  • Distillation - a technique used to compress large models into smaller, more efficient ones while retaining their performance;
  • Few-shot prompting - a prompting technique that provides the model with a few examples of the task at hand, allowing it to generalize and perform well on similar tasks;
  • Fine-tuning - the process of training a pre-trained model on a specific task or dataset to improve its performance;
  • Reinforcement Learning (RL) - a type of machine learning where an agent learns to make decisions by receiving feedback from its environment in the form of rewards or penalties;
  • Retrieval Augmented Generation (RAG) - a technique that combines retrieval-based methods with generative models to improve the performance of language models on specific tasks;
  • Self-Instruct - a technique in which a model generates its own instruction-following examples, which are then filtered and used to fine-tune it;
  • Tokenizer - a component of language models that converts text into a format that the model can understand, often by breaking it down into smaller units called tokens;
  • Tool - refers to the use of external tools or systems in conjunction with LLMs to perform tasks more effectively;
  • Zero-shot prompting - a prompting technique that allows the model to perform tasks without any prior examples or training on that specific task.

Few-shot and zero-shot prompting have lost ground in the academic community to RAG, “-of-Thought” prompting, and novel fine-tuning techniques. Building LLM agents that can tackle harder tasks and use tools is one of the hottest topics in the field.
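To make the keyword counts concrete, here is a minimal sketch of how a per-year keyword share could be computed. The regular expressions and the `keyword_share_by_year` helper are assumptions made for illustration; the exact matching rules used for the chart are not described in this post.

```python
import re
from collections import defaultdict

# Hypothetical patterns: the post does not give its exact matching rules.
KEYWORD_PATTERNS = {
    "* of thought": re.compile(r"\b\w+[- ]of[- ]thoughts?\b", re.IGNORECASE),
    "few shot": re.compile(r"\bfew[- ]shot\b", re.IGNORECASE),
    "zero shot": re.compile(r"\bzero[- ]shot\b", re.IGNORECASE),
    "finetuning": re.compile(r"\bfine[- ]?tun(?:e|ed|ing)\b", re.IGNORECASE),
}


def keyword_share_by_year(papers):
    """Return {keyword: {year: percent of that year's papers mentioning it}}.

    `papers` is an iterable of (year, text) pairs, where text is the
    title and abstract joined together.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for year, text in papers:
        totals[year] += 1
        for name, pattern in KEYWORD_PATTERNS.items():
            # A paper counts once per keyword, however often it repeats it.
            if pattern.search(text):
                hits[name][year] += 1
    return {
        name: {year: 100.0 * hits[name][year] / totals[year] for year in sorted(totals)}
        for name in KEYWORD_PATTERNS
    }


# Toy input, not real conference data.
papers = [
    (2023, "Few-shot prompting for visual reasoning"),
    (2023, "Zero-shot transfer with frozen LLMs"),
    (2024, "Chain-of-thought fine-tuning for multimodal agents"),
    (2024, "Tree of Thoughts planning"),
]
shares = keyword_share_by_year(papers)
```

Dividing by the number of papers in each year is what makes years with very different paper counts comparable.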

[Interactive line chart: occurrences (%) per year for each keyword: * of thought, agent, distillation, few shot, finetuning, reinforcement learning, retrieval augmented generation, tokenizer, tool, zero shot.]

Information about Authors

Now let’s look at the authors of the papers. This first graph shows how many authors have published a given number of papers. As we can see, most authors have published only one paper at the conference. Out of 33,861 authors, only 1,308 have 10 or more accepted papers.
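The distribution described above can be sketched with two counters. This is a minimal illustration on toy data; the function names are hypothetical, and it assumes author names have already been cleaned so that one spelling corresponds to one person.

```python
from collections import Counter


def papers_per_author(author_lists):
    """Count papers per author; a name counts once per paper."""
    return Counter(name for authors in author_lists for name in set(authors))


def authors_by_paper_count(per_author):
    """Map 'number of papers' to 'number of authors with exactly that many'."""
    return Counter(per_author.values())


def count_prolific(per_author, minimum=10):
    """Number of authors with at least `minimum` papers."""
    return sum(1 for n in per_author.values() if n >= minimum)


# Toy input: one list of author names per paper (not real data).
author_lists = [["Ana", "Bo"], ["Ana", "Cy"], ["Ana"], ["Bo", "Dee"]]
per_author = papers_per_author(author_lists)
histogram = authors_by_paper_count(per_author)
prolific = count_prolific(per_author, minimum=2)
```

Matching authors by name alone can merge different people who share a name or split one person across spelling variants, so counts of this kind are approximate.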

[Interactive bar chart: number of authors (y-axis) by number of papers published (x-axis).]

Here are the 10 authors with the most papers:

Author Papers
Luc Van Gool 134
Radu Timofte 119
Lei Zhang 100
Yi Yang 86
Yu Qiao 83
Dacheng Tao 80
Ming-Hsuan Yang 79
Qi Tian 75
Marc Pollefeys 71
Xiaogang Wang 70

Now let’s look at the number of authors per paper. Most papers have between 2 and 7 authors, but a few have far more, such as “Why Is the Winner the Best?”, with 125 authors, and “The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report”, with a staggering 134 authors. The former is a multi-center study of all 80 competitions held as part of IEEE ISBI 2021 and MICCAI 2021, while the latter is a report summarizing the results of an NTIRE 2024 challenge, a competition held in conjunction with CVPR.

[Interactive bar chart: number of papers (y-axis) by number of authors per paper (x-axis).]

Since most papers have multiple authors, it is common to see the same authors collaborating repeatedly. The most frequent pair is Jiwen Lu and Jie Zhou, who have written 57 papers together. The second is Luc Van Gool and Radu Timofte with 43 papers, followed by Tao Xiang and Yi-Zhe Song with 38 papers. The 10 most frequent pairs of authors are:

Author 1            Author 2           Papers
Jiwen Lu            Jie Zhou           57
Luc Van Gool        Radu Timofte       43
Tao Xiang           Yi-Zhe Song        38
Fahad Shahbaz Khan  Salman Khan        33
Ting Yao            Tao Mei            32
Xiaogang Wang       Hongsheng Li       28
Shiguang Shan       Xilin Chen         27
Richa Singh         Mayank Vatsa       26
Dong Chen           Fang Wen           24
Yi-Zhe Song         Ayan Kumar Bhunia  24
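
Pair counts like the ones above can be obtained by enumerating every unordered pair of authors on each paper. The sketch below uses toy data and a hypothetical `coauthor_pairs` helper; it assumes names are already normalized.

```python
from collections import Counter
from itertools import combinations


def coauthor_pairs(author_lists):
    """Count how many papers each unordered pair of authors shares."""
    pairs = Counter()
    for authors in author_lists:
        # Sorting makes (A, B) and (B, A) the same key; set() drops
        # accidental duplicate names within one paper.
        for a, b in combinations(sorted(set(authors)), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy input: one list of author names per paper (not real data).
author_lists = [
    ["Bo", "Ana", "Cy"],
    ["Ana", "Bo"],
    ["Cy", "Ana"],
    ["Bo", "Ana"],
]
pairs = coauthor_pairs(author_lists)
top_pair, top_count = pairs.most_common(1)[0]
```

A paper with n authors contributes n(n-1)/2 pairs, so the handful of challenge reports with over 100 authors generates thousands of pairs each, though every such pair is counted only once per paper.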

Although it is quite rare for a paper to have a single author, 185 papers fall into this category. Notable examples include work that introduced novel loss functions (Jonathan T. Barron, Takumi Kobayashi) and improved transformer architectures and post-training techniques (Takumi Kobayashi, Jing Ma). The table below lists the authors with the most single-author papers:

Author Papers
Takumi Kobayashi 4
Anant Khandelwal, Takuhiro Kaneko 3
Andrey V. Savchenko, Chong Yu, Dimitrios Kollias, Edgar A. Bernal, Jamie Hayes, Magnus Oskarsson, Ming Li, Oleksii Sidorov, Ren Yang, Rowel Atienza, Sanghwa Hong, Satoshi Ikehata, Shunta Maeda, Stamatios Lefkimmiatis, Ying Zhao 2

Identifying Topics

For this section, we used Top2Vec, an automatic topic modeling algorithm, to identify groups of papers that are similar to each other based on their titles and abstracts. The model found 172 topics, too many to analyze individually. Instead, we focus on the hottest and coldest topics: those with the most and the fewest papers in the last year, respectively.

One problem with the algorithm is that it identifies topics based on the words used in the papers, but it doesn’t provide a clear explanation of what those topics are about. This is a common problem with topic modeling algorithms, as they often produce results that are difficult to interpret. However, we can use LLMs to help us understand the meaning of these topics. We will use the most representative words of each topic (the words that appear most often in the papers of that topic) to generate a title and a paragraph summarizing it.
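As an illustration of the labeling step, the sketch below turns a topic’s representative words into a prompt for an LLM. The prompt wording and the `build_topic_prompt` helper are hypothetical, not the ones actually used for this post, and the call to the LLM itself is left out.

```python
def build_topic_prompt(topic_words, max_words=20):
    """Build a labeling prompt from a topic's most representative words."""
    # Strip whitespace, drop empty entries, and remove duplicates
    # while preserving the original order.
    cleaned = (w.strip() for w in topic_words)
    words = list(dict.fromkeys(w for w in cleaned if w))[:max_words]
    return (
        "The following words are the most representative of one topic found "
        "in computer vision papers:\n"
        + ", ".join(words)
        + "\n\nWrite a short title for this topic, followed by one paragraph "
        "summarizing it."
    )


# Toy word list, loosely modeled on the event-camera topic.
prompt = build_topic_prompt(
    ["event camera", "neuromorphic", "event camera", " latency "]
)
```

Because the model sees only a bag of words, the generated titles and summaries are best read as plausible interpretations rather than verified descriptions of the papers in each topic.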

🔥 10 topics

[Interactive line chart: occurrences (%) per year for topics 1 to 10.]

The topics below are ordered by the number of papers published last year, from most to fewest.
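
A ranking of this kind can be sketched as follows, assuming each paper has already been assigned a single topic. The `rank_topics_by_year` helper and the toy data are illustrative only.

```python
from collections import Counter


def rank_topics_by_year(assignments, year, k=10):
    """Return (hottest, coldest) topic ids for the given year.

    `assignments` is a list of (topic_id, paper_year) pairs, one per paper.
    Hottest topics come first in their list, coldest first in theirs.
    """
    counts = Counter(topic for topic, paper_year in assignments if paper_year == year)
    # Include topics with no papers in `year`; they count as zero.
    all_topics = {topic for topic, _ in assignments}
    # Most papers first; ties are broken by topic id for determinism.
    ranked = sorted(all_topics, key=lambda t: (-counts[t], t))
    return ranked[:k], ranked[-k:][::-1]


# Toy input (not real data): topic 1 has three 2024 papers, topic 4 has none.
assignments = [
    (1, 2024), (1, 2024), (1, 2024),
    (2, 2024), (2, 2024),
    (3, 2023), (3, 2024),
    (4, 2023),
]
hottest, coldest = rank_topics_by_year(assignments, year=2024, k=2)
```

Ranking by raw counts in a single year favors large topics; ranking by the change in share between years would be an alternative definition of “hot” and “cold”.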

Topic 1 - Instruction-Tuned Multimodal LLMs for Vision-Language Understanding (157 documents)
Wordcloud for topic 1
Recent advancements in large language models (LLMs) and multimodal large language models (MLLMs) have led to remarkable capabilities in integrating visual and textual information for tasks like question answering, dialogue, and reasoning. By leveraging instruction tuning and visual prompt techniques, these models — such as vision-language models (VLMs) like CLIP — have significantly improved in-context comprehension and instruction following across modalities. Despite these achievements, challenges like hallucinations and limited applicability in complex, real-world settings still remain. Research continues to focus on enhancing multimodal instruction learning to facilitate deeper, more reliable understanding between vision and language inputs.

Top authors:
  1. Yu Qiao (7 papers)
  2. Ying Shan (5 papers)
  3. Yixiao Ge (5 papers)
Topic 2 - Controllable Text-Guided Image Editing with Diffusion and GAN Inversion (426 documents)
Wordcloud for topic 2
Advances in text-to-image diffusion models and GAN-based techniques have unlocked powerful, controllable image editing capabilities driven by natural language prompts. Methods like StyleGAN inversion, DDIM inversion, and textual inversion allow users to manipulate generated images or real inputs with high fidelity while preserving key features like identity. Text-guided editing leverages the latent space of pretrained diffusion and GAN models, enabling creative, precise, and personalized edits through simple prompts. Despite remarkable progress, achieving fine-grained control over complex edits and extending these capabilities to video editing remain active research challenges.

Top authors:
  1. Chen Change Loy (9 papers)
  2. Xintao Wang (8 papers)
  3. Ying Shan (8 papers)
Topic 3 - AI-Driven Medical Imaging and Diagnosis in Clinical Practice (345 documents)
Wordcloud for topic 3
The integration of computational methods into medical imaging and pathology has become increasingly important for improving diagnostic accuracy and early disease detection. Techniques like computer-aided diagnosis support clinicians in analyzing tissues, tumors, and organs across modalities such as histopathology, MRI, CT, and digital microscopy. Applications span cancer diagnosis (e.g., breast, brain, skin lesions), neurological diseases like Alzheimer's and Parkinson's, and blood analysis. By enhancing image analysis and tissue classification, these AI-driven tools aid in treatment planning and disease progression monitoring, making them vital in modern clinical practice and biomedical research.

Top authors:
  1. Le Lu (9 papers)
  2. Faisal Mahmood (5 papers)
  3. Ke Yan (5 papers)
Topic 4 - Challenges and Advances in 3D-Aware Text-to-Image and Text-to-Video Generation (68 documents)
Wordcloud for topic 4
Text-to-image and text-to-video generation using diffusion models has made remarkable progress, enabling the synthesis of high-fidelity, photorealistic assets from simple prompts. However, existing methods still struggle with accurately handling 3D geometry, novel views, and maintaining global and multi-view consistency. Techniques like diffusion priors, 3D Gaussians, and NeRF-based approaches aim to improve subject-driven generation and diverse, globally consistent outputs. Despite advances in pretrained diffusion models and anisotropic diffusion strategies, achieving high-fidelity, geometry-aware synthesis remains a central challenge in the evolution of text-to-3D and motion generation.

Top authors:
  1. Hsin-Ying Lee (4 papers)
  2. Sergey Tulyakov (4 papers)
  3. Ying Shan (4 papers)
Topic 5 - Remote Sensing and Aerial Imagery for Environmental and Agricultural Monitoring (306 documents)
Wordcloud for topic 5
The rapid development of satellite and unmanned aerial vehicle (UAV) technologies has fueled increased interest in using high-resolution imagery for environmental, agricultural, and urban management. Remote sensing enables the monitoring of plant species, crop types, water resources, land cover changes, and urban infrastructure such as roads and buildings, particularly aiding developing countries. Applications range from crop management and plant phenotyping to traffic management and tracking environmental factors and changes. Publicly accessible satellite and aerial datasets are becoming vital tools for tackling global challenges in resource management, environmental protection, and urban planning.

Top authors:
  1. Sara Beery (5 papers)
  2. David Lobell (4 papers)
  3. Edward J. Delp (4 papers)
Topic 6 - Competitions and Challenges in Computer Vision: The Role of NTIRE and Beyond (90 documents)
Wordcloud for topic 6
Large-scale competitions like the NTIRE Challenge, MegaFace Challenge, and ABAW Competition have become central to advancing computer vision research. Hosted at major conferences like CVPR, these challenges attract hundreds of registered participants and teams, competing across various tracks such as perceptual quality, AI-generated content, and traffic analysis. Through rigorous submissions and evaluations on standardized test sets, these challenges foster innovation, benchmark progress, and tackle formidable problems in the field. The NTIRE Workshop, in particular, has established itself as a premier platform for recognizing outstanding achievements and setting new frontiers in computer vision competitions.

Top authors:
  1. Radu Timofte (44 papers)
  2. Marcos V. Conde (8 papers)
  3. Radu Timofte (7 papers)
Topic 7 - Enhancing Diffusion Models: Faster Inference and Higher Image Quality (64 documents)
Wordcloud for topic 7
Diffusion models have emerged as powerful tools for generating high-quality images, particularly in text-to-image tasks, but they often suffer from slow inference speeds and inherent limitations tied to their timestep-based denoising process. Recent advances focus on accelerating inference and improving FID scores through innovations like post-training quality enhancement, tailored token mixing (e.g., super tokens, OCR tokens), and anisotropic diffusion strategies. These methods can be flexibly applied with negligible computational overhead, substantially improving image quality without retraining. By addressing inherent inefficiencies and showcasing superior performance, these techniques represent a major step forward in diffusion-based image generation.

Top authors:
  1. Deli Zhao (3 papers)
  2. Yujun Shen (3 papers)
  3. Chengyue Gong (2 papers)
Topic 8 - Intelligent Traffic Monitoring and Driver Behavior Analysis for Road Safety (58 documents)
Wordcloud for topic 8
Advances in intelligent transportation systems are increasingly focused on improving road safety through the analysis of driver behavior, traffic scenarios, and vehicle-pedestrian interactions. By leveraging traffic monitoring, naturalistic driving datasets, and leaderboard-driven challenges, researchers aim to develop better driver assistance systems and accident prevention technologies. Areas like distracted driver detection, traffic surveillance, and automated driving benefit from smart monitoring systems that enhance safe driving practices and reduce traffic accidents. As public leaderboards rank the best published methods, innovations in vehicle tracking, intelligent traffic analysis, and safe transportation continue to accelerate progress toward safer roads.

Top authors:
  1. Armstrong Aboah (3 papers)
  2. Fei Su (3 papers)
  3. Zhe Cui (3 papers)
Topic 9 - Soccer and Sports Video Analytics: Player Tracking and Game Understanding (103 documents)
Wordcloud for topic 9
Advances in sports video analytics, particularly for soccer, focus on tracking players, analyzing game states, and generating highlights from broadcast footage and sport-specific datasets. Systems capable of detecting player positions, ball movements, and team dynamics have become fundamental tools for both game analysis and automated content production. Publicly released datasets like UCF and HMDB, along with open-source code, drive innovation in this field, enabling teams around the world to develop systems capable of capturing, processing, and understanding complex sport scenarios. These developments are reshaping sports analytics, enhancing performance evaluation, and enriching fan experiences.

Top authors:
  1. Anthony Cioppa (11 papers)
  2. Marc Van Droogenbroeck (10 papers)
  3. Bernard Ghanem (8 papers)
Topic 10 - Event-Based Vision: High-Speed, Low-Latency Sensing with Neuromorphic Cameras (129 documents)
Wordcloud for topic 10
Event-based vision, powered by bio-inspired neuromorphic cameras, represents a major shift from traditional frame-based imaging. Unlike conventional sensors, event cameras capture asynchronous changes in brightness with low latency, low power consumption, and exceptional dynamic range, making them ideal for high-speed motion scenarios and environments prone to motion blur. This technology, including event-based vision for video frame interpolation (VFI) and eye tracking, has progressed rapidly, offering advantages in bandwidth efficiency and noise reduction. Applications span robotics, autonomous driving, and high-speed tracking, where conventional frame-based approaches often fall short.

Top authors:
  1. Davide Scaramuzza (11 papers)
  2. Mathias Gehrig (6 papers)
  3. Boxin Shi (5 papers)

🧊 10 topics

{"data": [{"customdata": {"dtype": "i1", "bdata": "AQEBAQEBAQE=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "1", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "1", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPuD8TYk4TYk7DPy5/O8fHSss/aRG5pbuR1j/N5tSIhffrP1keEZpqv+o/jwTxnqx//D84iB7fhYPgPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "AgICAgICAgI=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "2", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "2", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "sVTEUhFLAUDyexnyexkCQC5/O8fHSvs/T9krrAn19D+1v65mtH76P8HIDoWtGQdAviIgBBciEUBVzK3OSMUIQA=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "AwMDAwMDAwM=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "3", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "3", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPuD+Y+iGY+iHYPy5/O8fHSts/0PHti4ME7T+cmIhE4wX5P76sEqjoLf0/IAQXItnI+T9VzK3OSMXoPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "BAQEBAQEBAQ=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "4", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "4", "orientation": "v", 
"showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YP6D8+eSo+eSr+P23VFXiUMANAAjGEv/MeAEBiIhKNF2QJQCM7FLZmnP8/44SUPMuI/j89vgsH0ePxPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "BQUFBQUFBQU=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "5", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "5", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPyD+Y+iGY+iHYP2OfbNUVeOQ/AjGEv/Me8D8LrqN3/DDwPyhljgEOfvY/JoMsSX2tA0C1nyyBTfv7Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "BgYGBgYGBgY=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "6", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "6", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "77S60+pOB0DhCH/hCH8BQELrjQzFu/g/HGkRuaW78T8j1cmZzanxP/SPD4zsUPg/BITgQiQb+T+61RmpmFvtPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "BwcHBwcHBwc=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "7", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "7", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "qYilIpaK8D8TYk4TYk7zP2OfbNUVeOQ/0PHti4ME7T/N5tSIhffrP1w6DXcvq+Q/PIRNAY52+j/hIHp8Fw7wPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "CAgICAgICAg=", "shape": "8, 
1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "8", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "8", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "uSDjgowL4j/RleTQleTgP3Dn+Fhpw+I/aRG5pbuR5j+1v65mtH7qP/eySqCitwBAmwIc7fRIAED5LhxEj+/2Pw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "CQkJCQkJCQk=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "9", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "9", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "9oDZA2YPyD8TYk4TYk6zPy5/O8fHSrs/NqGesldY0z+1v65mtH7aP76sEqjoLd0/mwIc7fRI8D9eOIge34XbPw=="}, "yaxis": "y", "type": "scatter"}, {"customdata": {"dtype": "i1", "bdata": "CgoKCgoKCgo=", "shape": "8, 1"}, "hovertemplate": "topic=%{customdata[0]}<br>year=%{x}<br>occurrences (%)=%{y:.3f}<extra></extra>", "legendgroup": "10", "line": {"dash": "solid"}, "marker": {"symbol": "circle"}, "mode": "lines+markers", "name": "10", "orientation": "v", "showlegend": true, "x": {"dtype": "i2", "bdata": "4QfiB+MH5AflB+YH5wfoBw=="}, "xaxis": "x", "y": {"dtype": "f8", "bdata": "2FBeQ3kNFUAdk/Uck/UMQGnDMpe/nQNAdq1/ohRgB0CKSDRekKX/P7+6kBbLI/o/lYMGxlBk9j+waT9ZApvqPw=="}, "yaxis": "y", "type": "scatter"}], "layout": {"xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "year"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "occurrences (%)"}}, "legend": {"title": {"text": "topic"}, "tracegroupgap": 0}}}
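The figure data above stores its axes as base64-encoded typed arrays (the `bdata` fields). If you want to read the values without rendering the plot, here is a minimal sketch for the year axis. It assumes that the `i2` dtype means little-endian 16-bit integers; `decode_i2` is a hypothetical helper written for this illustration, not part of the original analysis.

```python
import base64
import struct


def decode_i2(bdata: str) -> list[int]:
    """Decode a base64 string holding little-endian int16 values."""
    raw = base64.b64decode(bdata)
    count = len(raw) // 2  # two bytes per int16 value
    return list(struct.unpack(f"<{count}h", raw))


# The "x" array shared by every trace in the figure above.
years = decode_i2("4QfiB+MH5AflB+YH5wfoBw==")
print(years)  # [2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
```

The `y` arrays use dtype `f8`, which under the same assumption would be 8-byte floats and could be unpacked with the `d` format code instead of `h`.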

The topics below are ordered by the size of their decrease in papers over the last year, largest decrease first.

Topic 1 - Self-Supervised Pretraining: Masked Models and Their Impact on Downstream Vision Tasks (115 documents)
Wordcloud for topic 1
Self-supervised pretraining has reshaped machine learning by enabling models to learn from unlabeled data, significantly improving performance on a wide range of downstream vision tasks. Masked autoencoding and contrastive pretraining have become central to this approach. In masked autoencoding, a model is trained to predict missing parts of the data, which lets it learn rich representations without labeled examples. These methods, including masked token strategies and large-scale pretraining on unlabeled video or image datasets, have been shown to outperform traditional supervised pretraining in a variety of applications. The benefits of self-supervised learning, especially in terms of scalability and performance, are now being extensively explored in vision-language pretraining (VLP) and other vision tasks, where it often surpasses existing supervised methods.
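To make the masking idea concrete, here is a minimal sketch of the two ingredients: hiding a random subset of patches and scoring the reconstruction only on the hidden ones. The function names, the 0.75 mask ratio, and the toy "patch values" are assumptions for illustration; real models operate on image patches and learned decoders.

```python
import random


def mask_patches(num_patches: int, mask_ratio: float, seed: int = 0) -> list[bool]:
    """Return a boolean mask; True marks a patch hidden from the model."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    hidden = set(rng.sample(range(num_patches), num_masked))
    return [i in hidden for i in range(num_patches)]


def masked_mse(pred: list[float], target: list[float], mask: list[bool]) -> float:
    """Mean squared error computed only over the masked positions."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors)


mask = mask_patches(num_patches=16, mask_ratio=0.75)
target = [float(i) for i in range(16)]  # toy stand-in for patch contents
pred = [t + 0.5 for t in target]        # pretend reconstruction, off by 0.5
loss = masked_mse(pred, target, mask)
print(sum(mask), loss)
```

Because the loss ignores visible patches, the model gets no credit for copying its input and has to infer the hidden content from context.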

Top authors:
  1. Yu Qiao (9 papers)
  2. Ishan Misra (4 papers)
  3. Ross Girshick (4 papers)
Example papers:
Topic 2 - Vision-Language Models: Aligning Text and Image for Cross-Modal Understanding (432 documents)
Wordcloud for topic 2
Vision-language models, such as CLIP, leverage large datasets of paired image-text data to learn the alignment between visual content and language. These models enable tasks like image captioning, where a description is generated for an image, and text-based image retrieval, where a sentence or phrase is used to find relevant visual content. Through pretraining on vast collections of images and their corresponding captions, these models achieve zero-shot capabilities, meaning they can generalize to tasks they were not explicitly trained on. Grounding text in visual concepts, such as matching sentences to images or videos, has become a central challenge in creating more sophisticated systems for understanding and generating visual and textual information across diverse domains, including untrimmed videos and spoken language.
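As a rough illustration of how zero-shot classification works once images and text share an embedding space, the sketch below picks the caption whose embedding is closest to the image embedding by cosine similarity. The three-dimensional vectors are made up for the example; in a real system they would come from trained image and text encoders.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_label(image_emb: list[float], text_embs: dict[str, list[float]]) -> str:
    """Return the prompt whose embedding is most similar to the image."""
    return max(text_embs, key=lambda prompt: cosine(image_emb, text_embs[prompt]))


# Hypothetical embeddings, chosen by hand for this example.
text_embs = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.3, 0.1]
best = zero_shot_label(image_emb, text_embs)
print(best)  # a photo of a dog
```

Adding a new class only requires writing a new prompt and embedding it, which is why no task-specific training is needed.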

Top authors:
  1. Lijuan Wang (9 papers)
  2. Mike Zheng Shou (7 papers)
  3. Ying Shan (7 papers)
Example papers:
Topic 3 - Semi-Supervised Learning: Leveraging Unlabeled Data for Improved Model Performance (179 documents)
Wordcloud for topic 3
Semi-supervised learning (SSL) is a powerful technique that utilizes both labeled and unlabeled data to train models, especially when labeled data is scarce. In SSL, pseudo-labeling is commonly employed: unlabeled examples are assigned pseudo-labels based on model predictions, and these pseudo-labeled examples are incorporated into the training process. This approach allows models to learn from a large amount of unlabeled data, improving generalization without requiring extensive labeled datasets. Methods like pseudo-label refinement and self-training help ensure the quality and reliability of the pseudo-labels, making SSL effective for tasks like medical image analysis, where labeled data is limited. By combining labeled data with confident pseudo-labels from unlabeled examples, semi-supervised learning can outperform fully supervised training on the labeled subset alone, particularly in challenging settings with partially labeled data.
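The "confident pseudo-labels" step can be sketched in a few lines: keep only the unlabeled examples whose top predicted probability clears a threshold. The 0.9 threshold and the probability vectors below are assumptions for illustration, not values taken from any paper in this topic.

```python
def select_pseudo_labels(probs: list[list[float]], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Return (example index, predicted class) pairs that meet the threshold."""
    selected = []
    for idx, class_probs in enumerate(probs):
        confidence = max(class_probs)
        if confidence >= threshold:
            selected.append((idx, class_probs.index(confidence)))
    return selected


# Hypothetical model outputs for three unlabeled examples.
unlabeled_probs = [
    [0.97, 0.02, 0.01],  # confident: class 0
    [0.40, 0.35, 0.25],  # uncertain: skipped
    [0.05, 0.93, 0.02],  # confident: class 1
]
pseudo = select_pseudo_labels(unlabeled_probs)
print(pseudo)  # [(0, 0), (2, 1)]
```

The selected pairs would then be mixed into the labeled set for the next training round.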

Top authors:
  1. Jingdong Wang (4 papers)
  2. Lei Qi (4 papers)
  3. Yinghuan Shi (4 papers)
Example papers:
Topic 4 - Domain Adaptation: Bridging the Gap Between Source and Target Domains (323 documents)
Wordcloud for topic 4
Domain adaptation (DA) focuses on adapting models trained on a labeled source domain to perform well on an unseen target domain, addressing the challenges posed by domain shift or domain gap. This process is crucial when the source and target domains differ significantly, such as in cross-domain generalization tasks. Techniques like pseudo-labeling, self-training, and few-shot learning are employed to improve performance on target data, even when labeled data from the target domain is limited or unavailable. Domain adaptation methods aim to reduce the discrepancy between the source and target domains by minimizing the impact of domain shift and leveraging unlabeled target samples. These methods are vital for applications like visual domain adaptation, where new target domains with varying conditions or classes are frequently encountered.
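One simple way to put a number on the "discrepancy" between two domains is the distance between their mean feature vectors, which corresponds to maximum mean discrepancy with a linear kernel. The sketch below is only an illustration of that idea; the post does not say which discrepancy measures the papers in this topic use, and the feature values are made up.

```python
import math


def feature_mean(features: list[list[float]]) -> list[float]:
    """Per-dimension mean of a list of feature vectors."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def mean_discrepancy(source: list[list[float]], target: list[list[float]]) -> float:
    """Euclidean distance between the mean features of two domains."""
    mu_s = feature_mean(source)
    mu_t = feature_mean(target)
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(mu_s, mu_t)))


source = [[0.0, 1.0], [2.0, 3.0]]  # mean is (1, 2)
target = [[4.0, 5.0], [4.0, 7.0]]  # mean is (4, 6)
gap = mean_discrepancy(source, target)
print(gap)  # 5.0
```

An adaptation method that aligns features would try to drive a quantity like `gap` toward zero while keeping the source classifier accurate.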

Top authors:
  1. Luc Van Gool (8 papers)
  2. Dengxin Dai (7 papers)
  3. Wen Li (7 papers)
Example papers:
Topic 5 - 3D Object Detection: Advancements in Lidar and Monocular Approaches for Autonomous Vehicles (217 documents)
Wordcloud for topic 5
3D object detection is a critical component of autonomous driving, enabling vehicles to perceive and understand their environment in three dimensions. Using sensors like lidar, monocular cameras, and radar, 3D detection systems build detailed representations of the surroundings, often in formats such as bird's-eye view (BEV) or voxel grids. Datasets like KITTI, nuScenes, and Waymo provide benchmarks for evaluating 3D detection models, with lidar-based point clouds playing a central role in high-accuracy detection of objects such as pedestrians, vehicles, and obstacles. These systems face challenges such as slow inference speeds and the complexity of predicting occupancy grids, but advancements in lidar sensors, occupancy prediction, and BEV detectors are helping improve autonomous vehicle perception and safety. As autonomous driving systems evolve, 3D detection continues to be crucial for precise navigation and decision-making.
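To show what a bird's-eye-view representation looks like in its simplest form, the sketch below drops the height of each lidar point and marks the grid cell it falls into. The ranges, cell size, and points are assumptions for illustration; real BEV detectors store learned features per cell rather than a single occupancy bit.

```python
def points_to_bev(points, x_range, y_range, cell_size):
    """Rasterize (x, y, z) points into a 2D occupancy grid seen from above."""
    cols = int((x_range[1] - x_range[0]) / cell_size)
    rows = int((y_range[1] - y_range[0]) / cell_size)
    grid = [[0] * cols for _ in range(rows)]
    for x, y, _z in points:
        # Skip points outside the area covered by the grid.
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            col = min(int((x - x_range[0]) / cell_size), cols - 1)
            row = min(int((y - y_range[0]) / cell_size), rows - 1)
            grid[row][col] = 1
    return grid


# Hypothetical points in meters; the last one lies outside the grid.
points = [(0.5, 0.5, 0.2), (3.2, 1.7, 1.0), (9.0, 9.0, 0.0)]
bev = points_to_bev(points, x_range=(0.0, 4.0), y_range=(0.0, 2.0), cell_size=1.0)
print(bev)  # [[1, 0, 0, 0], [0, 0, 0, 1]]
```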

Top authors:
  1. Jie Zhou (7 papers)
  2. Jiwen Lu (6 papers)
  3. Yuexin Ma (6 papers)
Example papers:
Topic 6 - Weakly Supervised Object Segmentation: Balancing Annotations and Performance (244 documents)
Wordcloud for topic 6
Weakly supervised object segmentation focuses on leveraging less detailed annotations, such as image-level labels or object proposals, to train segmentation models. Unlike fully supervised methods that require pixel-level annotations, weak supervision relies on class-level or bounding box labels to guide the segmentation process. Datasets like Pascal VOC and MS COCO provide benchmarks for evaluating segmentation models, with metrics such as mean Intersection over Union (mIoU) used to assess performance. Techniques like class-agnostic object masks, class activation mapping (CAM), and pseudo-masks are employed to generate pixel-level segmentations from weak annotations. This approach aims to reduce the cost and effort associated with obtaining high-quality, pixel-wise annotations, while still achieving competitive segmentation results, especially for complex tasks like instance-level segmentation and object part identification.
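Since mIoU comes up in almost every segmentation paper, here is a small sketch of how it is computed on flattened label maps. The six-pixel "image" is a toy example, and skipping classes that appear in neither map is one common convention, assumed here.

```python
def mean_iou(pred: list[int], target: list[int], num_classes: int) -> float:
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # ignore classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


target = [0, 0, 1, 1, 2, 2]  # ground-truth class per pixel
pred = [0, 1, 1, 1, 2, 0]    # predicted class per pixel
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5
```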

Top authors:
  1. Junwei Han (5 papers)
  2. Yunchao Wei (5 papers)
  3. Bingfeng Zhang (4 papers)
Example papers:
Topic 7 - Image Relighting: Enhancing Lighting and Material Effects in Digital Rendering (167 documents)
Wordcloud for topic 7
Image relighting is a technique used in computer graphics and computational photography to manipulate or simulate changes in lighting conditions on a given scene or object. This process involves adjusting various aspects of lighting, such as specular and diffuse reflections, albedo, and shadow effects, to create realistic or desired lighting outcomes. It takes into account scene properties such as material reflectance, illumination, and surface normals, which determine how light interacts with the scene. Relighting is commonly applied in tasks such as portrait or face relighting, where the goal is to alter lighting without changing the geometry of the scene. By estimating and adjusting lighting effects like specular highlights, shadows, and ambient light, image relighting allows for enhanced visual realism and flexibility in various applications, from film production to virtual environments and interactive systems.
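The interplay of albedo, surface normals, and light direction is easiest to see in the diffuse (Lambertian) case, where intensity is the albedo times the cosine of the angle between the normal and the light. The sketch below is a textbook shading model under that assumption, not a relighting method from any specific paper; all values are made up.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def lambert_shade(albedo: float, normal: list[float], light_dir: list[float],
                  light_intensity: float = 1.0) -> float:
    """Diffuse intensity: albedo * intensity * max(0, n . l)."""
    n = normalize(normal)
    l = normalize(light_dir)
    cos_theta = max(0.0, sum(a * b for a, b in zip(n, l)))
    return albedo * light_intensity * cos_theta


# Same surface point (albedo and normal fixed), three light directions.
front = lambert_shade(0.8, [0, 0, 1], [0, 0, 1])
side = lambert_shade(0.8, [0, 0, 1], [1, 0, 0])
behind = lambert_shade(0.8, [0, 0, 1], [0, 0, -1])
print(front, side, behind)
```

Relighting in this simplified view means re-evaluating the shading with a new `light_dir` while the estimated albedo and normals stay untouched.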

Top authors:
  1. Boxin Shi (11 papers)
  2. Kalyan Sunkavalli (7 papers)
  3. Noah Snavely (6 papers)
Example papers:
Topic 8 - Large Kernel Convolutions and Self-Attention Mechanisms in Vision Transformers (209 documents)
Wordcloud for topic 8
The integration of large kernel convolutions and self-attention mechanisms has become a powerful approach in modern computer vision tasks. Large kernels, often implemented with atrous (dilated) or depthwise convolutions to keep them affordable, increase the receptive field, enabling the model to capture long-range dependencies in an image without significantly increasing computational cost. This is essential for vision tasks like object detection and segmentation, where understanding the global context is crucial. Self-attention mechanisms, particularly in Vision Transformers (ViTs), capture relationships between distant image patches, enhancing the model's ability to focus on relevant parts of an image. By combining large kernel convolutions with self-attention layers, models can balance local feature extraction and global context understanding, leading to improved performance on benchmarks like ADE20K, Cityscapes, and COCO, especially in tasks like segmentation and scene understanding. This combination offers an efficient and scalable way to handle complex vision tasks while maintaining competitive performance.
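A little arithmetic shows why dilation and depthwise convolutions make large receptive fields affordable. The sketch below uses the standard formulas for the effective size of a dilated kernel and for convolution weight counts; the helper names and the 7x7, 64-channel example are assumptions for illustration.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated (atrous) kernel."""
    return dilation * (kernel_size - 1) + 1


def conv_params(kernel_size: int, in_ch: int, out_ch: int, depthwise: bool = False) -> int:
    """Weight count of a 2D convolution, ignoring bias terms."""
    if depthwise:
        # One k x k filter per input channel.
        return kernel_size * kernel_size * in_ch
    return kernel_size * kernel_size * in_ch * out_ch


print(effective_kernel(3, 2))                  # 5
print(effective_kernel(3, 4))                  # 9
print(conv_params(7, 64, 64))                  # 200704
print(conv_params(7, 64, 64, depthwise=True))  # 3136
```

A 3x3 kernel with dilation 4 covers a 9x9 window with only nine weights per filter, and a depthwise 7x7 layer needs 64 times fewer weights than a dense one in this example.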

Top authors:
  1. Xiangyu Zhang (6 papers)
  2. Chang Xu (5 papers)
  3. Yu Qiao (5 papers)
Example papers:
Topic 9 - 3D-Aware Image Synthesis with GANs for High-Fidelity and Controllable Rendering (71 documents)
Wordcloud for topic 9
3D-aware image synthesis is an advanced technique that combines the power of Generative Adversarial Networks (GANs) with 3D geometry to create highly realistic and controllable images. By incorporating 3D-aware models like Neural Radiance Fields (NeRF) and leveraging latent spaces in GAN architectures such as StyleGAN, this method enables the generation of high-fidelity images from novel views or multi-view perspectives. This approach allows for fine-grained control over attributes like lighting, angles, and details, making it particularly useful for photorealistic rendering and editing. The synthesis process ensures that the generated images maintain consistency across different views and provide high-quality visual outputs, which can be applied in areas such as virtual reality, digital content creation, and computer graphics. With advancements in 3D-aware GANs, it is now possible to synthesize photo-realistic images with impressive fidelity, enabling novel applications in creative industries.
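For readers unfamiliar with NeRF-style rendering, the core step is compositing density and color samples along each camera ray. The sketch below follows the usual volume rendering rule for a single grayscale ray; the post itself does not describe this step, and the sample values are made up.

```python
import math


def render_ray(densities: list[float], colors: list[float], step: float) -> tuple[float, float]:
    """Alpha-composite samples along one ray, front to back.

    Returns the pixel value and the remaining transmittance.
    """
    transmittance = 1.0
    pixel = 0.0
    for sigma, color in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this segment
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Empty space followed by an almost opaque surface with color 0.9.
pixel, remaining = render_ray(densities=[0.0, 1000.0], colors=[0.2, 0.9], step=0.1)
print(pixel, remaining)
```

Because every pixel is rendered from the same underlying densities, images from different viewpoints stay consistent with each other, which is the property 3D-aware GANs rely on.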

Top authors:
  1. Gordon Wetzstein (4 papers)
  2. Jiajun Wu (4 papers)
  3. Sida Peng (4 papers)
Example papers:
Topic 10 - Efficient Solving of Non-Convex Problems with Outlier Rejection and Relaxation Techniques (356 documents)
Wordcloud for topic 10
Solving non-convex problems, especially those involving outlier detection, correspondence, and registration, is a complex challenge in fields like computer vision and robotics. Methods like RANSAC (Random Sample Consensus) and polynomial solvers are commonly used to handle these issues: outliers (incorrect data points) are filtered out to improve the accuracy of the solution. Convex relaxation techniques and non-convex optimization solvers are then applied to refine the solution, in some cases with guarantees of global optimality. For example, pose estimation problems, such as relative rotation or translation, can be solved efficiently with convex optimization and minimal solvers, so that a good solution can still be recovered from noisy or incomplete data. These techniques, along with least squares and graph matching, play a key role in robust performance in real-world applications, where noise and outliers are unavoidable.
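RANSAC is easiest to understand on a toy problem. The sketch below fits a 2D line to points that include two gross outliers: it repeatedly samples a minimal set (two points), fits a line, and keeps the model with the most inliers. The iteration count, threshold, seed, and data are assumptions for illustration; vision pipelines apply the same loop to models such as relative pose.

```python
import math
import random


def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two distinct points."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    c = -(a * p[0] + b * p[1])
    return a, b, c


def point_line_distance(line, point):
    """Perpendicular distance from a point to the line."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)


def ransac_line(points, iterations=50, threshold=0.1, seed=0):
    """Return the sampled line with the largest inlier set, and that set."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample for a line
        line = fit_line(p, q)
        inliers = [pt for pt in points if point_line_distance(line, pt) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


# Eight points on y = 2x + 1 plus two outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(8)] + [(2.0, 20.0), (6.0, -5.0)]
line, inliers = ransac_line(points)
slope, intercept = -line[0] / line[1], -line[2] / line[1]
print(len(inliers), slope, intercept)
```

In practice the inlier set found this way is usually passed to a least-squares refinement, which is where the optimization techniques mentioned above come in.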

Top authors:
  1. Daniel Barath (13 papers)
  2. Daniel Cremers (13 papers)
  3. Viktor Larsson (10 papers)
Example papers:

Conclusion

In this analysis, we have explored the trends and shifts in research topics within the CVPR community over the past years. The data reveals a dynamic landscape, with certain areas experiencing significant growth while others have seen a decline in interest. This reflects the evolving nature of artificial intelligence research and the continuous pursuit of innovation and improvement in various domains.

The hottest topics, ranging from instruction-tuned multimodal LLMs for vision-language understanding to the rapid advancements in text-guided editing, 3D-aware synthesis, and event-based vision, highlight a strong drive toward bridging modalities, creating more controllable generative models, and addressing the growing needs of real-world applications. Researchers are increasingly focusing on improving the integration of language and vision to enable more effective reasoning, better handling of failure modes such as hallucinations, and enhanced performance in both creative and safety-critical environments.

In contrast, the coldest topics — such as self-supervised pretraining, traditional vision-language alignment, semi-supervised learning, domain adaptation, and even classical 3D object detection — indicate areas where mature techniques have plateaued. While these methods laid the foundations for current advances, their rate of improvement appears to have slowed in favor of newer approaches. Techniques once at the cutting edge are now being revisited with an eye toward integrating them into more comprehensive systems, but their standalone appeal is declining as the community shifts toward end-to-end, multimodal, and task-specific solutions.

Taken together, the trends suggest that the field is steering toward more holistic, integrated models that not only push the boundaries of what automated systems can generate or analyze but also provide greater reliability and control in real-world applications. As the industry continues to explore the fusion of text, image, and even sensor data, the next wave of innovation will likely be driven by systems that learn from multiple modalities concurrently — while leveraging long-standing, robust principles as a stepping stone.

This evolution underscores the vibrant nature of artificial intelligence research, where established methods provide a stable base while emerging techniques hold the promise of reshaping the future of the field.



