Artificial intelligence companies such as OpenAI, Meta, Anthropic, Google and others are showcasing their LLMs in the programming sector. Models built for deep reasoning stand out, especially the recent version of Gemini and models like Claude. Google has already acknowledged that AI generates 25% of the new code written at the company.
According to a Microsoft study, despite progress in the field, advanced AI models still have limitations when it comes to fixing software bugs, including bugs that would not trouble an experienced human programmer. On the SWE-bench Lite benchmark, Anthropic's Claude 3.7 Sonnet and o3-mini, created by ChatGPT owner OpenAI, failed to solve many of the problems.
These results show that AI still has a long way to go before it reaches the level of experienced programmers.
Microsoft created debug-gym to help develop LLM agents in an interactive coding environment, bridging the gap between current LLM capabilities and the requirements of large-scale code generation and bug fixing (debugging). This text-based environment is lightweight and offers a variety of useful tools, such as a Python debugger, designed to help AI agents fix bugs.
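To illustrate what giving an agent access to a Python debugger can look like, the following minimal sketch feeds scripted commands to Python's standard `pdb` module and captures its output as text. This is not debug-gym's actual interface; the helper `run_pdb_commands` and the function `buggy_mean` are hypothetical names chosen for illustration.

```python
import io
import pdb


def buggy_mean(values):
    """Hypothetical buggy function: divides by the wrong count."""
    total = sum(values)
    count = len(values) - 1  # bug: should be len(values)
    return total / count


def run_pdb_commands(commands, func, *args):
    """Run func under pdb, feeding it scripted commands instead of a keyboard.

    Returns everything the debugger printed, which is the text an agent
    would read before choosing its next command.
    """
    scripted_input = io.StringIO("\n".join(commands) + "\n")
    captured_output = io.StringIO()
    debugger = pdb.Pdb(stdin=scripted_input, stdout=captured_output,
                       nosigint=True, readrc=False)
    debugger.runcall(func, *args)
    return captured_output.getvalue()


# Step over two lines, inspect the suspicious variables, then continue.
transcript = run_pdb_commands(
    ["next", "next", "p total, count", "continue"],
    buggy_mean, [20, 40, 60],
)
print(transcript)
```

In this sketch a fixed list of commands stands in for the agent; a real agent would read each piece of debugger output before deciding what to send next.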
In the Microsoft study, nine AI models were tested as the basis for an agent that had access to various debugging tools, including a Python debugger. The agent was given 300 software debugging tasks to solve. The results show that even the latest models failed to complete more than half of the tasks. Claude 3.7 Sonnet achieved a success rate of 48.4%, TechCrunch reports.
The study's authors believe the problem lies in a lack of training data that captures the sequence of decisions programmers make when resolving bugs. They believe specialized data is needed to train the models, such as records of agents interacting with a debugger to collect the necessary information before proposing a fix.
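As a rough sketch of what such training data might look like, the snippet below records a sequence of debugger interactions that ends in a proposed fix and serializes it as a single JSON line. The schema (`DebugStep`, `DebugTrajectory` and their fields) is an assumption made for illustration; the study as summarized here does not specify a format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DebugStep:
    """One interaction: a debugger command and what the debugger printed."""
    command: str
    observation: str


@dataclass
class DebugTrajectory:
    """Hypothetical record of a full debugging session for one task."""
    task_id: str
    steps: list = field(default_factory=list)
    final_patch: str = ""

    def record(self, command, observation):
        # Append the command/observation pair in the order it occurred.
        self.steps.append(DebugStep(command, observation))

    def to_json_line(self):
        # asdict() converts nested dataclasses into plain dicts for JSON.
        return json.dumps(asdict(self))


trajectory = DebugTrajectory(task_id="example-task")
trajectory.record("p total, count", "(120, 2)")
trajectory.final_patch = "count = len(values)"
line = trajectory.to_json_line()
print(line)
```

Storing the steps in order keeps the information-gathering phase attached to the final fix, which is the kind of decision sequence the authors say is missing from current training data.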