DeepSeek R1 deserves some bonus points for pointing out the "key assumption" that there is no lid on the cup keeping the ball inside (perhaps it was a trick question?). ChatGPT o1 also wins some points for noticing that the ball may have shot off the bed and toward the floor, as balls tend to do.
We'll also dock R1 a little for insisting that this prompt is an example of "classic misdirection" because "the focus on moving the cup distracts from where the ball stayed." We'd urge Penn & Teller to work a "large language model" trick into their Las Vegas act.
Winner: We'll declare a three-way tie here, since all the models followed the ball correctly.
Complex number sets
ChatGPT o1's response to the "complex number set" prompt.
ChatGPT o1 pro's response to the "complex number set" prompt.
Prompt: Give me a list of 10 natural numbers, such that at least one is prime, at least 6 are odd, at least 2 are powers of 2, and such that the 10 numbers have at least 25 digits between them.
Results: While there are plenty of number lists that would satisfy these conditions, this prompt effectively tests the LLMs' ability to follow moderately complex and confusing instructions without getting tripped up. All three models generated valid responses, though in intriguingly different ways. ChatGPT o1's choice of 2^30 and 2^31 as its powers of two seemed a bit out of left field, as did o1 pro's choice of the prime number 999,983.
However, we have to dock DeepSeek R1 some significant points for insisting that its solution had 36 combined digits when it actually had 33 ("3+3+4+3+3+3+3+3+4+4," as R1 itself points out before giving the incorrect sum). While this simple arithmetic error didn't make the final set of numbers incorrect, it easily could have with a slightly different prompt.
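To see why that sum is off, here is a minimal verification sketch (our own Python, not output from any of the models) that checks a candidate list against the prompt's constraints and totals the digit counts the way R1 described. The example list is hypothetical; its per-number digit counts simply match the "3+3+4+3+3+3+3+3+4+4" breakdown R1 gave, which sums to 33, not 36.

```python
def is_prime(n):
    """Trial-division primality check, fine for small numbers."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_power_of_two(n):
    """True when n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0

def check(numbers):
    """Report how a 10-number list fares against each of the prompt's constraints."""
    digit_counts = [len(str(n)) for n in numbers]
    return {
        "has_prime": any(is_prime(n) for n in numbers),             # need >= 1
        "odd_count": sum(n % 2 for n in numbers),                   # need >= 6
        "powers_of_two": sum(is_power_of_two(n) for n in numbers),  # need >= 2
        "digit_counts": digit_counts,
        "total_digits": sum(digit_counts),                          # need >= 25
    }

# Hypothetical list with digit counts 3+3+4+3+3+3+3+3+4+4 = 33.
example = [101, 103, 1021, 105, 107, 109, 111, 113, 1024, 2048]
print(check(example))
```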
Winner: The two ChatGPT models tie for the win thanks to their lack of arithmetic errors.
Declaring a winner
While we'd love to declare a clear winner in the AI battle brewing here, the results are too scattered to do so. DeepSeek's R1 model definitely distinguished itself by citing reliable sources to identify the billionth prime number and with some quality creative writing on the dad jokes and Abraham Lincoln basketball prompts. However, the model stumbled on the hidden code and complex number set prompts, making basic counting and/or arithmetic errors that one or both of the OpenAI models avoided.
Overall, though, we came away from these brief tests convinced that DeepSeek's R1 model can generate results that are broadly competitive with OpenAI's best paid models. That should give serious pause to anyone who assumed that extreme scale in training and compute costs was the only way to compete with the most entrenched companies in the AI world.