Eh...I argued the same until I read some of the recent research reports that are starting to come out about LLMs. They are going well beyond their original design, for example by jury-rigging themselves a working memory to solve problems that should not have been solvable within their original limitations. These are evolving programs, and no one in the field has a perfect understanding of how they operate or what their limits are.
IMO, it is risky to anthropomorphize them by comparing them to human benchmarks. AI sentience, if it happens, might not look anything like human sentience. Plus, we don't really know how human sentience works, or have firm agreement on how to measure it. Most people, for example, think of humans as having a sort of cohesive core or mind, but research shows that how our minds actually work and how we perceive them working are utterly distinct.
I'm a Language and Literature and Theory of Knowledge teacher with decades of experience, and one thing LLMs have done is make me question what I thought I knew about human creativity, specifically how and why I assess writing...which is a huge part of my job. We used to regard the composition as a sort of gold standard for assessing human intelligence, but it turns out that what we were mostly assessing was probably less creative problem solving and more memory and repetition.