GPT is getting better at doing physics but has a long way to go.
I’ve been playing around with GPT-4, which is free with Microsoft Copilot, to see if it could help me with my research.
A dirty little secret is that, at least in my case, when I have to do some tedious calculations I often end up just putting off my work. My chosen areas of study, gravity and quantum theory, involve some of the most off-putting calculations. The basic Einstein equations are a system of 10 coupled partial differential equations. If you want to work in a particular coordinate system, you have to get all of that right. If you are adding terms to the Einstein equations, it gets worse. I typically work in 5 dimensions, which means 15 equations. Usually I'm in the ADM formalism too, which means I'm looking at how a slice of spacetime evolves along another dimension.
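To give a flavor of the bookkeeping involved, here is a small sympy sketch that computes Christoffel symbols directly from a metric. The unit 2-sphere is my own toy example (not anything GPT produced), chosen because its symbols are easy to check by hand; the same loop scales up to the 5-dimensional metrics I actually care about, just with far more terms.

```python
# Toy example: Christoffel symbols of the unit 2-sphere, computed with sympy.
import sympy as sp

theta, phi = sp.symbols('theta phi')
coords = [theta, phi]
# Metric of a unit 2-sphere: ds^2 = dtheta^2 + sin(theta)^2 dphi^2
g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])
g_inv = g.inv()
n = len(coords)

# Gamma^a_{bc} = (1/2) g^{ad} (d_b g_{dc} + d_c g_{db} - d_d g_{bc})
Gamma = [[[sp.simplify(sp.Rational(1, 2) * sum(
            g_inv[a, d] * (sp.diff(g[d, c], coords[b])
                           + sp.diff(g[d, b], coords[c])
                           - sp.diff(g[b, c], coords[d]))
            for d in range(n)))
          for c in range(n)] for b in range(n)] for a in range(n)]

# Check against the textbook answers for the 2-sphere:
assert sp.simplify(Gamma[0][1][1] + sp.sin(theta) * sp.cos(theta)) == 0   # Gamma^theta_{phi phi} = -sin(theta)cos(theta)
assert sp.simplify(Gamma[1][0][1] - sp.cos(theta) / sp.sin(theta)) == 0   # Gamma^phi_{theta phi} = cot(theta)
```

This is exactly the kind of mechanical index-shuffling I would love to hand off to an assistant.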
I would love to have an assistant, a virtual grad student or post-doc, who could just go and do calculations for me.
At first I tried ChatGPT. The free version is based on GPT-3.5. My first questions were straightforward and basic, but even so, ChatGPT was incapable of responding.
In essence, I was asking it to work with a particular kind of solution to the Einstein equations: asymptotically anti-de Sitter spacetime. This kind of spacetime is very common in string theory but is also fairly basic throughout general relativity, since anti-de Sitter is one of the maximally symmetric spacetimes. It is called asymptotic because it approaches anti-de Sitter space as you move toward the boundary, but it isn't anti-de Sitter everywhere.
Usually you use the Fefferman-Graham expansion for this, which has a very simple form. Yet even when I simply asked it to give me the Fefferman-Graham expansion, it was clear that ChatGPT couldn't answer that first question!
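For context, the expansion I had in mind is the standard one. In my conventions, for an asymptotically AdS space of dimension d+1 with radial coordinate z (conventions vary across the literature):

```latex
ds^2 = \frac{L^2}{z^2}\left( dz^2 + g_{\mu\nu}(z,x)\, dx^\mu dx^\nu \right),
\qquad
g(z,x) = g_{(0)} + z^2 g_{(2)} + \cdots + z^d g_{(d)} + \cdots
```

Here g_{(0)} is the boundary metric, and the spacetime is exactly anti-de Sitter when g(z,x) is just the flat metric.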
Meanwhile, Microsoft Copilot says
For some reason, the iPhone app for Copilot lets you choose whether to use GPT-4 or GPT-3.5, warning that GPT-4 may take longer. The desktop version, meanwhile, has no such selection but appears to use the newer OpenAI models.
When I asked Copilot the same question I initially asked ChatGPT, it gave a good answer:
This is not only correct but exactly what I asked for. (Note that this information isn't available on Wikipedia or any forum; I checked that first. You can find it in papers if you know what you are looking for.) I did notice that it used the parenthetical notation (3)R, which typically means the 3-dimensional Ricci scalar.
Indeed, everything it did was in four dimensions, which means the ADM formalism describes a 3-dimensional space evolving in time. I found out later, however, that it sometimes uses a parenthetical simply to distinguish variables with the same name. This is probably the fault of physicists using conflicting notation in their papers, but it left me confused about why GPT seemed to keep increasing the number of dimensions.
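For readers unfamiliar with it, the four-dimensional ADM decomposition splits the metric into a lapse N, a shift N^i, and a spatial 3-metric h_{ij}:

```latex
ds^2 = -N^2\, dt^2 + h_{ij}\left( dx^i + N^i\, dt \right)\left( dx^j + N^j\, dt \right)
```

The superscript (3) flags curvature quantities built from h_{ij} alone, as in the vacuum Hamiltonian constraint ^{(3)}R + K^2 - K_{ij}K^{ij} = 0.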
The papers it references are related to what it is talking about, but not always in any clear way. I also noticed that it always cites things that are publicly available on the web. I'm very suspicious about whether those citations reflect what it was actually trained on: I'm sure it was trained on arXiv.org and Wikipedia, but probably a lot of paywalled material too.
I was quickly disabused of my initial impression that GPT-4 "understood" what it was talking about. (I am using the term "understood" loosely, to mean whether it had clearly ingested the rules of tensor math and coordinate systems.) As I asked more questions, it became clear that it was merely associating equations it had seen in different training sets, linked by similarities in how physicists talked about them.
For example, I asked it about several different coordinate systems used in anti-de Sitter spacetime: the Poincaré patch, global coordinates, and de Sitter slicing. I also asked it to present these in five dimensions, since it mostly preferred four.
It didn't quite grasp the connections between all these coordinate systems. It also didn't understand the distinction between Fefferman-Graham and Poincaré patch coordinates. The Poincaré patch it used most frequently was just the first term in the Fefferman-Graham expansion with a diagonal, Cartesian metric.
One of the most glaring errors was that it defined the Poincaré patch as related to global coordinates by a Lorentz boost, but when it presented the coordinates, there was no boost to be seen. I guess the boost was zero?
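For reference, here is how I would write these three charts for AdS in d+1 dimensions (again, conventions vary):

```latex
\text{Poincar\'e patch:}\quad ds^2 = \frac{L^2}{z^2}\left( dz^2 - dt^2 + d\vec{x}^{\,2} \right)
```
```latex
\text{Global:}\quad ds^2 = L^2\left( -\cosh^2\!\rho\, d\tau^2 + d\rho^2 + \sinh^2\!\rho\, d\Omega_{d-1}^2 \right)
```
```latex
\text{de Sitter slicing:}\quad ds^2 = d\rho^2 + L^2 \sinh^2(\rho/L)\, ds^2_{\mathrm{dS}_d}
```

The Poincaré patch covers only a wedge of the global spacetime, and the maps between these charts are genuine coordinate transformations, not Lorentz boosts.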
All these details are to suggest that GPT-4 doesn’t quite understand what it is talking about. I get the feeling it doesn’t quite connect the words it is using and the equations it is using. It is almost like these are separate stories it is trying to tell.
Unlike a student who knows a little and doesn’t know how things fit together, GPT knows a lot but doesn’t understand how it all fits together. In this sense, GPT is more like a savant who can give you all kinds of facts and things they have read but can’t tell you what any of it means or give an original opinion about what it has learned.
My hopes of having a virtual assistant have to be scaled back because, frankly, I don't trust GPT-4 to give me answers that make sense when I ask hard questions. At best, I can break what I want to know into smaller, more easily solved problems that it can handle, and put the puzzle pieces together myself.