Broken Netcode Demonstration -- A Bogus Kill
I'm not necessarily trying to add to your list, just pointing out something that feels different than before :). Personally, getting to make the choice of "catching" a smart missile intentionally, rather than having it trap me, is an interesting one and I don't mind it. It's just easier to do than before from what I can tell, and I wasn't sure if it might have affected the demo you showed (but from your description, it did not, as the blobs got him anyway!).
-
RiTides
- Posts: 223
- Joined: Sun Sep 01, 2013 6:13 pm
Ryusei helped take demos of flare fights from two sides.
I'm running some scripts to analyze the flare shots and look for differences in orientation. I don't have the whole story just yet, but I do definitely have enough to confirm the theory that shots sometimes come out in the wrong direction on the other screen.
This is a chart of the X coordinate of my ship, ryusei's ship, and all the flares we shot, vs. time. The blue is my demo, and the orange is his. I know it's sort of a mess, but look for places on the chart with straight lines. Those are flare shots; when they go flat, they hit a wall and stop moving. I've circled several places where the flares are very clearly being fired at different angles -- sometimes REALLY different angles. In a couple cases, the final resting place is off by half a cube (and this over the width of the Athena dog room).

The effect looks pretty random shot by shot -- at least, I can't see a pattern in it. Though I do notice a couple clusters of shots close in time that seem to be off by the same angle.
It's hard to say from this chart exactly how bad the problem is. Just looking at one coordinate gives an incomplete picture, but it is enough to prove that it's at least as bad as this. Out of what . . . 50, maybe 100 flare shots, I see major divergence in four or five, and minor in another five or ten. No wonder you're missing a lot.
------------------
We ran the same test in Rebirth 0.55. Here's what that looks like:

A few of the flares have 1/20th of a cube differences in the final resting position -- a believable result of lag in an oblique shot -- but all of the firing directions are perfectly dead identical. Most of the resting places are identical.
In other words, the problem is absent in 0.55. Something went wrong recently.
This analysis agrees with our subjective experience in the game. In 0.55, the ships seemed extremely accurate -- eerily so, by the standard I've become used to. If I got sparks, I got clangs. Which is as it should be -- we had a 50-70ms ping to each other!
So the problem is definitely real, and whatever the cause is, it's recent -- Rebirth era, post version 0.55.
Demos:
-
Drakona
- Site Admin
- Posts: 1494
- Joined: Fri Aug 30, 2013 5:35 pm
For reference, the flare fight lasts around a minute, and there are at least ten flares flying at the wrong angle.
I suspect the flare fired at around t=140,000 from x=41 toward either x=39 or x=32 hit for one person and missed for the other.
-
LotharBot
- Posts: 708
- Joined: Sat Aug 31, 2013 1:11 pm
Interesting. I never really played with 0.55. I came back shortly before 0.57 was released. We did a whole lot of "beta" testing during that time. For the last 2 years, some problems have been fixed (kind of? Hard to remember specifics.) But plenty have been created. I'd be interested in knowing which version introduced this problem. Or maybe it was a string of changes. I know zico had said something about movement prediction at some point. I wonder if he has already incorporated some of that.
-
Ryguy
- Posts: 122
- Joined: Wed Sep 04, 2013 8:26 pm
Here's the full angle analysis. It looks bad!
The setup -- I used a script to find all of the flare shots in the two demos (looking for flares fired from the same location, at the same time), and compared the velocity vectors. Simple math gets me the angle between the two flare shots as seen on the two screens.
So basically, if a dot on this chart is at 10 degrees, that means the flare was moving on my screen in a direction 10 degrees off from how it was on Ryusei's screen. (Doesn't say anything about which screen is right, just that they're different by that much).
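The comparison described above can be sketched in Python roughly like this. It is a minimal sketch, not the actual script: the `FlareShot` record, the matching tolerances `max_dt` and `max_dist`, and the units are all hypothetical, and it assumes each flare's launch time, launch position, and initial velocity have already been pulled out of both demos.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class FlareShot:
    # Hypothetical record extracted from one demo: fire time (demo time units),
    # launch position and initial velocity (game units).
    t: float
    pos: tuple[float, float, float]
    vel: tuple[float, float, float]


def angle_between(u, v):
    """Angle in degrees between two 3D vectors, via the dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp so floating-point rounding can't push acos outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def match_and_compare(mine, theirs, max_dt=50.0, max_dist=2.0):
    """Pair shots fired at nearly the same time and place; return their angle errors.

    Greedy: each of my shots takes the first close-enough shot from the other demo.
    """
    errors = []
    for a in mine:
        for b in theirs:
            if abs(a.t - b.t) <= max_dt and math.dist(a.pos, b.pos) <= max_dist:
                errors.append(angle_between(a.vel, b.vel))
                break
    return errors
```

Because the vectors are normalized, flare speed drops out and only direction matters. As with the charts, the result says how far apart the two screens are, not which one is right.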
Here's the chart for 0.57.3 Retrohomers

For reference, a spreadfire cone spans about 7 degrees -- each shot about 3.5 degrees off center. (That's not exact, I just measured it in game and did some back of the envelope trig). Any of the shots on the chart above that line . . . less accurate than a random spread blob. It looks like about 1 in 6 are that bad!
1 in 50 are off by 12 degrees or more, which is enough to turn a dead on shot into a miss from just over one cube away.
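The geometry behind that is a right triangle: a shot rotated by an angle θ drifts sideways by d·tan θ after traveling a distance d. A small sketch, assuming a standard cube is 20 units on a side (an assumption; real level geometry varies):

```python
import math

CUBE = 20.0  # assumption: edge length of a standard cube, in game units


def lateral_miss(angle_deg: float, distance: float) -> float:
    """Sideways drift of a shot aimed dead-on but rotated by angle_deg."""
    return distance * math.tan(math.radians(angle_deg))


for angle in (1.0, 3.5, 12.0, 33.0):
    print(f"{angle:4.1f} deg over one cube -> {lateral_miss(angle, CUBE):5.2f} units")
```

Whether a given drift turns into a miss depends on the target ship's hit radius, which this sketch doesn't model.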
And one of the shots is off by a mind-boggling 33 degrees!
It's bad.
Here's the same chart for the flare fight in 0.55 --

They're all so close to zero you can't see a difference! Zooming in . . .

Out of 134 flare shots, none of them show an error larger than 0.65 degrees, and the vast majority are less than 0.5 degrees. And note that once the numbers get this small, the demo-recording process starts to introduce errors, so in game it might actually be better than even that.
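Figures like "about 1 in 6", "1 in 50", and "none larger than 0.65 degrees" are simple tallies over the per-shot angle errors. A minimal sketch of that tally. The thresholds come from this post (0.5 degrees, the 3.5-degree spreadfire offset, the 12-degree one-cube miss), the function name is hypothetical, and the input below is made-up placeholder data, not the measured results:

```python
def summarize(errors_deg, thresholds=(0.5, 3.5, 12.0)):
    """Tally per-shot angle errors (degrees): count, worst case, share at or above each threshold."""
    if not errors_deg:
        raise ValueError("need at least one matched shot")
    n = len(errors_deg)
    summary = {"shots": n, "max": max(errors_deg)}
    for t in thresholds:
        summary[f">= {t} deg"] = sum(e >= t for e in errors_deg) / n
    return summary


# Placeholder input only, to show the output shape.
print(summarize([0.2, 0.4, 1.1, 5.0]))
```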
I don't have any theories on a cause yet, but it's definitely something that broke after 0.55 and before 0.57.3. And this isn't lag!!
-------------
Raw data:
-
Drakona
- Site Admin
- Posts: 1494
- Joined: Fri Aug 30, 2013 5:35 pm
Well, at least this proves that I'm not a hallucinating crazed lunatic.
I'm no math expert but I know when something isn't right in Descent and it can't be attributed to lag.
Too many hours behind the pyro wheel to not know.
-
Jediluke
- Posts: 1879
- Joined: Fri Aug 30, 2013 10:00 pm
Could a mod drop those values to zero? Some back of the envelope math.
Descent's internal velocity vectors take up 6 bytes. The fastest rate of fire gun is vulcan, at about 20 shots per second. Suppose you have to send that data to 8 players in an anarchy game. That's 960 bytes per second. If all 8 players are shooting vulcan at once, that works out to about 7.5 kBps.
There's overhead, too. So that's definitely a cost, but then again, most of us have connections measured in MBps, not kBps.
Second fastest rate of fire is plasma, at 5.5 per second, dropping the bandwidth requirement for all 8 players shooting plasma at once to about 2 kBps.
It would require testing, but perfect orientation data looks pretty affordable, if far from free.
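The arithmetic above, written out so the assumptions can be varied. The 6-byte vector and the fire rates are the estimates from this post, and protocol overhead is ignored:

```python
def shot_bandwidth(bytes_per_shot, shots_per_sec, receivers, shooters):
    """Bytes per second to send one orientation vector per shot, ignoring overhead."""
    return bytes_per_shot * shots_per_sec * receivers * shooters


vulcan_one = shot_bandwidth(6, 20, 8, 1)   # one vulcan shooter, 8 recipients
vulcan_all = shot_bandwidth(6, 20, 8, 8)   # all 8 players on vulcan
plasma_all = shot_bandwidth(6, 5.5, 8, 8)  # all 8 players on plasma

print(vulcan_one, vulcan_all / 1024, plasma_all / 1024)  # 960 7.5 2.0625
```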
-
Drakona
- Site Admin
- Posts: 1494
- Joined: Fri Aug 30, 2013 5:35 pm
Well, that's some awesome work Drakona! Kudos to you and Ryusei for testing out both versions, it's good to see that there's a version "not too far back" that didn't have the problem... should make it easier to track down at some point. Although, like you say maybe a patch can fix it rather than having to unravel the issue from all the other things that have been added since then.
-
RiTides
- Posts: 223
- Joined: Sun Sep 01, 2013 6:13 pm
I've found part of the problem at least. Short packets don't contain enough data to accurately reconstruct shots.
In 1994, the packets sent over the network contained a lot of data. 24 bytes devoted to your position and velocity; 36 devoted to your orientation and another 12 for rotational velocity. The designers of the game devoted a ton of their bandwidth to sending orientation data accurately, over half of the packet. These things specified your orientation down to hundredths of a degree (which is probably overkill), and sent rotational velocity so the game could estimate orientation between packets.
In 2013, we're using short packets. They're 1/3 the size of the original packets. 14 bytes for your position and velocity, 9 for your orientation, and none for rotational velocity. If a drop from 36 bytes of orientation data to 9 seems pretty steep . . . well, it is. Too steep. The data structure it's using for orientation is in the original code, but it wasn't designed for multiplayer pyro use. It's used for recording demos and for sending robot state in co-op games. Never used for players until now!
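The byte counts above can be checked by writing the described fields out as packed layouts. These are hypothetical layouts that cover only the fields named here (the real packets carry other data too). The long form assumes Descent-style 32-bit 16.16 fixed-point values. The short form assumes signed-byte matrix entries plus 16-bit position, segment, and velocity fields:

```python
import struct

# '<' = little-endian with no alignment padding, so .size is the raw byte count.
# Long form: position (3 fix), velocity (3 fix), orientation matrix (9 fix),
# rotational velocity (3 fix); each fix is a 32-bit 16.16 fixed-point value.
LONG_FIELDS = struct.Struct("<3i 3i 9i 3i")

# Short form: orientation matrix as 9 signed bytes, position (3 x int16),
# segment number (int16), velocity (3 x int16); no rotational velocity at all.
SHORT_FIELDS = struct.Struct("<9b 3h h 3h")

print(LONG_FIELDS.size, SHORT_FIELDS.size)            # 72 23
print(f"{SHORT_FIELDS.size / LONG_FIELDS.size:.2f}")  # 0.32
```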
Now, of those 9 bytes for orientation, only three specify the critical "forward" vector your shots are based off of -- and those make up an X, Y, Z unit vector. With one dimension lost to the unit vector, you wind up with two bytes to specify 360 x 180 degrees of azimuth and elevation. A byte only has 256 values (and I think we're only using 200 of them here).
Which is to say the data you're sending over the network in short packets has a resolution of about one degree. That's roughly a third of the 3.5 degrees between a side spreadfire blob and the center.
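How coarse is one byte per component? The following is a quick Monte Carlo sketch. It assumes each component of the unit forward vector is stored as round(c × 100) in a signed byte, following the "200 of 256 values" estimate above; the real encoding may use a different scale. It quantizes random unit vectors, rebuilds and renormalizes them, and measures the angle back to the original:

```python
import math
import random


def quantize(v, scale):
    """Store each unit-vector component as a signed byte step (assumed scale)."""
    return tuple(max(-127, min(127, round(c * scale))) for c in v)


def dequantize(q, scale):
    """Rebuild the vector from its byte steps and renormalize to unit length."""
    v = [c / scale for c in q]
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def angle_deg(u, v):
    """Angle between two unit vectors, in degrees."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))


def random_unit(rng):
    # Normalized Gaussian samples are uniformly distributed on the sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def quantization_error(scale, samples=20_000, seed=1):
    """Return (mean, worst) angle error in degrees over random forward vectors."""
    rng = random.Random(seed)
    errs = []
    for _ in range(samples):
        v = random_unit(rng)
        errs.append(angle_deg(v, dequantize(quantize(v, scale), scale)))
    return sum(errs) / len(errs), max(errs)


for scale in (100, 10):
    mean, worst = quantization_error(scale)
    print(f"scale {scale}: mean {mean:.3f} deg, worst {worst:.3f} deg")
```

Under this particular assumption, the rounding error is bounded at roughly half a degree (half a step in each component). A coarser real scale raises that bound proportionally. Either way, rounding error of this size can't by itself account for the 10- and 15-degree outliers, which is the point of the next paragraph.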
Now that can't be the whole problem -- it doesn't explain how you can get off by 10 or 15 degrees. That implies something wonky is going on in the physics engine. But the short-packet resolution would have to be fixed at a minimum. These short packets were probably intended to make network connections better by reducing the required throughput, but by sacrificing too much orientation accuracy, they've made them worse.
I did a quick test to see how much of the problem the short packets account for. Changing nothing else, I replaced the new short packets with the old style long ones in Retrohomers and did a quick build and flare fight test (with Lothar over LAN . . . I really ought to test again over internet for comparable results, but no one was around . . . and I really don't think lag plays into this as long as the connection is consistent).
Here's the result:

That's definitely better! And it agrees with our perception in game that that flare fight was "the best connection we've had in a long time", and we play on LAN all the time. It's still not as good as 0.55, though, which implies there's still a factor I haven't uncovered.
As an aside, no you can't have this build. It's only a test build for exploring the problem. In the process of changing the packets, I broke Linux compatibility and messed something else up -- the multiplayer ships stutter and shake a lot. That points to something in the physics engine I'm missing (or something I screwed up when messing with the net code in the first place . . . I tell you, this code is TENSE). On the bright side, if a buggy build is this accurate, perhaps resolving the bugs will clear up the remaining angle errors.
It would need to be tested whether an 8 player game can handle the increased bandwidth requirements of the big packets. If not, a compromise is possible: send a more accurate forward vector along with the short packet -- a size increase of only six or possibly 12 bytes. Another compromise would be to make old style super-accurate positioning a game option (that you'd notionally only check in a 1v1 or small game). Part of the concern is the client/server architecture: the server is sending everyone's update packets to everyone else, where you used to only have to send your own, and it's all on their upstream connection. That's 8x more data than you needed to send in 1996, but . . . I don't know about all of you, but my upstream is WAY more than 8x faster than it used to be. Still, it would need to be tested.
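The six-byte compromise could look something like this: send the forward vector as three signed 16-bit components alongside the short packet. This is a hypothetical sketch, and the format and function names are illustrative, not anything in the current code. It includes a check of the round-trip error:

```python
import math
import struct

FWD16 = struct.Struct("<3h")  # hypothetical 6-byte forward-vector add-on
SCALE = 32767                 # map [-1, 1] onto the full int16 range


def pack_forward(fvec):
    """Encode a unit forward vector as three signed 16-bit components."""
    return FWD16.pack(*(max(-SCALE, min(SCALE, round(c * SCALE))) for c in fvec))


def unpack_forward(data):
    """Decode the three components and renormalize back to a unit vector."""
    v = [c / SCALE for c in FWD16.unpack(data)]
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def angle_deg(u, v):
    """Angle between two unit vectors, in degrees."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))


raw = (0.3, -0.5, 0.8)
norm = math.sqrt(sum(c * c for c in raw))
fwd = tuple(c / norm for c in raw)
data = pack_forward(fwd)
print(len(data), angle_deg(fwd, unpack_forward(data)))
```

The 12-byte variant would use the same idea with 32-bit fixed-point components, as in the original long packets.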
I'm toying with the idea of taking an alternate approach and going around the problem entirely: sending position and orientation data for each shot instead of basing the shot's position and orientation off of the ship's. That should result in perfect orientation alignment on both screens -- then you really would only have to worry about leading for lag. Your shots would NO KIDDING go exactly where you put them. It might look visually jarring, though -- shots could come out pointed the wrong direction compared to the pyro they came from. That should be harmless if it's only a difference of 2 degrees at most (as in the above chart), but could be quite jarring at 10 or 15, I think. I suspect the network cost of such an approach would be trivial -- particularly compared to reintroducing long packets! -- but that would require testing, particularly with vulcan.
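A per-shot message would carry its own launch point and direction, so receivers wouldn't have to derive them from a possibly stale ship orientation. Here is a hypothetical layout to make the size concrete: one weapon-id byte, a 16.16 fixed-point position, and a 16-bit forward vector, 19 bytes in all. None of these names come from the codebase:

```python
import struct
from typing import NamedTuple

F1_0 = 65536  # Descent-style 16.16 fixed point: 1.0 == 65536
FWD_SCALE = 32767  # forward components mapped onto the int16 range

# Hypothetical wire format: weapon id (uint8), position (3 x fix), forward (3 x int16).
SHOT_MSG = struct.Struct("<B 3i 3h")


class ShotMsg(NamedTuple):
    weapon_id: int
    pos: tuple
    fwd: tuple


def encode_shot(msg):
    """Pack a shot's launch position and direction into a fixed-size message."""
    pos = (round(c * F1_0) for c in msg.pos)
    fwd = (max(-FWD_SCALE, min(FWD_SCALE, round(c * FWD_SCALE))) for c in msg.fwd)
    return SHOT_MSG.pack(msg.weapon_id, *pos, *fwd)


def decode_shot(data):
    """Unpack a message back into floating-point position and direction."""
    w, px, py, pz, fx, fy, fz = SHOT_MSG.unpack(data)
    return ShotMsg(
        w,
        (px / F1_0, py / F1_0, pz / F1_0),
        (fx / FWD_SCALE, fy / FWD_SCALE, fz / FWD_SCALE),
    )


print(SHOT_MSG.size)  # 19
```

Whether 19 bytes per vulcan round is affordable in an 8-player game is exactly the kind of thing that would need testing.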
For now, I'm going to see if I can figure out what's wrong / restore the 0.55 behavior before trying the above approach to improving on it.
-
Drakona
- Site Admin
- Posts: 1494
- Joined: Fri Aug 30, 2013 5:35 pm