
June 23, 2012 - Day 31 Time Bomb

Posted: Sat Jun 23, 2012 6:13 pm
by Little_Mike
So, we have tried writing development blog posts before, but we never really kept up with them. There were a few reasons for this. One was that we never really had an audience. Another was that writing these posts took a long time, and without an audience it seemed like a waste of time. Well, a month ago, something big changed for us at Venan. We released the 1.3 patch for Book of Heroes, which introduced two big features: Chat and Guilds. All of a sudden, we have an enthusiastic community! So I thought I'd give this whole blog thing another try. I haven't actually talked about this with the other members of the dev team yet, but I think they will support the idea. After all, I think a lot of you might enjoy these posts, which I hope will give you a little insight into how our little world of Glenfort works.

As you all know, last Thursday we had a bit of a hiccup with our servers. Around 9 AM EDT, I was driving to the office when I got an email from beekeo and a text message from a friend, both saying they were getting a server connection error claiming that their version of the game was newer than the server's. I thought to myself, it's going to be one of those days. Since the 1.3 release, we have had several fires to put out. This one would be one of our biggest yet!

I came into the office, and I was the first one in. Not a good sign. I'm one of the software engineers, but I'm not the main server guy. I write server code, but I'm not intimately familiar with the inner workings of our server setup. After some debugging, I determined that our server monitor had taken down the application servers due to health issues and brought up new servers running the wrong build version. In an attempt to get things running again, I manually copied the server build over to each of the servers, which briefly allowed people to play the game. However, there was still massive lag in the game. A few minutes later, our monitor went crazy and brought up four new servers, again running the wrong server version. This caused a bunch of people to get the server error once again. At this point, our Technical Director, LordNathor, came into the office. As I was typing frantically to upload the build to the other servers, I yelled to him, "Help me put out this fire!" And so we got to work.

The first thing we did was bring down the servers. We then looked over the Customer Support emails and found one, sent at around 6:30 AM, from a player telling us that one of his four characters could not log into the game. This would be the clue that helped us figure out our problem. After debugging our servers, we discovered that we were hitting an infinite loop while calculating the power of that character's guild. (Note: Ideally we shouldn't be calculating guild power when loading a character... but that is an optimization we'll have to make later when we have time.)

If you know some code, maybe you can see what's wrong here:

Bad Server Code:
Iterator<Integer> itr = m_powerHistory.iterator();
while (itr.hasNext())
{
    final int kDecay = GuildPowerDecayData.getDecay(day++);

    // If the day is not defined, do not have it contribute. Same as factor 0.
    if (kDecay >= 0)
    {
        m_power += itr.next().intValue() * (10000 - kDecay) / 10000;
    }
}


Since a guild's power only takes a month's worth of history into account, GuildPowerDecayData.getDecay() would return -1 when day 31 hit. Because itr.next() only got called when kDecay was 0 or greater, the iterator stopped advancing at that point: itr.hasNext() kept returning true, and the loop kept going around without ever consuming another entry. That infinite loop locked our application servers at 100% CPU utilization. We had found our problem! One of our programmers had accidentally left this bug in since day one of 1.3, and we didn't know about it until Thursday, a month later. That's why we called it our server time bomb: it had been ticking away, just waiting to go off. Fortunately for us, we had already scheduled server maintenance for that day, so after fixing the problem and installing the new server fixes (remember those mail problems? They should be fixed now!), we were able to bring the servers back up. That was certainly one of the more fun days at the office recently. :lol:
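For anyone curious what a fix looks like, here is a minimal sketch of a corrected loop. It is not Venan's actual server code: `DecayTable` is a hypothetical stand-in for `GuildPowerDecayData`, and its decay values (linear, 300 basis points per day, defined only for days 0 through 30, returning -1 otherwise) are made up for illustration. The one essential change is that `itr.next()` is called on every pass, before the decay check, so each iteration consumes exactly one history entry and the loop is guaranteed to end.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for GuildPowerDecayData (assumed values, not the real table).
final class DecayTable {
    static final int DAYS_DEFINED = 31; // days 0..30 have a decay value

    // Returns decay in basis points (0..10000), or -1 if the day is not defined.
    static int getDecay(int day) {
        if (day < 0 || day >= DAYS_DEFINED) {
            return -1;
        }
        return day * 300; // assumed linear decay: 0% on day 0, 90% on day 30
    }
}

final class GuildPowerFixed {
    static int computePower(List<Integer> powerHistory) {
        int power = 0;
        int day = 0;
        Iterator<Integer> itr = powerHistory.iterator();
        while (itr.hasNext()) {
            // Consume the entry unconditionally, so every pass makes progress
            // toward hasNext() returning false.
            final int dailyPower = itr.next();
            final int decay = DecayTable.getDecay(day++);

            // If the day is not defined, it contributes nothing (same as factor 0),
            // but its entry has already been consumed, so there is no infinite loop.
            if (decay >= 0) {
                power += dailyPower * (10000 - decay) / 10000;
            }
        }
        return power;
    }

    public static void main(String[] args) {
        // 35 days of history: days 31 through 34 fall outside the decay table.
        List<Integer> history = Collections.nCopies(35, 10000);
        System.out.println(computePower(history)); // prints 170500
    }
}
```

Under these assumed decay values, days 0 through 30 contribute 10000 minus 300 per day (170500 in total), and the four extra days add nothing instead of hanging the server. An enhanced for loop (`for (int dailyPower : powerHistory)`) would rule out this whole class of bug, since it fetches the next element automatically on every pass.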

Changing the subject, one of the coolest ideas you, the community, have come up with is the monthly Fan Art contest. We haven't worked out the details yet, but we are thinking of collecting fan art submissions and then holding a vote every month to see which ones you all like the most. Of course there will be some sort of in-game rewards! So if you are an artist or artist-to-be, get those pencils, pens, brushes, and tablets ready! Perhaps some of the art could even make it into the game!

Speaking of art, take a look at the lovely programmer art I created when we were first developing the game. This is the first version of Glenfort. My, how things have changed!

[Image: first version of Glenfort (programmer art)]

Anyway, that's all for now! Stay tuned for the next post!

-Little_Mike

Re: June 23, 2012 - Day 31 Time Bomb

Posted: Sat Jun 23, 2012 6:48 pm
by Anthrax777
Love the story, Mike, and glad to hear about a dev blog. Being a DBA, I can relate. You just have to love those days. I had one myself about two weeks ago when a programmer pushed to live instead of dev and caused great havoc.

Re: June 23, 2012 - Day 31 Time Bomb

Posted: Sat Jun 23, 2012 6:50 pm
by Knagrim
Enjoyed the story. Keep them coming. They make an interesting read. I also like the idea of the blog.

Re: June 23, 2012 - Day 31 Time Bomb

Posted: Sun Jun 24, 2012 12:55 pm
by Hai5
One of those days you wish you were on vacation :) but the day went by quick, didn't it?

Re: June 23, 2012 - Day 31 Time Bomb

Posted: Thu Jun 28, 2012 1:51 pm
by OneDannyBoy
I greatly enjoyed reading this. Thank you :-)

Re: June 23, 2012 - Day 31 Time Bomb

Posted: Sun Jul 08, 2012 1:47 am
by Maeniel
Definitely understand. I have been having an issue at work since November 2011, and most recently I spent almost 400 hours (after hours) on it, all because our vendor didn't tell us to type 19 characters to get it working with our new Fibre Channel setup. It's a good thing they were far away when I learned that.

Re: June 23, 2012 - Day 31 Time Bomb

Posted: Wed Jul 11, 2012 1:06 pm
by Meian
Hai5 wrote: "One of those days you wish you were on vacation :) but the day went by quick, didn't it?"


Those days only go by quick when you're paid by the hour....