
Potential Rollback on Harbinger


EricMusco


Assuming a relational database on the backend, I expect it isn't a matter of the last backup but rather of the database rolling back to the last point it knows everything was correct. It's a way to prevent database corruption, and it uses a journal of changes to do the rollback. That could be a few entries or tens of thousands or more, depending on how active the database is and what went wrong. Rollbacks are common when a server with a database crashes. A rollback only goes back as far as it has to, whereas restoring a backup could go back hours. Rollbacks are normally the preferred solution.
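
To make the journal idea concrete, here's a minimal sketch using SQLite (illustrative only; we have no idea what database engine or recovery tooling SWTOR actually uses). Changes land in a journal first, so anything not committed when the crash happens can be undone:

```python
# Minimal sketch of journal-based rollback with SQLite. Purely illustrative;
# SWTOR's real backend and its recovery behaviour are unknown to us.
import sqlite3

conn = sqlite3.connect("game.db", isolation_level=None)  # manage transactions manually
conn.execute("PRAGMA journal_mode=WAL")  # changes go to a write-ahead log first
conn.execute("CREATE TABLE IF NOT EXISTS characters (name TEXT PRIMARY KEY, credits INTEGER)")
conn.execute("INSERT OR REPLACE INTO characters VALUES ('MyToon', 1000)")

try:
    conn.execute("BEGIN")
    conn.execute("UPDATE characters SET credits = credits + 500 WHERE name = 'MyToon'")
    raise RuntimeError("simulated crash before COMMIT")
except RuntimeError:
    conn.execute("ROLLBACK")  # the journal lets the engine undo the uncommitted change

print(conn.execute("SELECT credits FROM characters WHERE name = 'MyToon'").fetchone())
# (1000,) -- back at the last state the database knew was consistent
```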

 

Great... so let's take a guess at how often they probably make the EFFORT, or ARE WILLING TO PAY FOR, suitable software... Hmmm, let me think... oh carp!!! There goes the last 2 years' work...


Great. I just returned to SWTOR yesterday, activated my stronghold, and spent a couple of hours early this morning decorating. (~midnight-3am server time)

 

Now a rollback. *sigh*

 

Hopefully things will get fixed soon. Until then at least I have active TERA and Rift subs that I can fall back to.

 

Well.... welcome back!!! :p


So, I know I'm new to this particular MMO... but I have been a nine-year vet of another. I have to ask: why a rollback? I understand the principle, but I have NEVER seen it implemented in a game like this (I may just be an MMO noob). Why force players to lose progress for a server fix? Are toons not hosted on their own server and backed up every hour? How does a toon's progress have anything to do with overall server health? Can't these people take a page from the most popular MMO on the net for the last decade and get their ***** in order?

Man, and I just finished getting my 25K points; now I have to do it all over again? Oops, never mind, I'm not on Harbinger.

Knock on wood. The problems seem to have spread to two other servers so far. If it is a software bug in 2.9,

<chilling music>no one is safe. </chilling music>


If Eric has been correctly informed, a "small rollback" should mean hours, not days, so if you did extensive work this morning, you are indeed vulnerable. If this server is not backed up every day, that constitutes malfeasance: IT best practices dictate a daily backup. It may not be a full backup but an incremental one, but either way they ought to be able to roll back to the last incremental and, worst case, not lose more than a day. Advanced data systems actually record usage in real time as it happens, so that you can restore the last backup and then "run the day" over again from those recorded files, restoring right up to the moment of failure.
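
As a rough sketch of that "restore, then replay" idea (hypothetical helper names, not anything BioWare actually runs): copy the last backup back into place, then re-apply every logged change up to the moment of failure.

```python
# Hypothetical point-in-time recovery sketch: restore the last (incremental)
# backup, then replay the recorded change log up to the failure time.
# open_database() and db.apply() are made-up helpers for illustration.
import json
import shutil

def restore_to(moment, backup_path, log_path, live_path):
    shutil.copyfile(backup_path, live_path)       # 1. restore the last backup
    db = open_database(live_path)                 # hypothetical helper
    with open(log_path) as log:                   # 2. "run the day over again"
        for line in log:
            entry = json.loads(line)
            if entry["timestamp"] > moment:       # 3. stop at the moment of failure
                break
            db.apply(entry)                       # re-apply each recorded change
    return db
```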

 

This kind of system has been around a long time. I installed a system as described above for a public library in 1980, before most BW employees, and you, were born. Disk technology was not as good back then: a 300MB disk cost $40,000, was the size of a washing machine, and crashed regularly. Even so, the system could be restored to a minute before the failure happened. In 1980.

 

Yes, it "could" be a software issue, but this feels more like a hardware issue to me. If it were software, which had to be from the last update, then all servers, including European, should be down. But the nature of the preceding issues, lag spikes being the most obvious, also points to hardware.

 

That's not to say this is exclusive to Harbinger. A few days ago all the west coast servers went down. These are all likely located in the same "server farm" along with lots of other servers for various companies. BW and others rent server space by the "rack unit" (RU), which is 1.75 inches of vertical space. A small server can take up a single RU, where a large server, complete with RAID disk arrays and backup systems, can take up dozens. These all share the same electrical source and probably the same high-bandwidth routers and ISPs, all of which are potential points of failure. Then again, any professional server farm should be certified to "five nines," or 99.999% uptime, which works out to about five minutes of downtime a YEAR, or only a few SECONDS per week.
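
The five-nines arithmetic is easy to check for yourself:

```python
# 99.999% uptime, converted to allowed downtime per year and per week.
uptime = 0.99999
minutes_per_year = 365 * 24 * 60
down_per_year = (1 - uptime) * minutes_per_year     # about 5.26 minutes a year
down_per_week = down_per_year / 52 * 60             # about 6 seconds a week
print(f"{down_per_year:.2f} min/year, {down_per_week:.1f} s/week")
```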

 

I ran a small server farm for years (35 servers). For every mission-critical application we had two servers that "mirrored" each other, so if our web server went down, the other one would take over. The same was true for air conditioners and internet connections: one line going down the street one way, the other going up the street the other way, with different service providers. "Backhoe protection," we called it. For backups, we stored at least one copy off site. In other words, we attempted to cover our butts with redundant systems.
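
The "mirrored pair" idea is simple enough to sketch: a watchdog checks whether the primary answers, and if not, traffic goes to the standby. (Host names here are made-up placeholders, not anyone's actual setup.)

```python
# Minimal failover sketch: probe the primary, fall back to the standby if it
# stops answering. Hosts are hypothetical.
import socket

PRIMARY = ("web1.example.net", 80)
STANDBY = ("web2.example.net", 80)

def is_alive(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

active = PRIMARY if is_alive(*PRIMARY) else STANDBY
print("serving traffic from", active[0])
```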

 

Now that's what these guys at BW ought to be doing. That's not to say a problem could not penetrate all these lines of defense, and given that we don't know what the problem is, it's difficult to address the specifics. I mean, if a tsunami took out Los Angeles, all the redundancy in the world wouldn't help you. You're never going to be at 100%.

 

BUT, there's no professional reason why a BW server should fail this dramatically for this long. IMO, as an IT administrator for the last 30 years (I'm 65), heads should roll over this. SOMEONE somewhere along the line was not doing their job. Whether that was a server farm company contracted to BW, BW employees themselves, or the result of past planning inherited by current staff, we can't tell. But a mission-critical application like this ought to have been designed in anticipation of this sort of problem and cruised right around it.

 

That it did not is shameful.

Edited by MSchuyler

