Posted By: Dr Herbie | Oct 23rd @ 10:25 PM
page 1 of 1
Comments: 11 | Views: 98
Dr Herbie
Horses for courses

At the moment, trying to get to the Coffeehouse gives an error.  Everywhere else seems fine.

Herbie

EDIT: ok, so it's fine now.  Temporary glitch, I guess.

Bas
It finds lightbulbs.

It's not fine for me. I've been getting the "Something went wrong" page about 75% of the time when I try to go anywhere other than the front page. This has been happening for half a day or so now.

Edit: actually it's more like 90% of the time for the Coffeehouse, and 50% of the time for anything else.

Edit 2: now Coffeehouse is permanently down. I fear for the Revolution.

Yep, Coffeehouse appears to be down for me too. I started getting the errors last night.

edit: Now it's working. Odd.

figuerres
???

About 25% of the time for me. Random. Last 2-3 days.

Almost like the Win 7 launch jinxed C9 with new glitches.

Bas
It finds lightbulbs.

Getting gradually better, only happens once in a while now for me.

Charles
Welcome Change

We have a faulty node in our web farm. Not sure of the cause yet. Sorry for the inconvenience.

C

Bas
It finds lightbulbs.

Thanks Charles.

It would be interesting to know how Azure handles it if one node returns different results due to some intermittent hardware issue or something. Or do the cloud apps all run on at least two physical nodes simultaneously, with the results/state constantly compared?
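
To make the "run on several nodes and compare" idea concrete, here is a minimal sketch of majority voting over replica results. It is purely illustrative: nothing in this thread says Azure works this way, and the function name majority_result is made up for the example.

```python
from collections import Counter

def majority_result(results):
    """Return the value a strict majority of replicas agree on, or None.

    `results` holds one result per replica (hypothetical setup).
    """
    value, count = Counter(results).most_common(1)[0]
    # Require a strict majority; a 1-vs-1 split cannot be resolved.
    if count * 2 <= len(results):
        return None
    return value

agreed = majority_result(["page-ok", "page-ok", "page-garbled"])
split = majority_result(["page-ok", "page-garbled"])
```

With only two replicas a disagreement can be detected but not resolved, which is why the sketch returns None for the split case; three or more replicas are needed to outvote a single bad node.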

stevo_
Human after all

Issues that cause corrupted output are highly unlikely. Other than that, general system health is constantly recorded; if a node becomes unhealthy, a new node is brought online to replace it.
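
Roughly, the monitor-and-replace loop could look like the sketch below. This is a toy model under stated assumptions, not how any real platform implements it: the Node class, the probe and spawn callbacks, and the node names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    healthy: bool = True

def replace_unhealthy(pool, probe, spawn):
    """Return a new pool where each node failing `probe` is swapped for `spawn()`."""
    result = []
    for node in pool:
        if probe(node):
            result.append(node)      # node passed the health check, keep it
        else:
            result.append(spawn())   # node failed, bring a fresh one online
    return result

# Hypothetical three-node web farm with one faulty node.
fresh_ids = iter(range(100, 200))
pool = [Node("web-1"), Node("web-2", healthy=False), Node("web-3")]
new_pool = replace_unhealthy(
    pool,
    probe=lambda n: n.healthy,
    spawn=lambda: Node(f"web-{next(fresh_ids)}"),
)
```

Note that a probe like this only catches nodes that report themselves as unhealthy or stop responding; a node that answers the probe but serves bad pages would slip through.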

Sven Groot
My name has 9 letters. Coincidence? I think not...

If you have enough nodes, highly unlikely means it'll happen next Tuesday.

In large-scale cluster systems, it's a fact of life that most failures don't mean "the node dies" but instead "this piece of data becomes corrupted". That is one of the hardest things about managing reliability in a large cluster environment.
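
One common defence against this kind of silent corruption is to store a checksum next to the data and verify it on every read. A minimal sketch, assuming SHA-256 from Python's standard hashlib; the store and verify helpers are hypothetical names for illustration:

```python
import hashlib

def store(payload: bytes):
    """Return the payload together with its SHA-256 digest."""
    return payload, hashlib.sha256(payload).hexdigest()

def verify(payload: bytes, digest: str) -> bool:
    """True if the payload still matches the digest recorded at write time."""
    return hashlib.sha256(payload).hexdigest() == digest

data, digest = store(b"coffeehouse thread 42")

# Simulate a single flipped bit, as a faulty disk or memory cell might cause.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]

intact_ok = verify(data, digest)
corrupted_ok = verify(corrupted, digest)
```

The node holding the corrupted copy keeps running normally; only the checksum comparison reveals that the data is bad.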

stevo_
Human after all

I highly doubt a node is likely to keep the application functioning while somehow managing to corrupt its output. That's like selective corruption: just enough to completely screw up the app's output, but not enough to make the app destabilize and crash.

Data corruption is much more likely to happen when routing the data, such as a router going dodge city, but I expect the amount of cash they pump into such hardware will make it unlikely. Additionally, such problems aren't specific to cloud hosting; they apply to basically any networking.

It is much more likely that a node will become faulty and apps will start to crash, in which case the crashes will be seen by the health monitoring system and handled.
