Is it just me, or has Github's quality of service been continually degrading over the past several months? What is going on internally? Is this because of the Microsoft acquisition? Increased usage? An internal transition to Azure?
It might also be covid related. People are working from home, people responsible for system upkeep might not be immediately responsive, more demand on the servers for whatever reason, etc.
I would agree that the coronavirus could be a factor here. At the same time, I've been noticing issues since probably December or January (before the coronavirus started being a real problem), which makes it seem like maybe there are multiple issues.
Of course, I'm not actually internal to Microsoft or Github, so I have no idea and it's all opaque to me.
Alternative explanation - they've been deploying big new features with some regular cadence lately. New features carry risk and I think we're seeing that.
Maybe it's the demand side? With remote work, the intensity of usage went up at least in our company as we rely more on written communication. At the same time, some people will use the time to start side projects or get into programming.
On the other hand, shouldn't there be a productivity drop with so many people working from home while their kids also aren't in school? I'd expect that to offset any increase in demand—after all, Git isn't Slack; remote work shouldn't cause people to push all that much more often, right?
Personnel may be partially unavailable, but - home or office - developers are doing the same job they always did. At least from the user's point of view, it hasn't changed that much.
Ideally GitHub Actions would be completely independent of GitHub's core services/servers (e.g. how Travis CI, Circle CI, etc. work), but that seems like it may not be the case.
Also, I'm still anticipating a fuller report on the database issues they mentioned have been the root cause of many outages over the past few months.
I'd pay $8 per month for a stable service any time over $4 per month for a service that fails during a critical build, as just happened to me.
If you view the historical uptime, it does seem like there have been more incidents in the past three months, but otherwise the waters look calm (as reported at least): https://www.githubstatus.com/uptime?page=1
Yes, and so this is actually a thing that bugs me, because I use Github every day. Over the past several months, there have been numerous days in which I've had problems (`You can not comment at this time`, 500s, etc.) and no corresponding status report.
It seems like the historical uptime page paints a far rosier picture than I am actually experiencing.
I wonder how companies like GitHub determine this when outages are geo-specific. Do they wait until an outage affects 50% of a geographic region before reporting it as a partial outage?
If you do nothing, it lands by default on the "git operations" view, which is by far the most stable, since, well, it consists of executing the battle-tested git program.
If you want to see the state of the GitHub "extras", you'd need to select "github actions" or "webhooks", which have a fair amount of downtime (about once a week or so, which seems about right).
Interesting that the most stable component of the company is, of course, the open source one ^^
GitHub hasn't collapsed killing thousands or needing to be completely rebuilt though, so that analogy doesn't work. This is more like there's a flood in the lobby so maintenance has closed the front door for a bit.
I didn't mean for the point to be about the consequences of the failure. What I was trying to argue against was the notion that it's fine for things to fail, just by virtue of them being hard. There are a lot of complicated systems in the world that work extremely reliably.
It's not a good thing that Github is down. It's an inevitable thing that comes from complexity at scale though. Hard things are hard, whether that's planes, buildings, or web apps.
Internally at Amazon, we consider that about 80% of issues/outages/etc. are due to changes. This may sound like a "duh", but it's backed by over 10k investigations.
Much of the work is just minimizing the impact of these changes by finding them before customers do.
This includes things like unit and integration testing, canaries, cellular/zonal/regional deploys, auto rollbacks, multi-hour bakes, auto load tests, and a lot of monitoring. Not to mention cross-team code reviews, game days, and ops reviews.
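To make the "canary plus auto rollback" idea above concrete, here is a minimal sketch in Python. The stage names, error threshold, and function names are all invented for illustration; this is not Amazon's (or anyone's) actual tooling, just the shape of the idea: bake a change in a small cell, watch the error rate, and either advance or roll back.

```python
# Hypothetical staged rollout with auto rollback. All names and
# thresholds here are made up for illustration.

ERROR_RATE_THRESHOLD = 0.01  # roll back if >1% of canary requests fail
BAKE_STAGES = ["one-box", "one-zone", "one-region", "global"]

def canary_error_rate(failures, total):
    """Fraction of failed requests observed during the bake period."""
    return failures / total if total else 0.0

def next_action(stage, failures, total):
    """Advance the deployment one stage, or roll back on a bad canary."""
    if canary_error_rate(failures, total) > ERROR_RATE_THRESHOLD:
        return "rollback"
    i = BAKE_STAGES.index(stage)
    return BAKE_STAGES[i + 1] if i + 1 < len(BAKE_STAGES) else "done"

# A healthy canary advances; a spike in errors triggers a rollback
# before most customers ever see the change.
assert next_action("one-box", failures=2, total=1000) == "one-zone"
assert next_action("one-region", failures=50, total=1000) == "rollback"
```

The point of the cellular/zonal/regional staging is exactly this: a bad change is caught while its blast radius is still one box or one zone, not the whole fleet.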
No, I think it's more part of how to run a complex system with a lot of people changing stuff at once. Having good monitoring, kill switches, staged rollout, continuous deployment, and so on are all things that contribute more to making a reliable service than how microserviced it is.
If you're looking for somewhere else, SourceHut has had no unplanned outages in 2020, despite being kept online by an army of one. The software and infrastructure are just more resilient and better maintained. Our ops guide is available here:
It's unfair to GitHub to claim that your infra is more resilient and better maintained. Their load is orders of magnitude greater than yours. My driveway also doesn't have potholes; that doesn't mean it's more resilient than the freeway.
SourceHut is at least 10x lighter weight and has a distributed, fault tolerant design which would allow you to continue being productive even in the event of a total outage of all SourceHut services.
Sidebar, I just want to say, you are one of the few people I’ve observed doing actual “modern” web development.
When most people talk about “modern web” or modern anything in software they think it means “using all the latest tools”.
That often means things like ES6 and Webpack, which have nice surfaces, but which create nightmares under the hood.
That’s the opposite of what modern architecture was. It was about embracing the constraints of materials. Given the properties of concrete, what is the limit of what you can do with it. Go there, and no further. And don’t cover it up, just finish the dang slab and get on with the rest of the house.
ES6 means transpiling, which means webpack, which means a massive machine of hidden complexity, which if you’re lucky exposes a nice smooth surface where everything is arrow functions and named exports. And if you’re unlucky is a flimsy piece of cardboard over the nightmare underneath.
You (SourceHut) seem to be building a UI that actually takes note of what the browser is. And you are trying to push the big numbers - how reliable your service can be, how many endpoints one person can maintain - while letting the materials of the web (forms, URLs) dictate the details.
That’s true modernism.
So, bravo. I’m glad to see you out in the world. It takes courage to step outside of the norm and I’m rooting for you.
Who's to say? It's not GitHub scale, and even if everyone in this thread moved to SourceHut, it still wouldn't be GitHub scale, but it would be serving your needs just fine. I feel totally comfortable recommending SourceHut over GitHub as a service which can be expected to have better uptime and performance, because it is a fact - even if we operate at different scales.
And I believe sr.ht would beat out GitHub at their scale anyway. The services are an order of magnitude more lightweight. And the design is more fault tolerant: we use a distributed architecture, so one part of the system can go down without affecting anything else - as if GitHub's issues could go down without anything else being affected. And many of our tools are based on email, a global fault-tolerant system, which would allow you to get your work done more or less unaffected even if SourceHut was experiencing a total outage. We'd automatically get caught back up with what you were up to in the meanwhile once we're online, too.
I've spoken to GitHub engineers about some of the internal architectural design of GitHub, too, and I'm confident that SourceHut's technical design beats out GitHub's in terms of scalability. And, despite already winning by a good margin, I'm still spending a lot of effort to push the envelope further on performance and scalability.
And then you go on to say it. I'm glad that SourceHut exists, and I like many of its principles, and it's probably better designed too, but walking into a thread where someone is having an outage and then claiming that you'd do much better is in poor taste no matter how good you are or how many of your services work offline.
Right, and I think it is great to bring up how your service can handle outages better than GitHub would due to it being decentralized. The part I have issue with is saying that you'd do better than GitHub about keeping your site up, pointing to the issue that they are in the middle of resolving–that just seems like kicking them while they're down, especially since you haven't actually shown that you can do better. (Yes, you have good uptime in the past, but I don't see what's stopping the power going out to some of your servers, or you pushing a bug into production, or any number of other things that shouldn't go wrong but often do, especially as the number of users increases.)
>what's stopping the power going out to some of your servers
Redundant power supplies
>pushing a bug into production
Nothing, but again, SourceHut is demonstrably better in this regard: because it's distributed, a bug in production would only affect a small subset of our system, and the system knows how to repair itself once the bug is fixed.
And I don't think I need to apologise for kicking Goliath while he's down. Someone said they want alternatives, so I pitched mine with specific details of how it's better in this situation, and that doesn't seem wrong to me. I would invite my competitors to do the same to me. We should be fostering a culture of reliability and good engineering - and if I didn't hold my competitors accountable, who will? "Here's an alternative" has more teeth than "I wish this was better."
> SourceHut over GitHub as a service which can be expected to have better uptime and performance, because it is a fact
Most of us could throw any of the open source solutions on a $20 Linode instance and probably have excellent uptime. How many active repos do you host, and on how many servers?
About 18K git & hg repositories, for about 13.5K users. We also run about 5,000 CI jobs per week, including for some large projects like Nim and Zig, Neovim, OpenSMTPD, etc. We have 10 dedicated servers at the moment. And I didn't throw an open source solution on these servers - I built these open source services from the ground up.
SourceHut is not the same scale as GitHub. This does not change the fact that SourceHut is faster and more reliable. We have an advantage - fewer users and repos - but still, that doesn't change the fact that we're faster and more reliable.
This has been objectively demonstrated as a numerical fact:
And yes, 9 of those servers are in Philadelphia (the other is in San Francisco, but it's for backups, not distribution). That doesn't change the fact that, despite being more distant from many users, our pages load faster. In this respect we're at a disadvantage compared to GitHub, but we're still faster.
GitHub and Sourcehut are working at different scales. That doesn't change the fact that SourceHut is faster.
I wasn't questioning that some of the web features are fast. I'm sure when Github was 10 servers their pages were fast too. I suspect if I threw Gitlab on a 9-server cluster on AWS they'd also be quick.
Not geographically distributed, but distributed in the sense that different responsibilities of the overall application are distributed among different servers, which can fail independently without affecting the rest. Additionally, the mail system on which many parts of SourceHut rely is distributed in the geographical sense, among the hundreds of thousands of mail servers around the world, which have standard and 50-year-battle-tested queueing and redelivery mechanisms built in.
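The queueing-and-redelivery behaviour mentioned above is the store-and-forward model SMTP mandates (RFC 5321): a message that can't be delivered right now is queued and retried on a backoff schedule rather than dropped. A toy sketch of that idea, with an invented retry schedule (real MTAs each have their own):

```python
# Toy store-and-forward delivery loop. The schedule below is a
# made-up example, not any particular mail server's defaults.

RETRY_SCHEDULE_MIN = [5, 15, 60, 240, 1440]  # minutes between retries

def deliver(message, attempt_delivery):
    """Try the initial attempt plus each scheduled retry; return True
    as soon as one attempt succeeds, False once the schedule runs out
    (a real MTA would bounce the message at that point)."""
    for delay in [0] + RETRY_SCHEDULE_MIN:
        # (a real MTA would wait `delay` minutes before this attempt)
        if attempt_delivery(message):
            return True
    return False

# A receiver that is down for the first two attempts still gets the
# message once it comes back - nothing is lost in the meantime.
calls = {"n": 0}
def flaky_receiver(msg):
    calls["n"] += 1
    return calls["n"] > 2

assert deliver("patch-email", flaky_receiver)
```

This is why an email-based workflow can ride out a total outage of the receiving service: the sending side of the network holds the queue until the receiver is back.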
And yes, throwing GitLab on a 9 server cluster on AWS might be fast. But I'm ready to bet you that SourceHut will still be faster, and I have a ready-to-roll performance test suite to prove it. And I know that SourceHut is faster than GitLab.com and GitHub.com, and every other major host, and you don't have to go through the trouble of provisioning your own servers to take advantage of SourceHut's superior performance.
> This has been objectively demonstrated as a numerical fact:
While your tests are indeed objective, I don't think they're very useful. For example, why does your performance test ignore caching?
GitHub's summary page loads 27KiB of data for me unauthenticated, which is about 6% of the 452KiB you're displaying in your first table. The vast majority of developers who browse GitHub will not be loading 452KiB of static assets every single page load.
Anecdotally, GitHub's "pjax" navigation feels about as fast as SourceHut on my aging hardware.
Even with caching, SourceHut is a lot smaller than that. SourceHut benefits from caching, too - the repo summary page comes from 2 requests and 29.5K to 1 request and 5.7K with a warm cache. And in many cases, the cache isn't the bottleneck, either - dig into the Lighthouse results for specific pages to see a more detailed breakdown.
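The figures being traded above are easy to sanity-check. This snippet just redoes the arithmetic on the KiB numbers quoted in this thread (they come from the comments themselves, not from any official measurement):

```python
# Back-of-the-envelope check of the page-weight numbers quoted above.

github_warm = 27          # KiB, GitHub summary page with a warm cache
github_cold = 452         # KiB, first uncached load
print(f"GitHub warm load is {github_warm / github_cold:.0%} of cold")  # ~6%

srht_cold = 29.5          # KiB over 2 requests, SourceHut repo summary
srht_warm = 5.7           # KiB over 1 request, warm cache
print(f"SourceHut warm cache cuts transfer by {1 - srht_warm / srht_cold:.0%}")  # ~81%
```

So both sides of the argument hold up numerically: caching shrinks GitHub's page weight by more than an order of magnitude, and SourceHut's warm-cache load is still several times smaller than GitHub's.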
Thanks for being so transparent about your operations.
Maybe this is somewhere in the manual and I missed it, but do you have some way of automating the configuration of your hosts and VMs? For example, do you use something like Ansible?
No, I provision them manually. Being based on Alpine Linux makes this less time-consuming and more deterministic. At some point I might invest in something completely automated, but right now the manual approach is simpler - and if it's not broke, don't fix it.
Ah, OK. Also, have you written anywhere about why you chose to use colocation rather than VPS (or "cloud") hosting, or leased dedicated hosting for the CI system? If you could use someone else's hardware rather than having to select, buy, and set up your own, then at least in theory, you could spend more time on other things. But I'm sure you have your reasons for making the choice that you did. I'm just curious about what those reasons are, if you're inclined to share.
There are lots of reasons, but the most obvious one is cost. All of SourceHut's servers are purpose-built for a particular role, and their hardware is tuned to that. The server that git.sr.ht runs on is pretty beefy - it cost me $5.5K to build. I paid that once and now the server belongs to us forever. I ran the same specs through the AWS price estimator, and it would have cost ten grand per month.
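Taking the two figures quoted above at face value, the break-even math is stark. The $5.5K build cost and ~$10K/month AWS estimate come from the comment; the monthly colo fee is an invented placeholder, since it isn't stated:

```python
# Rough colo-vs-cloud break-even, using the numbers quoted above.
# colo_monthly is an assumed placeholder, not a stated figure.

build_cost = 5500          # one-time hardware cost, USD (quoted)
colo_monthly = 300         # assumed rack space + power + bandwidth
aws_monthly = 10000        # quoted AWS estimate for equivalent specs

# Months until owning the hardware beats renting the equivalent:
months = build_cost / (aws_monthly - colo_monthly)
print(f"break-even after {months:.1f} months")  # under one month
```

Even if the assumed colo fee were several times higher, the one-time purchase would pay for itself within the first couple of months.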
No, write access is only supported over SSH, for security reasons. SSH key authentication is stronger than password authentication, and git.sr.ht doesn't have access to your password hash to check anyway.
I set up a self-hosted Gitea this year and moved my repos over, and couldn't be happier with it. It's faster than GitHub, clones the GitHub design/UI so that everything's where I expect it to be, has a dark mode, and supports U2F. It's easy to deploy, back up, and maintain; the Gitea devs have done a great job.
It's much less complicated (both from an admin standpoint, as well as a UI standpoint) than GitLab. I paired it with a Drone installation (also self-hosted) for CI and (sometimes) CD.
It all works great, and is way easier than I thought. If there's downtime, I'm (usually) in control of when or how long, as I have root on the box.
I'm also not giving my money to a giant military contractor (Microsoft, the owners of GitHub) any longer, which is a huge deal for me from a personal moral standpoint (YMMV).
A positive side of such downtime: it turns out people now gather at your landing/home page just to see if the service is up. Could the cost of a downtime be offset by grabbing a few customers for the new feature just published on your homepage?
You could move to GitLab, but from what I'm hearing the pricing is higher than GitHub's (is this still true?)
Barring that, you always have the tried and true (and, for some reason, abhorred by start-ups) option of running your own Gitea or GitLab instance. It's not hard, and most of this stuff can be done in dockerless containers if you want.
If cloud servers are getting "overloaded", as some commenters say, you could even buy a few racks or U's of colo somewhere, or use a cloud provider that isn't the most popular meme on YC. Vultr and RamNode are both good options, and you'd be supporting a small business, not Bezos' next giga-yacht.
>Vultr and RamNode are both good options, and you'd be supporting a small business, not Bezos' next giga-yacht.
Vultr is considered a small business now? Crunchbase lists them as having 50-100 employees, and they seem to be owned by Choopa, LLC, which some sources list as having 150 employees.
1) Microsoft took over
2) M$ migrates some ADO (Azure DevOps) features to Github (e.g., Github Actions)
3) If Github was not on Azure before M$ bought it (very likely, but needs citation) they will probably migrate to Azure at some point
I'm pretty sure Github Actions work predates the MS acquisition... I'm also pretty sure that they are trying to align the backend systems more to Azure, but have no insight into how much of that took place.
The fact that you used "M$" indicates that you are predisposed to blame Microsoft for actions that are likely not from the parent, and discount any changes from the top down that have occurred within MS. And while I have a lot of issues with MS and Windows in particular, MS today is not the same as MS even a decade ago.
I would guess that since they introduced free private repos, usage has increased a lot.
E.g. I used to use Bitbucket but switched over to GitHub when they did that, because the GitHub desktop program is nice and works a lot more smoothly with GitHub as opposed to Bitbucket.
...is it time to move away from Github?