電腦領域 HKEPC Hardware 's Archiver

toylet 發表於 2020-12-14 21:22

[on.cc東網]Google全球死機 用戶無法使用Gmail、YouTube...

**** 作者被禁止或刪除 內容自動屏蔽 ****

chue 發表於 2020-12-14 22:06

好小事啫

dadadede 發表於 2020-12-14 22:24

唔怪得頭先上唔到, 彈依句: something went wrong... youtube, 唔login就上到!

TH30 發表於 2020-12-14 22:35

如果日頭死gmail同gdrive,班work from home就可以去行山了 :d:

toylet 發表於 2020-12-14 23:27

**** 作者被禁止或刪除 內容自動屏蔽 ****

kwliu 發表於 2020-12-14 23:33

[quote]如果日頭死gmail同gdrive,班work from home就可以去行山了
[size=2][color=#999999]TH30 發表於 2020-12-14 22:35[/color] [url=https://www.hkepc.com/forum/redirect.php?goto=findpost&pid=40010156&ptid=2598181][img]//i.hkepc.net/forum/common/back.gif[/img][/url][/size][/quote]

叫公司轉O365 email同one drive:d:

TH30 發表於 2020-12-15 08:15

[quote]叫公司轉O365 email同one drive
[size=2][color=#999999]kwliu 發表於 2020-12-14 11:33 PM[/color] [url=https://www.hkepc.com/forum/redirect.php?goto=findpost&pid=40010272&ptid=2598181][img]//i.hkepc.net/forum/common/back.gif[/img][/url][/size][/quote]

用開轉好煩

rabbit82047 發表於 2020-12-15 08:27

[b]回覆 [url=https://www.hkepc.com/forum/redirect.php?goto=findpost&pid=40010156&ptid=2598181]4#[/url] [i]TH30[/i] [/b]


    你估夜晚唔會收到 call :naug:

peter_chan 發表於 2020-12-15 10:15

[quote]叫公司轉O365 email同one drive
[size=2][color=#999999]kwliu 發表於 2020-12-14 23:33[/color] [url=https://www.hkepc.com/forum/redirect.php?goto=findpost&pid=40010272&ptid=2598181][img]//i.hkepc.net/forum/common/back.gif[/img][/url][/size][/quote]


    可能之前用M$出事﹐所以轉用google。

點知google 都係出事:titter:


[url]https://3c.yipee.cc/172641/microsoft-outlook%E3%80%81office-365-%E5%A4%A7%E7%95%B6%E6%A9%9F-5-%E5%B0%8F%E6%99%82%EF%BC%8C%E8%BF%84%E4%BB%8A%E9%82%84%E5%9C%A8%E4%BF%AE%E5%BE%A9%E4%B8%AD/[/url]

waishingme 發表於 2020-12-20 07:46

[i=s] 本帖最後由 waishingme 於 2020-12-20 07:47 編輯 [/i]

[url]https://status.cloud.google.com/incident/zall/20013#20013004[/url]

ISSUE SUMMARY
On Monday 14 December, 2020, for a duration of 47 minutes, customer-facing Google services that required Google OAuth access were unavailable. Cloud Service accounts used by GCP workloads were not impacted and continued to function. We apologize to our customers whose services or businesses were impacted during this incident, and we are taking immediate steps to improve the platform’s performance and availability.

ROOT CAUSE
The Google User ID Service maintains a unique identifier for every account and handles authentication credentials for OAuth tokens and cookies. It stores account data in a distributed database, which uses Paxos protocols to coordinate updates. For security reasons, this service will reject requests when it detects outdated data.

Google uses an evolving suite of automation tools to manage the quota of various resources allocated for services. As part of an ongoing migration of the User ID Service to a new quota system, a change was made in October to register the User ID Service with the new quota system, but parts of the previous quota system were left in place which incorrectly reported the usage for the User ID Service as 0. An existing grace period on enforcing quota restrictions delayed the impact, which eventually expired, triggering automated quota systems to decrease the quota allowed for the User ID service and triggering this incident. Existing safety checks exist to prevent many unintended quota changes, but at the time they did not cover the scenario of zero reported load for a single service:

• Quota changes to large number of users, since only a single group was the target of the change,

• Lowering quota below usage, since the reported usage was inaccurately being reported as zero,

• Excessive quota reduction to storage systems, since no alert fired during the grace period,

• Low quota, since the difference between usage and quota exceeded the protection limit.

As a result, the quota for the account database was reduced, which prevented the Paxos leader from writing. Shortly after, the majority of read operations became outdated which resulted in errors on authentication lookups.

REMEDIATION AND PREVENTION
The scope of the problem was immediately clear as the new quotas took effect. This was detected by automated alerts for capacity at 2020-12-14 03:43 US/Pacific, and for errors with the User ID Service starting at 03:46, which paged Google Engineers at 03:48 within one minute of customer impact. At 04:08 the root cause and a potential fix were identified, which led to disabling the quota enforcement in one datacenter at 04:22. This quickly improved the situation, and at 04:27 the same mitigation was applied to all datacenters, which returned error rates to normal levels by 04:33. As outlined below, some user services took longer to fully recover.

In addition to fixing the underlying cause, we will be implementing changes to prevent, reduce the impact of, and better communicate about this type of failure in several ways:

1. Review our quota management automation to prevent fast implementation of global changes

2. Improve monitoring and alerting to catch incorrect configurations sooner

3. Improve reliability of tools and procedures for posting external communications during outages that affect internal tools

4. Evaluate and implement improved write failure resilience into our User ID service database

5. Improve resilience of GCP Services to more strictly limit the impact to the data plane during User ID Service failures

We would like to apologize for the scope of impact that this incident had on our customers and their businesses. We take any incident that affects the availability and reliability of our customers extremely seriously, particularly incidents which span multiple regions. We are conducting a thorough investigation of the incident and will be making the changes which result from that investigation our top priority in Google Engineering.

DETAILED DESCRIPTION OF IMPACT
On Monday 14 December, 2020 from 03:46 to 04:33 US/Pacific, credential issuance and account metadata lookups for all Google user accounts failed. As a result, we could not verify that user requests were authenticated and served 5xx errors on virtually all authenticated traffic. The majority of authenticated services experienced similar control plane impact: elevated error rates across all Google Cloud Platform and Google Workspace APIs and Consoles. Products continued to deliver service normally during the incident except where specifically called out below. Most services recovered automatically within a short period of time after the main issue ended at 04:33. Some services had unique or lingering impact, which is detailed below.

Cloud Console
Any users who hadn't already previously authenticated to Cloud Console were unable to login. Users who had already authenticated may have been able to use Cloud Console but may have seen some features degraded.

Google BigQuery
During the incident, streaming requests returned ~75% errors, while BigQuery jobs returned ~10% errors on average globally.

Google Cloud Storage
Approximately 15% of requests to Google Cloud Storage (GCS) were impacted during the outage, specifically those using OAuth, HMAC or email authentication. After 2020-12-14 04:31 US/Pacific, the majority of impact was resolved, however, there was lingering impact, for <1% of clients that attempted to finalize resumable uploads that started during the window. These uploads were left in a non-resumable state; the error code GCS returned was retryable, but subsequent retries were unable to make progress, leaving these objects unfinalized.

Google Cloud Networking
The networking control plane continued to see elevated error rates on operations until it fully recovered at 2020-12-14 05:21 US/Pacific. Only operations that made modifications to the data plane VPC network were impacted. All existing configurations in the data plane remained operational.

Google Kubernetes Engine
During the incident, ~4% of requests to the GKE control plane API failed, and nearly all Google-managed and customer workloads could not report metrics to Cloud Monitoring.

We believe ~5% of requests to Kubernetes control planes failed but do not have accurate measures due to unreported Cloud Monitoring metrics.

For up to an hour after the outage, ~1.9% nodes reported conditions such as StartGracePeriod or NetworkUnavailable which may have had an impact on user workloads.

Google Workspace
All Google Workspace services rely on Google's account infrastructure for login, authentication, and enforcement of access control on resources (e.g. documents, Calendar events, Gmail messages). As a consequence, all authenticated Google Workspace apps were down for the duration of the incident. After the issue was mitigated at 2020-12-14 04:32 US/Pacific, Google Workspace apps recovered, and most services were fully recovered by 05:00. Some services, including Google Calendar and Google Workspace Admin Console, served errors up to 05:21 due to a traffic spike following initial recovery. Some Gmail users experienced errors for up to an hour after recovery due to caching of errors from identity services.

Cloud Support
Cloud Support's internal tools were impacted, which delayed our ability to share outage communications with customers on the Google Cloud Platform and Google Workspace Status Dashboards. Customers were unable to create or view cases in the Cloud Console. We were able to update customers at 2020-12-14 05:34 US/Pacific after the impact had ended.

頁: [1]

Powered by Discuz! Archiver 7.2  © 2001-2009 Comsenz Inc.