Google BigQuery: all about the Big Data Cloud platform
Google BigQuery is a Big Data analysis platform offered by Google via the Cloud. Find out everything you need to know about this virtual Data Warehouse: definition, operation, advantages…
Storing and querying massive data sets can be both expensive and time-consuming for companies that lack adequate infrastructure and hardware. Google offers its BigQuery platform to address this problem.
Google BigQuery: what is it and what is it for?
Google BigQuery is a Data Warehouse designed to let businesses run SQL queries very quickly, using the processing power of Google’s cloud infrastructure. It therefore belongs to the Infrastructure as a Service (IaaS) family of cloud services. Because it is designed for Big Data, the platform can analyze billions of rows of data.
Released in 2011 directly as version 2 (V2), BigQuery is essentially an externalized version of Dremel, the query software Google uses internally to track device installation data, generate crash reports and analyze spam. Both platforms use columnar storage to scan data quickly, and a tree architecture to dispatch queries and aggregate results across large clusters of computers.
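To make these two ideas concrete, here is a toy Python sketch. It is not Dremel’s actual code, only an illustration under simplifying assumptions. Each shard stores its columns separately, each leaf scans only the column the query needs, and partial aggregates are merged level by level up a tree until the root holds the final result. The shard layout, the `fan_out` parameter and every function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical columnar layout: each column is stored as its own list,
# so a query that touches one column never reads the others.
ColumnarShard = Dict[str, List[float]]


@dataclass
class Partial:
    """Partial aggregate passed up the tree (enough to rebuild SUM, COUNT and AVG)."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count)


def leaf_scan(shard: ColumnarShard, column: str) -> Partial:
    # A leaf reads only the requested column of its own shard.
    values = shard[column]
    return Partial(sum(values), len(values))


def merge_all(parts: List[Partial]) -> Partial:
    # An intermediate node combines the partial results of its children.
    result = Partial()
    for part in parts:
        result = result.merge(part)
    return result


def tree_aggregate(shards: List[ColumnarShard], column: str, fan_out: int = 2) -> Partial:
    # Leaf level: one partial aggregate per shard (run sequentially in this sketch).
    level = [leaf_scan(shard, column) for shard in shards]
    # Intermediate levels: merge groups of `fan_out` partials until only the root remains.
    while len(level) > 1:
        level = [merge_all(level[i:i + fan_out]) for i in range(0, len(level), fan_out)]
    return level[0] if level else Partial()


shards = [
    {"installs": [10.0, 20.0], "crashes": [1.0, 0.0]},
    {"installs": [5.0], "crashes": [2.0]},
    {"installs": [7.0, 8.0, 0.0], "crashes": [0.0, 1.0, 1.0]},
]
root = tree_aggregate(shards, "crashes")
print(root.total, root.count, root.total / root.count)
```

Carrying both a sum and a count upward lets the root compute an average. Averaging the leaves’ averages would give a wrong result whenever shards differ in size.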
Since its external launch, BigQuery has gained many features. Since 2013, joins, timestamps and streaming inserts have been added to the service.
Google BigQuery: how does it work?
You only need to load your data into BigQuery to benefit from the power of Google’s infrastructure. The service is fully managed, which means you do not have to deploy resources such as disks or virtual machines before you start using it.
The service also integrates with many tools from Google and third-party companies, such as Google Analytics 360, Talend, Informatica, Tableau Software, Qlik and Data Studio. Data can be transferred from multiple sources such as Google Analytics, Firebase and Google Sheets, or through ETL tools such as Talend and Traffika. You can therefore centralize all your raw data in the Cloud.
The main components of Google BigQuery
To access BigQuery, you can use the GCP console or the web interface. You can also use the bq command-line tool, or call the BigQuery REST API through client libraries for languages such as Java, .NET and Python. A variety of third-party tools also let you interact with the platform, for example to view or load data.
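The client libraries wrap the REST API. To show what such a call looks like underneath, here is a minimal standard-library sketch that builds, but does not send, a request to the v2 `jobs.query` method. The project ID `my-project` and the token string are placeholders. A real token would come from, for example, `gcloud auth print-access-token`. The helper name `build_query_request` is hypothetical.

```python
import json
import urllib.request

# Endpoint of the BigQuery v2 REST method jobs.query.
QUERY_URL = "https://bigquery.googleapis.com/bigquery/v2/projects/{project}/queries"


def build_query_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build (without sending) a jobs.query request that runs `sql` as standard SQL."""
    body = {
        "query": sql,
        "useLegacySql": False,  # request standard SQL instead of the legacy dialect
    }
    return urllib.request.Request(
        QUERY_URL.format(project=project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # OAuth 2.0 access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_query_request("my-project", "SELECT 1 AS x", "PLACEHOLDER_TOKEN")
# Sending the request requires a valid token and project:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

In practice, an official client library handles authentication, retries and result paging for you. The raw request is mainly useful for seeing what those libraries send.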
BigQuery is based on two main components: Dremel and Borg. Google presents Dremel, the query engine, as a “massively parallel query cloud service”. Running on top of a distributed file system, Dremel translates SQL queries into lower-level execution instructions for the engine.
The second component, Borg, is Google’s large-scale cluster management system. It automatically assigns the servers’ computing and storage resources to individual tasks, so this does not have to be done manually.
If you want to learn how to use BigQuery, Google offers many tutorials in French on its official site. You will also find video tutorials in English at the end of this article that show how to build reports with the service and how to use Firebase Analytics.
Google BigQuery vs Amazon Redshift and Microsoft Azure SQL Data Warehouse
Of course, BigQuery is not the only virtual Data Warehouse in the Cloud. Google’s main competitors in the cloud computing market, Amazon and Microsoft, offer similar services: Amazon Redshift and Microsoft Azure SQL Data Warehouse. These platforms let the database administrator ingest data, allocate storage and compute resources, and integrate with other Business Intelligence tools.
Google’s Data Warehouse stands out, however, by automating data formatting and resource provisioning. The platform also handles maintenance operations. The user simply connects the data sources and runs queries.
As a result, BigQuery is easier to use than its competitors. In terms of performance, however, it cannot compete with systems such as Amazon Redshift.
Google BigQuery: what are the prices of the service?
Now that you know BigQuery’s features, you probably want to know its pricing. Note that the platform’s storage costs depend only on the volume of data stored.
Google distinguishes between active storage and long-term storage. Active storage is billed monthly and covers data in tables that were modified during the last 90 days. Long-term storage covers tables that have not been modified in the past 90 days.
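As a worked example of the 90-day rule, here is a minimal Python sketch. It sorts tables into the two tiers and estimates a monthly bill. It assumes the figures quoted in this article: a 10 GB monthly allowance per tier, $0.020 per GB for active storage and $0.010 per GB for long-term storage. Google’s current price list and its exact proration rules may differ. Treating a table as long-term from exactly day 90 onward is also an assumption. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

# Figures as quoted in this article (assumption: check Google's current price list).
ACTIVE_RATE_USD_PER_GB = 0.020
LONG_TERM_RATE_USD_PER_GB = 0.010
FREE_GB_PER_TIER = 10.0
LONG_TERM_AFTER = timedelta(days=90)


def storage_tier(last_modified: date, today: date) -> str:
    """Return 'long_term' if the table has gone 90 days without modification, else 'active'."""
    return "long_term" if today - last_modified >= LONG_TERM_AFTER else "active"


def monthly_storage_cost(tables: Iterable[Tuple[float, date]], today: date) -> float:
    """Estimate the monthly storage bill in USD from (size_gb, last_modified) pairs."""
    gb = {"active": 0.0, "long_term": 0.0}
    for size_gb, last_modified in tables:
        gb[storage_tier(last_modified, today)] += size_gb
    # The free allowance is subtracted per tier before applying that tier's rate.
    active_cost = max(gb["active"] - FREE_GB_PER_TIER, 0.0) * ACTIVE_RATE_USD_PER_GB
    long_term_cost = max(gb["long_term"] - FREE_GB_PER_TIER, 0.0) * LONG_TERM_RATE_USD_PER_GB
    return active_cost + long_term_cost


today = date(2024, 6, 1)
tables = [
    (50.0, date(2024, 5, 20)),   # modified 12 days ago: active
    (200.0, date(2023, 12, 1)),  # untouched for 183 days: long-term
]
print(round(monthly_storage_cost(tables, today), 2))
```

With these inputs, the active tier bills 40 GB at $0.020 ($0.80) and the long-term tier bills 190 GB at $0.010 ($1.90), about $2.70 in total.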
For active storage, the first 10 GB are free each month, and each additional GB costs $0.020 per month. Long-term storage is billed at $0.010 per GB per month after the first 10 GB, which are also free each month. Regarding the requests