Data Virtualization with MongoDB using PolyBase in SQL Server 2019

Data Virtualization with MongoDB using PolyBase in SQL Server 2019

cupom com desconto - o melhor site de cupom de desconto cupomcomdesconto.com.br



In this article, you’ll learn the approach to integrate MongoDB data source using data virtualization technique in
SQL Server 2019. In this article, you can see how SQL Server 2019 provides a platform to create a modern enterprise
data hub using data virtualization technology and the PolyBase technique.


  1. Discuss Data Virtualization

  2. Pre-requisite to setup MongoDB

  3. Set up MongoDB external connection in SQL Server 2019

  4. Many more…

Introduction


The advent of Data virtualization in SQL Server 2019 allows us to solve modern and complex data challenges. Data
virtualization with PolyBase in SQL Server 2019 is used as a data hub, and you can directly query the data from
several heterogeneous data sources. These data sources include Azure Managed Instance, Oracle, Teradata, SAP HANA,
MongoDB, Hadoop clusters, Cosmos DB, and SQL Server. We can query the data source using T-SQL and without separately
installing driver software.


The data virtualization in SQL Server 2019 is an improvised solution to the ETL process. The other advantage of Data
virtualization is that it allows the integration of data from different sources such as Azure MI, SQL Server,
MongoDB, Oracle, DB2, Cosmos DB, and Hadoop-Distributed-File-System (HDFS) without the much data movement around the
source and destination. This process is possible with the advent of PolyBase connectors.

    Note:

  • Using T-SQL, we can query heterogeneous data sources using PolyBase connectors. This provides the bridge to query the data from external data sources such as SQL Server, Oracle, Teradata, MongoDB, and ODBC data sources with external tables
  • It also supports the UTF-8 encoding format

Get started:


In this section, you will learn how to create secure data access from the underlying data source.


In this case, PolyBase uses the security model of the MongoDB model to access the data. In most cases, we need
permission to read the data. However, the credentials used to read the data and it is stored inside the PolyBase
data hub.


To set-up data virtualization, follow the below steps:


  1. Setup database master key

  2. Create database-scoped credentials

  3. PolyBase external tables

  4. Configure External data sources


To configure database virtualization, select the database. Right-click the database and select Create External Table that starts the data virtualization wizard. 


  1. Select a data source

  2. Create a database master key


In this section, we will see how to create a database master key. The master key is created inside the SQL Server
database and it acts as a data hub.


The master key is providing a secure way to read data using the credentials in the external data source. It is always recommended to choose a complex password for the master key. In addition, use the BACKUP MASTER KEY command to back up the master key.

    Note:


  1. You can have one master key in a database instance

  2. Use a secure location to backup the master key

  3. If you already have the master key created in your SQL instance—re-use it. The same master key is used to secure
    data access for data virtualization
  4. To see whether the master key is already created run the following query

    Display Sysmetric_keys details


Syntax to create a database master key is as follows:

Database scoped credentials


Database scoped credentials provide the abstraction layer to secure the model to access the data from the data
source. SQL Server 2019 supports the following security models when using PolyBase for data virtualization:

  1. Proxy authentication
  2. Kerberos authentication
  3. Storage access key—shared secret key


The following syntax is used to provide SQL Server with the security context for a connection to a MongoDB external
data source. In addition, you will see a shared secret key in use to create a database scoped credential.

  • Note: The identity refers to the username of the MongoDB external data source


To validate database scoped credentials in the datahub run the following T-SQL:


Display database scope credentials

Configure External data sources


External data sources provide a path to virtualize data from other data sources. It collects all the meta-data to
access data from the single database object.


Each external data source provides the below options required to access the external source:


  • Location

  • port

  • credential

  • connectivity


The LOCATION parameter defines the communication protocol, server name and port for connectivity.


In this case, the communication protocol is mongodb and server is 127.0.0.1(Localhost) and port is 27017


The CREDENTIAL argument uses the database scoped credential MyMongoDBDemo2019. The external data
source will use this database-scoped credential to query the target data sources

  • Note: The Connection_options and Pushdown parameters will be explained later in the article


  • To validate external data sources run the following T-SQL:


    Display external_data_sources

    How to map an external data table


    In this section, we will see how to create external views. On selecting the parent tables, the child tables are also
    referred by default. In addition, you have the mapping table where you modify the columns to match the target data
    types. This also gives you the option to modify the name of the external table.


    Let us look at the details of the MongoDB instance. Create the user using the following commands. I will not go
    in-depth on MongoDB concepts.


    how to connect mongoDB

    PolyBase external tables


    External tables provide SQL Server with the schema to reason over data in the external data source.


    Error Message:


    The T-SQL create external table statement errors out with the following details. You can see that the TLS handshake
    failed error reported. This can be handled in two ways. You can manage it at the Mongo DB startup parameters or you
    can turn off the SSL parameter or you can turn off the SSL=false in the T-SQL statement.

    “Msg 105082, Level 16, State 1, Line 16


    105082;Generic ODBC error: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client:


    No suitable servers found (serverSelectionTryOnce set):


    [TLS handshake failed: error:00000000:lib(0):func(0):reason(0) calling ismaster on ‘127.0.0.1:27017’]


    (Error Code: 13053) Additional error <2>:


    ErrorMsg: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: No suitable servers found
    (serverSelectionTryOnce set)



    : [TLS handshake failed: error:00000000:lib(0):func(0):reason(0) calling ismaster on ‘127.0.0.1:27017’]


    (Error Code: 13053), SqlState: HY000, NativeError: 110 “


    You may receive this error due to the SSL issue. Now, the CONNECTION_OPTIONS parameter plays a role to connect MongoDB data source.


    In this case, set the connection_options to ssl=false and then run the create external statement.


    The CONNECTION_OPTION setting suppresses the connection over SSL, which is a default requirement for communicating
    with many other data sources. I will discuss more about it in the next article.


    Let us run create external table statement again:


    We can see in the below snippet, the external table created successfully.


    Create mongoDB external table



    Let us validate the external table using the following T-SQL:


    Display the external tables


    Now, Let us browse object explorer to validate the existence of the External Tables and External Data Sources.


    Display external employee table



    Finally, query the external table


    MongoDB data validation in SSMS and MongoDB console

    Summary


    In this article, we discussed how to set up a PolyBase in SQL Server for MongoDB external data source.


    To summarize the steps real quick:


    The data virtualization framework provides a platform to eliminate the use of redundant data copies. To maintain the
    physical copy always requires additional storage and hardware resources. It incurs to increase an additional
    overhead on costs. Using the data virtualization technique, organizations can create an abstract layer to create a
    virtual data copy from a heterogeneous or homogeneous data repositories. The virtual copies define the soft external
    table links to the physical data repository. As I mentioned, the data virtualization technique not only avoids extra
    storage but it provides strong consistency between the source and the target in the data retrieval process. In
    addition, it addresses the data issues which in turn improves the data quality and improves the data query
    performance.


    That’s all for now. I hope you like this article. Stay tuned for more updates…






    Prashanth Jayaram
    Latest posts by Prashanth Jayaram (see all)


cupom com desconto - o melhor site de cupom de desconto cupomcomdesconto.com.br
Leia Também  Métodos de introdução e prevenção no SQL Server