Tech Talk Corner Sign up with your email address to be the first to know about new publications

Export Parquet Files with Column Names with Spaces

In this blog post, I’ll show you a design pattern to export parquet files with column names with spaces. You can use Azure Data Factory or Azure Synapse Analytics to overcome this problem. I will use dynamic T-SQL  

Post Contents: 

  • Introduction 
  • Pre-requisites 
  • Prepare the Dynamic T-SQL Code 
  • Exporting parquet with blank spaces 

Introduction 

To begin, one of the limitations when exporting data to parquet files in Azure Synapse Analytics or Azure Data Factory is you can’t export tables that have columns with blank spaces in their names.  

One of my readers, Marcus, asked me about how to do this recently, so I thought I’d write about it. (Leave me a comment if you ever have any questions, and your answer may turn into a helpful blog post!) 

For example, the following table cannot be exported with column header because the names have spaces. 

Table that cannot be exported

The solution is to rename the column names when querying the data while minimizing the development effort. 

If you try to do it, you get the following error: 

"message": "Failure happened on 'Sink' side. ErrorCode=ParquetInvalidColumnName,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The column name is invalid. Column name cannot contain these character:[,;{}()\\n\\t=],Source=Microsoft.DataTransfer.Common,'", 
Error message

Pre-requisites (recommended) 

First, have a look at this blog post on how to export parquet files in Azure to understand the solution. 

Prepare the Dynamic T-SQL Code 

In this example, I’ll use Azure SQL Database as the source system. If you are using any Relational Database Management System (RDBMS), the design pattern will be the same. Just change the dynamic queries. 

The main idea is to generate a dynamic query that modifies the column names to remove the spaces. If you haven’t worked with Dynamic SQL before, I suggest learning about it. I learned this in my early years from my mentor Enrique Viamontes.  

In T-SQL, you can use: 

DECLARE @s VARCHAR(500) 

DECLARE @tablename VARCHAR(500) ='DimAccount' 

 

SELECT @s = ISNULL(@s+', ','') + '[' + c.name + ']' + ' as ['+replace(c.name,' ','')+'] ' 

FROM sys.all_columns c join sys.tables t 

ON c.object_id = t.object_id 

WHERE t.name = @tablename 

 

SELECT 'select ' + @s + ' from ' + @tablename as Query 
T SQL

Export Parquet Files with Column Names with Spaces 

Now, let’s include the code in an integration pipeline (Azure Data Factory or Synapse Analytics) using a Lookup Activity. 

In your ForEachTable, add a lookup activity as follows and click the query. 

Export parquet files with columns with spaces

Paste the following query: 

@concat(' 

DECLARE @s VARCHAR(500) 

DECLARE @tablename VARCHAR(500) = ''', item().Table_Name ,''' 

 

SELECT @s = ISNULL(@s + '', '','''') + ''['' + c.name + '']'' + '' as [''+replace(c.name,'' '','''')+''] '' 

FROM sys.all_columns c join sys.tables t 

ON c.object_id = t.object_id 

WHERE t.name =  @tablename 

 

SELECT ''select '' + @s + '' from '' +  @tablename  as Query') 

Now, modify the copy activity source query. 

Modify the copy activity query

Paste the following code. 

@activity('RemoveBlankSpacesColumnNames').output.firstRow.Query 

Now, the copy activity will export the data correctly by renaming the column names. 

Columns are renamed correctly

Summary  

Today you have learned how to overcome one of the limitations when working with parquet files.  

What’s Next?      

In upcoming blog posts, we’ll continue to explore Azure Data Service features.          

Please  follow Tech Talk Corner on Twitter for blog updates, virtual presentations, and more!                

Check out these other blog posts:

comment [ 0 ]
share
No tags 0

No Comments Yet.

Do you want to leave a comment?

Your email address will not be published. Required fields are marked *