SQL Server Big Data Cluster (image source)

Before the SQL Server 2012 release, this product was considered a database management system for small and medium enterprises. Starting with the 2012 release, after high-end data-center management capabilities were added, the database engine was no longer limited to medium-scale enterprises. In November 2019, SQL Server 2019 Big Data Clusters were introduced, giving users the ability to build a Big Data ecosystem.

This article will briefly mention nine features, added starting with SQL Server 2008, that make SQL Server more than a traditional database management system.

Data Compression (SQL Server 2008+)

This feature was added in SQL Server 2008 to be applied to tables and indexes. …
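As a brief illustration of what enabling this feature involves, here is a minimal sketch: a hypothetical Python helper that only builds the T-SQL statement for rebuilding a table with row- or page-level compression (the function and table names are illustrative, not from the article):

```python
# Hypothetical helper (not part of any SQL Server tooling): builds the
# T-SQL statement that rebuilds a table with data compression enabled,
# the feature introduced in SQL Server 2008.
def compression_ddl(table: str, level: str = "PAGE") -> str:
    if level not in ("NONE", "ROW", "PAGE"):
        raise ValueError("SQL Server supports NONE, ROW, or PAGE compression")
    return f"ALTER TABLE {table} REBUILD WITH (DATA_COMPRESSION = {level});"

print(compression_ddl("dbo.Sales"))
# ALTER TABLE dbo.Sales REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Page compression typically saves more space than row compression at the cost of extra CPU, which is why the level is a choice rather than a default.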


For many years, “Big Data” has been widespread and trendy. Big Data technologies emerged to fill the gap between traditional data technologies (RDBMS, file systems …) and rapidly evolving data and business needs.

While implementing these technologies is a must for many large-scale organizations to ensure business continuity, many organizations aim to adopt them without really knowing whether they can improve their business.

Before making your decision, there are many things you should take into consideration.

Knowing what Big Data is

Before asking if your business needs Big Data technologies, you have first to know what…


SSIS logo

Even after the rise of Big Data technologies, Microsoft SQL Server Integration Services (SSIS) is still one of the most popular data integration tools. SSIS developers mainly use Visual Studio to develop their data integration packages. One of the main challenges SSIS developers face is designing hundreds of similar packages, where each package must be recreated from scratch. Even though SSIS package parts were introduced in SQL Server 2016 to increase reusability, many scenarios still require a higher level of reuse.

This article will mention four approaches that I have tried while working as…


SchemaMapper logo

In its first release (1.0.0), SchemaMapper was developed to merge data from different file types (flat file, Excel, Access …) into one SQL table. SchemaMapper 1.1.0 improved on this by supporting reading from relational databases and writing data to more formats. SchemaMapper 1.1.0 is also available as a NuGet package.

SchemaMapper 1.1.0 Release Notes — April 14, 2020

New features and improvements

  • Added the ability to import data from relational databases (MySQL, MSSQL, Oracle, SQLCe, SQLite)
  • Added the ability to export data into Flat files, XML, and relational tables (Oracle, MySQL)
  • Source code optimization (OOP, added exporters namespace)
  • Support of Boolean data type in source and destination tables

Fixed Bugs



A few years ago, I kept hearing from my colleagues, “Don’t ever think about installing Hadoop on the Windows operating system!” I was not convinced, since I am a big fan of Microsoft products, especially Windows.

In the past few years, I worked on several projects where we were asked to build a Big Data ecosystem using Hadoop and related technologies on Ubuntu. It was not easy to work with these technologies, especially given the lack of online resources. Last month, I was asked to build a Big Data ecosystem on Windows. …


This article is part of a series we are publishing on TowardsDataScience.com that illustrates how to install Big Data technologies on the Windows operating system.

Previously published:

In this article, we will provide a step-by-step guide to install Apache Pig 0.17.0 on Windows 10.

1. Prerequisites

1.1. Hadoop Cluster Installation

Apache Pig is a platform built on top of Hadoop. You can refer to our previously published article to install a Hadoop single-node cluster on Windows 10.

Note that the Apache Pig latest version 0.17.0…


A step-by-step guide to install Apache Hive 3.1.2 on Windows 10 operating system

While working on a project, we were asked to install Apache Hive on a Windows 10 operating system. We found many guides online, but unfortunately, they didn’t work. For this reason, I decided to write a step-by-step guide to help others.

The starting point for this guide was a great video I found on YouTube, which provides a working scenario for Hive 2.x without much detail.

This article is part of a series we are publishing on TowardsDataScience.com that illustrates how to install Big Data technologies on the Windows operating system.

Other published articles in this…


While working on a project two years ago, I wrote a step-by-step guide to install Hadoop 3.1.0 on the Ubuntu 16.04 operating system. Since we are currently working on a new project where we need to install a Hadoop cluster on Windows 10, I decided to write a guide for this process.

This article is part of a series we are publishing on TowardsDataScience.com that illustrates how to install Big Data technologies on the Windows operating system.

Other published articles in this series:

1. Prerequisites

First…


SchemaMapper logo

SchemaMapper is a C# data integration class library that facilitates importing data from external sources with different schema definitions.

It can:

  • Import tabular data from different data sources (.xls, .xlsx, .csv, .txt, .mdb, .accdb, .htm, .json, .xml, .ppt, .pptx, .doc, .docx) into a SQL table with a user-defined schema, after mapping columns between source and destination.
  • Replace many integration services packages with a few lines of code.
  • Allow users to add new computed and fixed-value columns.

Used technologies

SchemaMapper is developed with the .NET Framework 4.5.
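SchemaMapper itself exposes a C# API; as a language-agnostic sketch of the column-mapping idea it implements, here is a minimal Python illustration. The function, mapping, and column names are hypothetical, not SchemaMapper's actual API:

```python
# Hypothetical illustration of column mapping (not SchemaMapper's API):
# rename source columns to the destination schema and append a
# fixed-value column, as a data integration tool might do per row.
def map_row(row: dict, column_map: dict, fixed: dict) -> dict:
    # column_map: source column name -> destination column name
    mapped = {dest: row.get(src) for src, dest in column_map.items()}
    mapped.update(fixed)  # fixed-value columns, e.g. a source tag
    return mapped

source_row = {"emp_name": "Hadi", "emp_id": 7}
print(map_row(source_row, {"emp_name": "Name", "emp_id": "Id"}, {"Source": "excel"}))
# {'Name': 'Hadi', 'Id': 7, 'Source': 'excel'}
```

The same mapping dictionary can then be reused for every source file that shares a schema, which is the reusability gain the library aims for.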


ad-hoc network

Quality of Service parameters in ad hoc networks

  1. Bandwidth: the rate at which an application’s traffic must be carried by the network.
  2. Latency: the delay that an application can tolerate in delivering a packet of data.
  3. Jitter: variation in Latency.
  4. Loss: the percentage of lost data.
  5. Security
  6. Availability: the ability of a device to send or receive packets from other devices.
  7. Battery life
  8. Mobility
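To make parameters 1–4 concrete, here is a small sketch that computes them from a hypothetical sample of packet latencies. Note that jitter is measured here simply as the standard deviation of latency, which is one common definition (RFC 3550, for instance, uses a smoothed interarrival estimate instead); the numbers are invented for illustration:

```python
import statistics

# Invented sample: one-way delays (ms) of 5 received packets out of 6 sent.
latencies_ms = [20.0, 22.0, 19.0, 25.0, 21.0]
packets_sent = 6

mean_latency = statistics.mean(latencies_ms)    # average delay
jitter = statistics.stdev(latencies_ms)         # variation in latency
loss_pct = 100 * (1 - len(latencies_ms) / packets_sent)  # lost packets

print(f"latency={mean_latency:.1f} ms, jitter={jitter:.2f} ms, loss={loss_pct:.1f}%")
```

An application then compares these measured values against the tolerances it can accept, which is exactly what the QoS parameters above express.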

Security techniques used in ad hoc networks

  1. Public key algorithm: every device generates a pair of keys and uses them for encryption and decryption.
  2. Authentication using certificates: each device must have a certificate from a trusted root.
  3. Hash functions: to check signature and data integrity.

These security methods increase the network…
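As a minimal illustration of the third technique, here is a sketch of an integrity check using a SHA-256 digest (Python's standard hashlib; the message content is an invented example, not from any routing protocol):

```python
import hashlib

# Minimal sketch of a hash-based integrity check: the sender publishes a
# SHA-256 digest of the message, and the receiver recomputes it on arrival.
def digest(message: bytes) -> str:
    return hashlib.sha256(message).hexdigest()

sent = b"route update: node A -> node B"
checksum = digest(sent)

# A single changed byte produces a completely different digest.
assert digest(sent) == checksum
assert digest(b"route update: node A -> node C") != checksum
```

In practice the digest itself must be signed or sent over an authenticated channel, since an attacker who can alter the message could otherwise alter the digest too.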

Hadi Fadlallah

Data Engineer, Ph.D. Candidate in Data Science
