Sunday, January 31, 2021

File systems versus database management systems (DBMS)

Every DBMS book and beginner course out there discusses this subject, but we still see a considerable number of visitors ending up on our site when searching for “difference between a flat file and a database”, “difference between an excel spreadsheet and a database” etc., so we decided to provide a brief explanation here.  

First of all it is very important to clearly distinguish between a database and a database management system as they are nowadays often used interchangeably (albeit inaccurately). A database in the general sense is any structured collection of data – you have a file where you keep business cards you have collected, that is a database of contacts; you have an excel sheet where you record your ebay sales, that is a database of sales; you have a text file where you have stored a list of prospects, that is database. In a more narrow sense that the term database has evolved into, it really means a structured collection of data that has been stored in a computer system in a particular format through one of the database management systems like Oracle, DB2, SQL Server, MySQL, Access etc. A database management system is a software application that facilitates the process of creating and managing electronic databases. So remember, SQL Server, SQL Azure, Oracle, DB2 etc. are not databases but rather database management systems (dbms)

So, what do we need a database management system for? Isn’t what the file system that the Operating System provides sufficient for our data storage and administration needs? Consider the following oversimplified requirements – your team has been tasked with building a computer system that will collect and administer data on the employees of the company, products and services the company sells, vendors that supply the company with products and services, customers, orders and so on. You quickly identify the following high level requirements:

  • the amount of the data that needs to be stored will reach in the hundreds of gigabytes;
  • thousands of employees will need to access the information concurrently;
  • the system must answer any queries submitted by the users quickly and accurately;
  • data changes made by different users must be applied consistently;
  • access to the data (ability to read, update or add) must be selective – certain users have certain rights on certain parts of the data. 

If as a programmer / software engineer all you have at your disposal is the operating system and a programming language you soon start to realize the immensity of the challenge you have in front of you:

  • with hundreds of gigabytes of data that will be stored on let’s say an array of hard drives you need to build a system that manages the transfer of data from those hard drives to main memory and vice versa as needed. You realize that how you organize the data into files is going to make or break the system. 
  • thousands of users sending thousands of queries to your system means you need to foresee as best as possible the types of queries the system will receive and write code that will handle each of those queries;
  • What if there is a crash in the middle of the day when hundreds of users were adding and updating data? What happens to your data? When a user enters a sales order for example your application stores the order information and amongst the other things sends an order to the shipping department and updates inventory levels but the system crashed and your code did not complete executing! The inventory numbers were not updated… Your system must be able to handle such situations and ensures that your data does not turn into “garbage” no matter what happens – easy to say but very hard to do.  
  • What happens if two or more users are trying to update the same data at the same time? Your application needs to handle this and other concurrency related issues and this on itself is no small task.
  • How about the selective access to the data? Simply not giving the user a way through the interface to access certain data does not provide sufficient security. Separating the data into files based on security needs and using the operating system security mechanisms sounds good but you will soon discover that such separation is not realistic as it would make every other task your system needs to handle much more difficult. 

Can your team handle all those challenges? Of course it can but at what cost? How long will it take your team to build a stable system that can handle those challenging requirements? 

That is where a database management system (DBMS) comes into play. Companies like Oracle, IBM, Microsoft and others have made billions upon billions of dollars selling their DBMS systems (respectively Oracle, DB2, SQL Server) that at the core are software applications designed to handle all those challenges mentioned above and a lot more.  

A DBMS provides an awesome abstraction layer that saves you a tremendous amount of trouble and effort.


Post a Comment