DATA AND FILE STRUCTURES

Question 1:

a. What are integrity constraints? Discuss the entity integrity and referential integrity constraints with suitable examples.

Answer: Integrity constraints are rules used to ensure data accuracy and consistency in a database. They are used to enforce certain conditions on the data in a table and ensure that the data is valid and consistent.

  1. Entity integrity constraint: This constraint ensures that each row in a table has a unique primary key and that the primary key is not null. For example, in a table of customers, the customer ID would be the primary key and each customer would have a unique ID.
  2. Referential integrity constraint: This constraint ensures that there is a link between the data in two tables and that the data is consistent. For example, in a table of orders, the customer ID would be a foreign key that references the primary key in the customer table. This ensures that the customer ID in the orders table is a valid ID that exists in the customer table.

b.

Explain the three-level architecture of
DBMS with the help of a diagram.

Answer:

The three-level architecture of a DBMS (Database Management System) consists of the following levels:

  1. External Level: This is the level at which the end-user interacts with the database. It is also known as the user view or the conceptual level. It provides a simplified view of the data, hiding the complexity of the physical storage and organization of the data.
  2. Conceptual Level: This level is also known as the logical level. It provides a global view of the entire database and defines the overall structure of the data. It defines the data in terms of entities, relationships, and constraints. It is independent of the physical storage of data.
  3. Internal Level: This level is also known as the physical level. It deals with the physical storage of data and the access methods used to retrieve the data. It defines how the data is stored on disk, how it is indexed, and how it is accessed.

A diagram for the three-level architecture of DBMS is as follows:

+-------------------+ 
| External Level    | 
| (User View)       | 
+-------------------+
        |
        |
+-------------------+ 
| Conceptual Level  | 
| (Logical Level)   | 
+-------------------+
        |
        |
+-------------------+ 
| Internal Level    | 
| (Physical Level)  | 
+-------------------+

The three-level architecture is used to separate the different levels of abstraction and provide a clear distinction between the end user’s view of the data and the physical storage of the data.This separation makes it easier to manage and maintain the database.

c. Explain the concept 3rd normal form with
the help of an example.

3rd Normal Form (3NF) is a database design principle that stipulates that a table should not contain any redundant data and that all non-key columns should be dependent on the primary key. In other words, a table in 3NF should not contain any columns that are not directly dependent on the primary key.

Here is an example to illustrate the concept:

Consider a table called “Student” with the following columns:

  • student_id (primary key)
  • student_name
  • student_address
  • student_phone
  • student_major
  • department_name

The table is not in 3NF because the department_name column is not dependent on the primary key (student_id). It is redundant data, as department_name can be determined from the student_major column.

To bring the table to 3NF, we would create a separate table called “Department” with the following columns:

  • department_name (primary key)
  • department_address
  • department_phone

And then we would add a foreign key column, department_name, to the Student table. Now the department_name column is dependent on the primary key (department_name) of the Department table.

d. Consider the following three tables
student (student_id, name, date_of_birth)
registers ((student_id, course_id)
course (course_id, course_title, credits)
Write the SQL commands for the following
queries :
(i) List all the courses in alphabetical
order of course title.
(ii) Make a list of students who have
registered for course whose course_id
is “MCS-23”.
(iii) Count the total number of students in
each course.
(iv) List all the courses whose credits are
less than 4.
(v) List all the students who have
registered for more than one course.

10
10
7
7
6
6
8
8
4
4
6
6
Text is not SVG – cannot display

Answer:

  1. Sort all edges in non-decreasing order of their weight.
  2. Create an empty forest and add each vertex to it as an individual tree.
  3. Repeat the following steps until the forest has only one tree: a. Pick the smallest edge from the sorted edges and check if it forms a cycle with the current forest. i. If it does not form a cycle, add it to the forest. ii. If it forms a cycle, discard it.
  4. The forest will now be a minimum spanning tree of the graph, which can be output as the result of the algorithm.

Minimum Cost: 23

Question 2:

a. What is AVL tree? Explain how a node is inserted into an AVL tree and how a node is deleted from an AVL tree.

Answer:

An AVL tree is a self-balancing binary search tree. It was the first such data structure to be invented by Adelson-Velsky and Landis in 1962. In an AVL tree, the height of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Because of this balance property, AVL trees are more efficient than regular binary search trees in terms of time complexity for operations such as insertion, deletion, and search. They guarantee O(log n) time complexity for these operations on a tree with n nodes.

Inserting a new node into an AVL tree involves the following steps:

  1. Perform a regular binary search tree insertion to find the appropriate position for the new node.
  2. Starting from the newly inserted node, traverse up the tree towards the root.
  3. At each node visited, check the balance factor (the difference in height between the left and right subtrees).
  4. If the balance factor is greater than 1 or less than -1, perform a rotation at that node to restore the balance of the tree.
  5. Once the balance is restored, continue traversing up the tree until the root is reached.

Deleting a node from an AVL tree involves the following steps:

  1. Perform a regular binary search tree deletion to find and remove the node to be deleted.
  2. Starting from the parent of the deleted node, traverse up the tree towards the root.
  3. At each node visited, check the balance factor (the difference in height between the left and right subtrees).
  4. If the balance factor is greater than 1 or less than -1, perform a rotation at that node to restore the balance of the tree.
  5. Once the balance is restored, continue traversing up the tree until the root is reached.

It’s important to note that in some cases, the rotation may need to be done multiple times on different nodes to restore the balance of the tree.

Also, in case of deletion, if the node to be deleted is a leaf node or has only one child, then it’s straightforward, but if it has two children, we need to find the in-order successor or predecessor before performing the deletion.

b. Write any five differences between arrays and pointers in ‘C’ Programing Language.

Answer: Following are the difference between arrays and pointers.

  1. Arrays are static in nature, while pointers are dynamic. This means that the size of an array is fixed at the time of its declaration, while the size of a pointer can be changed at runtime.
  2. Arrays are accessed using the array subscript notation (e.g. arr[i]), while pointers are accessed using the dereference operator (*ptr).
  3. Arrays are stored in contiguous memory locations, while pointers can point to any memory location.
  4. Arrays are passed to functions by reference, while pointers are passed by value.
  5. Arrays have their own memory allocated on the stack, while pointers use memory allocated on the heap.
  6. An array’s name by itself, when used in an expression, is a pointer to the first element of the array, but a pointer is a variable that stores the memory address of another variable.
  7. You can use the size of the operator to determine the size of an array, while you need to keep track of the size of a pointer manually.
  8. In C, array elements are stored in contiguous memory locations, so it is easy to access any element of the array using indexing. On the other hand, in the case of pointers, elements may be stored in different memory locations, so it is difficult to access elements using indexing.
  9. Arrays are always multiples of the size of their element, while pointers can be any size.
  10. You can’t change the size of an array after it’s been declared, but you can change the value of a pointer to point to a different memory location.

c. Write linear search algorithm and find its time complexity.

Answer: Linear search is a method for finding an element in an array or a list. The algorithm iterates through the list or array one element at a time, comparing each element to the target element until it is found or the end of the list is reached. The algorithm can be implemented in the following way:

int linear_search(int arr[], int n, int target) 
{
    for (int i = 0; i < n; i++) 
    {
        if (arr[i] == target) 
        {
            return i;
        }
    }
    return -1;
}

The above algorithm takes an array arr, the number of elements in the array n, and the target element target as input. It iterates through the array using a for loop, starting from the first element and ending at the last element. At each iteration, it compares the current element to the target element. If they are equal, it returns the index of the current element as the result. If the end of the array is reached without finding the target element, the function returns -1, indicating that the target element is not present in the array.

The time complexity of this algorithm is O(n) in the worst case, where n is the number of elements in the array. This is because, in the worst-case scenario, the algorithm needs to iterate through the entire array to find the target element. In the best-case scenario, the algorithm will find the target element in the first iteration, resulting in a time complexity of O(1). On average, the time complexity of this algorithm is O(n/2) which is also O(n).

Question 3:

a. What is a Red-Black Tree? Explain its properties.

Answer: A Red-Black Tree (RBT) is a type of self-balancing binary search tree. It is a data structure that guarantees that the height of the tree is always O(log n) where n is the number of elements in the tree. The tree is balanced by maintaining certain properties, which are:

  1. Every node is either red or black.
  2. The root node is always black.
  3. Every leaf node (NULL) is black.
  4. If a node is red, both of its children are black.
  5. For every node, all simple paths from the node to descendant leaves contain the same number of black nodes.

These properties ensure that the tree is roughly balanced, and therefore, all the operations on a Red-Black Tree are guaranteed to have a time complexity of O(log n).

Insertion and deletion in a Red-Black Tree are similar to those in a regular binary search tree, but with additional steps to maintain the balance properties. Insertion may require the rotation of nodes and color changes. Deletion may require the merging of nodes and color changes.

RBTs are used in many applications where a self-balancing binary search tree is needed, such as in the implementation of associative arrays and sets. Also, many programming languages and libraries use Red-Black Tree as the default implementation of their ordered data structures.

b. Explain Direct-File organization.

Answer: Direct file organization is a method of storing and organizing data in a database or file system. Direct file organization stores each record in a fixed location in the file. Direct file organization is also known as the direct access method or random access method. Each record is identified by its position or location in the file, also known as the record’s address.

One of the main advantages of direct file organization is that it allows for fast retrieval of records. Since the location of each record is known, it can be accessed directly without the need to search through the entire file. This makes it suitable for applications that require quick access to specific records, such as in real-time systems or large databases.

Direct file organization is implemented using pointers or indexes that point to the location of each record in the file. The pointer or index contains the record’s unique key and the address of the record. To access a record, the system uses the key to look up the corresponding pointer or index, which then points to the location of the record in the file.

However, direct file organization also has some disadvantages. One of the main disadvantages is that it requires a large amount of storage space to store all the pointers or indexes. Also, if the records are frequently updated, the indexes or pointers will have to be updated as well, which may cause performance issues.

In summary, Direct file organization is a method of storing and organizing data in a database or file system in which each record is stored in a fixed location in the file and is identified by its position or location in the file. This method is useful when fast retrieval of records is required but it requires a large amount of storage space to store all the pointers or indexes and updating the records frequently may cause performance issues.

c. Sort the following list using bubble sort in ascending order :
25, 29, 8, 68, 92, 30, 40
Show intermediate steps of the process.

Answer: Bubble sort is a simple sorting algorithm that repeatedly goes through the list and compares adjacent elements, swapping them if they are in the wrong order. The algorithm repeats this process until the list is sorted.

Here is an example of how to sort the list using bubble sort in ascending order:

Step1: 25, 29, 8, 68, 92, 30, 40

Step2: 25, 8, 29, 68, 30, 40, 92

Step3: 8, 25, 29, 30, 40, 68, 92

Step4: 8, 25, 29, 30, 40, 68, 92

Step5: 8, 25, 29, 30, 40, 68, 92

Step6: 8, 25, 29, 30, 40, 68, 92

As we can see from the above steps, the list is already sorted by the fourth step, thus we don’t have to repeat the process again.

The final sorted list is : 8, 25, 29, 30, 40, 68, 92

As we can see, we have performed six steps to sort the list of 7 element, the time complexity of bubble sort is O(n2) where n is the number of elements in the list.

Question 4:

a. Traverse the following Binary tree in
(i) Pre-order
(ii) Post-order

A
A
F
F
B
B
C
C
E
E
D
D
G
G
H
H
Text is not SVG – cannot display

There are several ways to traverse a binary tree, including:

  1. In-order traversal: Visit the left subtree, then the root, and finally the right subtree.
  2. Pre-order traversal: Visit the root, then the left subtree, and finally the right subtree.
  3. Post-order traversal: Visit the left subtree, then the right subtree, and finally the root.

The specific method used will depend on the requirements of the task at hand.

Pre-order traversal: A,B,C,D,E,F,G,H

In-order traversal: C,D,B,E,A,G,F,H

Post-order traversal: D,C E,B, G, H,F,A

b. Write an algorithm to implement singly linked list using pointers.

Answer:

#include <stdio.h>
#include <stdlib.h>

struct node 
{
    int num;                        //Data of the node
    struct node *nextptr;           //Address of the next node
}*stnode;

void createNodeList(int n); // function to create the list
void displayList();         // function to display the list

int main()
{
    int n;
		printf("\n\n Linked List : To create and display Singly Linked List :\n");
		printf("-------------------------------------------------------------\n");
		
    printf(" Input the number of nodes : ");
    scanf("%d", &n);
    createNodeList(n);
    printf("\n Data entered in the list : \n");
    displayList();
    return 0;
}
void createNodeList(int n)
{
    struct node *fnNode, *tmp;
    int num, i;
    stnode = (struct node *)malloc(sizeof(struct node));

    if(stnode == NULL) //check whether the fnnode is NULL and if so no memory allocation
    {
        printf(" Memory can not be allocated.");
    }
    else
    {
// reads data for the node through keyboard

        printf(" Input data for node 1 : ");
        scanf("%d", &num);
        stnode->num = num;      
        stnode->nextptr = NULL; // links the address field to NULL
        tmp = stnode;
// Creating n nodes and adding to linked list
        for(i=2; i<=n; i++)
        {
            fnNode = (struct node *)malloc(sizeof(struct node));
            if(fnNode == NULL)
            {
                printf(" Memory can not be allocated.");
                break;
            }
            else
            {
                printf(" Input data for node %d : ", i);
                scanf(" %d", &num);
 
                fnNode->num = num;      // links the num field of fnNode with num
                fnNode->nextptr = NULL; // links the address field of fnNode with NULL
 
                tmp->nextptr = fnNode; // links previous node i.e. tmp to the fnNode
                tmp = tmp->nextptr; 
            }
        }
    }
}
void displayList()
{
    struct node *tmp;
    if(stnode == NULL)
    {
        printf(" List is empty.");
    }
    else
    {
        tmp = stnode;
        while(tmp != NULL)
        {
            printf(" Data = %d\n", tmp->num);       // prints the data of current node
            tmp = tmp->nextptr;                     // advances the position of current node
        }
    }
} 

Output:

 Linked List : To create and display Singly Linked List :
-------------------------------------------------------------
 Input the number of nodes : 5
 Input data for node 1 : 25
 Input data for node 2 : 24
 Input data for node 3 : 61
 Input data for node 4 : 27
 Input data for node 5 : 95

 Data entered in the list : 
 Data = 25
 Data = 24
 Data = 61
 Data = 27
 Data = 95

Question 5:

a. Explain the difference between row-major representation of an array and column major representation of an array with the help of a suitable example.

Answer: Row-major representation of an array refers to the way in which the elements of an array are stored in memory. In this representation, the elements of the array are stored in contiguous memory locations, with each row of the array being stored in a contiguous block of memory. For example, consider a 2D array with 3 rows and 4 columns:

1 2 3 4

5 6 7 8

9 10 11 12

In row-major representation, the elements of this array would be stored in memory in the following order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. This is because the elements of each row are stored in contiguous memory locations.

Column-major representation of an array, on the other hand, refers to the way in which the elements of an array are stored in memory such that each column of the array is stored in a contiguous block of memory. For example, consider the same 2D array with 3 rows and 4 columns:

1 2 3 4

5 6 7 8

9 10 11 12

In column-major representation, the elements of this array would be stored in memory in the following order: 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12. This is because the elements of each column are stored in contiguous memory locations.

The main difference between row-major and column-major representation is the order in which the elements of the array are stored in memory. Row-major representation is more common in C++, while column-major representation is more common in Fortran. The choice of representation can affect the performance of certain operations on the array, such as matrix multiplication.

b. Write short notes on the following :
(i) Adjecency matrix representation of graph
(ii) Binary search

Answer:

(i) Adjacency matrix representation of graph:

  • An adjacency matrix is a square matrix used to represent a finite graph.
  • The size of the matrix is VXV where V is the number of vertices in the graph.
  • The value that is stored in the cell at the intersection of ith row and jth column is either 1 or 0.
  • 1 represents an edge between vertex i and vertex j and 0 represents no edge.
  • Adjacency matrix representation is useful for dense graphs where the number of edges is close to V2.
  • It is easy to implement and check if an edge exists between two vertices.
  • It takes O(1) time to check if an edge exists between two vertices.
  • However, it takes O(V2) space to store the matrix and O(V2) time to traverse the graph.

(ii) Binary search:

  • Binary search is an efficient algorithm for finding an element in a sorted array.
  • It works by dividing the array into two halves and repeatedly narrowing down the search to one half by comparing the target element with the middle element of the array.
  • If the target element is smaller than the middle element, the search continues in the left half of the array.
  • If the target element is larger than the middle element, the search continues in the right half of the array.
  • The algorithm stops when the target element is found or when the search space becomes empty.
  • The time complexity of the binary search is O(log n) as the search space is halved with each iteration.
  • It requires the array to be sorted for it to work properly.
  • It is useful for large datasets where the time complexity of linear search would be too high.
Share your love