CS Foundation Statistics Notes

Collection And Presentation Of Statistical Data – CS Foundation Statistics Notes

December 29, 2022

The collection of data is a major step in statistics. Before collecting data, the analyst has to decide the objective for which the data is required, the statistical units to be used, and the degree of accuracy required. Data collection is an important aspect of any type of research study. Inaccurate data collection can distort the results of a study and ultimately lead to invalid conclusions. There are 2 types of data:

Primary data
Secondary data

1. Primary data
Data collected for the first time from the source is primary data. Primary data is collected by the investigator himself, using methods such as the following:

Interview
Observation
Action research
Case studies
Life histories
Questionnaires, etc.

Some of the major sources of primary data in India are the Central Statistical Organization, the Census of India, the National Sample Survey, and the Reserve Bank of India.

2. Secondary data
Secondary data is any information collected by someone other than its user. It is data that has already been collected and is readily available for use. It is used to gain initial insight into the research problem. It is classified in terms of its source, either internal or external. Secondary data is available from:

Already published data, e.g. publications of government departments and agencies
Previous research
Official statistics
Mass media products
Diaries
Letters
Government reports
Web information
Historical data and information
Publications of international bodies like UNO, WTO

3. Precautions before using secondary data
The investigator should consider precautions before using the secondary data. In this connection, the following precautions should be taken into account:
1. Suitable Purpose of Investigation: The investigator must ensure that the data are suitable for the purpose of the inquiry.
2. Adequacy of Data: The adequacy of the data is to be judged in the light of the requirements of the survey as well as the geographical area covered by the available data.
3. Definition of Units: The investigator must ensure that the definitions of units that are used by him are the same as in the earlier investigation.
4. Degree of Accuracy: The investigator should keep in mind the degree of accuracy maintained by each investigator.
5. Time and Condition of Collection of Facts: Before making use of available data, it should be ascertained to which period the data belong and under which conditions they were collected.
6. Comparison: Investigators should keep in mind whether the secondary data is reasonable, consistent, and comparable.
7. Test Checking: The user of the secondary data must do test checking and see that totals and rates have been correctly calculated.
8. Homogeneous Conditions: It is not safe to take published statistics at their face value without knowing their meaning, values, and limitations.

Whether data is collected from primary or secondary sources, the standard of the data depends on the funds available, the accuracy expected, etc. These days secondary data is generally used, as reliable government publications are available.

4. Primary data collection methods
(a) Personal interview
Advantages – permits detailed and in-depth questions and responses

Minimizes non-response
Answers are reliable

Disadvantages – costly

Interviewer (investigator) bias
The interviewee may not give true and correct answers

(b) Telephone Interview
Advantages – convenient

Saves time
Relatively inexpensive

Disadvantages – limited length and depth of questions and responses

(c) Self-administered Questionnaire
Advantages – cost-effective for large areas

Minimizes interviewer bias
Promotes accurate answers

Disadvantages – low response rates

Unanswered questions
Incorrect answers

(d) Information received from local agencies
Advantages – good when information is required on a continuous basis

Cost-effective

Disadvantages – not useful for comprehensive and extensive study

Agencies from which data is collected could be biased

5. Census and Sample
A census measures every unit of the population, for example everyone in the whole country. A representative sample measures a small number of people who fit a particular category, e.g. out of 5000 females, only those who are working are surveyed regarding smoking habits. This example depicts a sample: a small number of people who fit a particular category are measured instead of everyone.

6. Pros of census

provides a true measure of the population (no sampling error)
benchmark data may be obtained for future studies
detailed information about small sub-groups within the population is more likely to be available
can be used for various research purposes
reliable data

7. Cons of census

may be difficult to enumerate all units of the population within the available time
higher costs, both in staff and monetary terms, than for a sample
generally takes longer to collect, process, and release data than from a sample

8. Pros of sample

costs would generally be lower than for a census
results may be available in less time
more detailed information can often be collected because sampling is less costly and less time-consuming
if good sampling techniques are used, the results can be very representative of the actual population
greater scope of flexibility

9. Cons of sample

data may not be representative of the total population, particularly where the sample size is small.
often not suitable for producing benchmark data
as data are collected from a subset of units and inferences made about the whole population, the data are subject to ‘sampling’ error
decreased number of units will reduce the detailed information available about sub-groups within a population
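
The trade-off between a census and a sample can be illustrated with a small sketch. The population below is hypothetical (ages generated from a fixed pattern, not real survey data), and `random.sample` is used as one simple way of drawing a sample. The gap between the two means is the sampling error mentioned above.

```python
import random
import statistics

# Hypothetical population of 1,000 ages (illustrative values only).
population = [18 + (i * 7) % 50 for i in range(1000)]

# Census: every unit is measured, so the mean has no sampling error.
census_mean = statistics.mean(population)

# Sample: only a subset is measured; a fixed seed keeps the run repeatable.
rng = random.Random(42)
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

# Sampling error: difference between the sample estimate and the census value.
sampling_error = sample_mean - census_mean

print(f"census mean    = {census_mean:.2f}")
print(f"sample mean    = {sample_mean:.2f}")
print(f"sampling error = {sampling_error:+.2f}")
```

Re-running with a larger sample size will usually, though not always, shrink the sampling error.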

10. Presentation of data
Once the data has been assembled, a systematic procedure is to be followed to make the data presentable so that the purpose for which the data was collected is achieved. The procedure to be followed is:

1. Classification of data
Data is classified to give it a meaningful shape. Data classification highlights the salient features of the data.
Once the data has been collected, it is to be classified into various groups on the basis of common factors present in them. For example, data can be classified on the basis of literacy (literate or illiterate), on the basis of working status (working or non-working), or on the basis of income (wages below Rs. 5000 and wages between Rs. 5000 and Rs. 7000). Thus we can classify data on various grounds. There are 4 bases of classification of data:

(a) Chronological – In this type of classification the data is classified on the basis of time. Thus we can say that in the year 2001 GDP was ‘A’ and in the year 2002 GDP was ‘B’. Related data is arranged in chronological order: data for 2001, then data for 2002, and so on.

(b) Quantitative – Quantitative data is data that can be measured numerically. Things that can be measured precisely rather than through interpretation such as the number of attendees at an event, the temperature in a given location, or a person’s height in inches can be considered quantitative data.

(c) Qualitative – Qualitative data are forms of information gathered in a non-numeric form. In qualitative classification, data are classified on the basis of some attribute or quality such as sex, literacy, religion, etc. In this type of classification, the attribute cannot be measured; rather, the data is classified on the basis of whether the attribute is present or absent.

(d) Geographical classification – In this type of classification, data is classified on the basis of geographical divisions, using the data available for various regions.
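
The bases of classification above can be sketched in a few lines. The records, field names, and the `classify` helper below are all hypothetical and chosen only for illustration; the wage classes follow the Rs. 5000 example given earlier.

```python
from collections import defaultdict

# Hypothetical survey records (illustrative values only).
records = [
    {"name": "A", "literate": True, "region": "North", "wage": 4500},
    {"name": "B", "literate": False, "region": "South", "wage": 6200},
    {"name": "C", "literate": True, "region": "North", "wage": 5000},
    {"name": "D", "literate": True, "region": "South", "wage": 3800},
]

def classify(rows, key_func):
    """Group the names in rows under the class label returned by key_func."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_func(row)].append(row["name"])
    return dict(groups)

# Qualitative: the attribute is either present or absent.
by_literacy = classify(records, lambda r: "literate" if r["literate"] else "illiterate")

# Geographical: classified by region.
by_region = classify(records, lambda r: r["region"])

# Quantitative: classified by a measurable value (wage class).
by_wage = classify(
    records,
    lambda r: "below Rs. 5000" if r["wage"] < 5000 else "Rs. 5000 to Rs. 7000",
)

print(by_literacy)
print(by_region)
print(by_wage)
```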

2. Tabulation of data
It is the process of condensation of the data for convenience in statistical processing, presentation, and interpretation of the information. As per Secrist, ‘Tables are a means of recording in permanent form the analysis that is made through classification and by placing in juxtaposition things that are similar and should be compared’.
Significance of tabulation of data
Tabulation of data is helpful in many ways. Some of the benefits drawn from the tabulation of data are

Data gets categorized in such a manner that it can be considered homogeneous for further analysis.
Categorized data is useful when making comparisons.
It discloses trends and patterns in the data.
Tabulated data can be considered sorted data, which helps in easy identification.
Helpful in statistical analysis

Essential Parts of a Table

Different parts into which a table should be divided would depend on the nature of the data and the purpose for which they have been collected. However, in general, a statistical table is divided into 7 parts which are given below:

Table Number
Title of the table
Captions
Stubs
Body
Headnote
Footnote

Classification of tables

A table can be classified as a data table whenever you need to specify a row or column with header information about that row/column. Broadly it can be classified in the following ways.

Simple and Complex Tables
Simple Table
A simple table here means that there is a maximum of one header row and one header column where a header column specifies the type of information in the column. In addition, there are no merged cells within a simple table. Simple tabulation is when the data are tabulated to one characteristic. For example, the class survey conducted on November 11, 2011, determined the frequency or number of students owning different brands of mobile phones like Blackberry, Nokia, iPhone, etc.

Complex Table
Complex tabulation is the tabulation of data according to two or more characteristics. For example, the frequency or number of girls, boys, and the total class owning the different brands of mobile phones like Blackberry, Nokia, iPhone, etc.

Cross Table
Cross tabulations are also a sub-type of complex tabulation that includes cross-classifying factors to build a contingency table of counts or frequencies at each combination of factor levels.

Contingency Table
A contingency table is a display format used to analyze and record the possible relationship between two or more categorical variables. For example, the class survey conducted on November 11, 2011, determined the frequency or number of students owning different brands of mobile phones across boys and girls of ages 17, 18, and 19 (please refer to slides no. 9 and 10 in the session 2 presentations). The purpose of this cross-tabulation could be to test an assumption that boys and girls own certain mobile brands because of the particular age group they represent.
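
A cross-tabulation of this kind can be sketched as follows. The responses are hypothetical (they are not the class survey data, which is not reproduced in these notes), and counting pairs with `collections.Counter` is just one simple way of building the table.

```python
from collections import Counter

# Hypothetical survey responses: (gender, brand) for each student.
responses = [
    ("girl", "Nokia"), ("boy", "iPhone"), ("girl", "iPhone"),
    ("boy", "Nokia"), ("boy", "Nokia"), ("girl", "Blackberry"),
]

# Count the students at each combination of factor levels.
counts = Counter(responses)

genders = sorted({g for g, _ in responses})
brands = sorted({b for _, b in responses})

# Print the contingency table with a row total for each brand.
print("brand".ljust(12) + "".join(g.rjust(6) for g in genders) + "total".rjust(7))
for b in brands:
    row = [counts[(g, b)] for g in genders]
    print(b.ljust(12) + "".join(str(n).rjust(6) for n in row) + str(sum(row)).rjust(7))
```

A combination that never occurs, such as boys owning a Blackberry in this made-up data, is reported as 0 because `Counter` returns 0 for missing keys.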

General-purpose table
General-purpose tables are also called reference tables or repository tables, and they provide information for general use and reference. Croxton and Cowden have identified the purpose of such tables in the following words: “Primarily and usually the sole purpose of a reference table is to present data in such a manner that individual items may be found readily by a reader”.

Important rules of tabulation
There are no hard and fast rules for preparing a statistical table. Prof. Bowley has rightly pointed out, “In collection and tabulation, common sense is the chief requisite and experience is the chief teacher.” However, the following points should be borne in mind while preparing a table.

A good table must contain all the essential parts, such as Table number, Title, Headnote, Caption, Stub, Body, Footnote, and source note.
A good table should be simple to understand. It should also be compact, complete, and self-explanatory.
A good table should be of the proper size. There should be proper space for rows and columns. One table should not be overloaded with details. Sometimes it is difficult to present entire data in a single table. In that case, data are to be divided into more tables.
A good table must be prepared in a clear manner for its purpose so that a scholar can understand the problem without any strain.
The rows and columns of a table must be numbered.
In all tables, the captions and stubs should be arranged in some systematic manner. The manner of presentation may be alphabetical or chronological, depending upon the requirement.
The unit of measurement should be mentioned in the headnote.
The figures should be rounded off to the nearest hundred, or thousand or lakh. It helps in avoiding unnecessary details.
In case of non-availability of information, one should write N.A. or indicate it by dash (-).

3. Frequency distribution of data
The frequency (f) of a particular observation is the number of times the observation occurs in the data. The distribution of a variable is the pattern of frequencies of the observation. A frequency distribution is a tool for organizing data. We use it to group data into categories and show the number of observations in each category. Frequency distributions are portrayed as frequency tables, histograms, or polygons.
Guidelines for constructing a frequency distribution.

Each value should fit into a category. The classes should be collectively exhaustive.
No value should fit into more than 1 category. The classes should be mutually exclusive; there should be no overlapping of classes.
Make the classes of equal size if possible. This makes it easier to compare the frequency in one class to another.
Avoid open-ended classes if possible such as “75 and over”.
Try to use between 5 and 20 classes if possible. If you have fewer than 5 classes, you’re not really breaking up the data, and if you use more than 20 classes, this will probably be information overload.
It is usually convenient to use class sizes of 5 or 10, in other words, to have each class containing 5 or 10 possible values.
It is usually convenient to make the lower limit of the first category a multiple of the class size.

After the first two rules above, the rest are merely suggestions.
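
The first two rules can be checked mechanically. The sketch below is only an illustration: the `check_classes` helper, the class limits, and the data values are all hypothetical, and the check is made against the given data values rather than against every possible value.

```python
def check_classes(classes, data):
    """Return (exhaustive, exclusive) for inclusive classes given as (low, high)."""
    # Exhaustive: every value falls into at least one class.
    exhaustive = all(any(lo <= x <= hi for lo, hi in classes) for x in data)
    # Exclusive: no value falls into more than one class.
    exclusive = all(sum(lo <= x <= hi for lo, hi in classes) <= 1 for x in data)
    return exhaustive, exclusive

# Hypothetical data and class limits (illustrative values only).
data = [36, 45, 54, 92]
good = [(35, 44), (45, 54), (55, 64), (65, 74), (75, 84), (85, 94)]
overlapping = [(35, 45), (45, 54), (55, 64), (65, 74), (75, 84), (85, 94)]

print(check_classes(good, data))         # both rules hold
print(check_classes(overlapping, data))  # 45 fits two classes
print(check_classes(good[:2], data))     # 92 fits no class
```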
Example 1 — Constructing a frequency distribution table
A survey was taken in an area. In each of 20 homes, people were asked how many cars were registered to their households. The results were recorded as follows:
1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0
Frequency distribution table.
Table 1. Frequency table for the number of cars registered in each household

Number of cars (x)	Tally	Frequency (f)
0	IIII	4
1	IIIIII	6
2	IIIII	5
3	III	3
4	II	2

Let us understand the construction of the frequency distribution table
Use the following steps to present this data in a frequency distribution table.

Divide the results (x) into intervals and then count the number of results in each interval.
In this case, the intervals would be the number of households with no car (0), one car (1), two cars (2), and so forth.
Make a table with separate columns for the interval numbers (the number of cars per household), the tallied results, and the frequency of results in each interval.
Label these columns Number of cars, Tally, and Frequency.
Read the list of data from left to right and place a tally mark in the appropriate row.
For example, the first result is a 1, so place a tally mark in the row beside where 1 appears in the interval column (Number of cars). The next result is a 2, so place a tally mark in the row beside the 2, and so on. When you reach your fifth tally mark, draw a tally line through the preceding four marks to make your final frequency calculations easier to read.
Add up the number of tally marks in each row and record them in the final column entitled Frequency.

By looking at this frequency distribution table quickly, we can see that out of 20 households surveyed, 4 households had no cars, 6 households had 1 car, etc.
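
The tallying steps above can be sketched with `collections.Counter` from the Python standard library. The data are the 20 survey results from Example 1; the variable names are chosen here for illustration.

```python
from collections import Counter

# Survey results from Example 1: cars registered in each of 20 households.
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

# Counter tallies how many times each value occurs.
frequency = Counter(cars)

print("Number of cars (x)\tTally\tFrequency (f)")
for x in sorted(frequency):
    f = frequency[x]
    print(f"{x}\t{'I' * f}\t{f}")
```

The frequencies add up to 20, the number of households surveyed, which is a quick check that no result was missed.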

Example 2 — Constructing a cumulative frequency distribution table
A cumulative frequency distribution table is a more detailed table. It looks almost the same as a frequency distribution table but it has added columns that give the cumulative frequency and the cumulative percentage of the results, as well.
At a recent chess tournament, all 10 of the participants had to fill out a form that gave their names, address, and age. The ages of the participants were recorded as follows:
36, 48, 54, 92, 57, 63, 66, 76, 66, 80
Cumulative frequency table

Table 2. Ages of participants at a chess tournament
Lower Value	Upper Value	Frequency (f)	Cumulative frequency	Percentage	Cumulative percentage
35	44	1	1	10.0	10.0
45	54	2	3	20.0	30.0
55	64	2	5	20.0	50.0
65	74	2	7	20.0	70.0
75	84	2	9	20.0	90.0
85	94	1	10	10.0	100.0

Let us understand the construction of this table
Use the following steps to present these data in a cumulative frequency distribution table.

• Divide the results into intervals, and then count the number of results in each interval. In this case, intervals of 10 are appropriate. Since 36 is the lowest age and 92 is the highest age, start the intervals at 35 to 44 and end the intervals with 85 to 94.

• Create a table similar to the frequency distribution table but with three extra columns.

• In the first column or the Lower value column, list the lower value of the result intervals. For example, in the first row, you would put the number 35.

• The next column is the Upper-value column. Place the upper value of the result intervals. For example, you would put the number 44 in the first row.

• The third column is the Frequency column. Record the number of times a result appears between the lower and upper values. In the first row, place the number 1.

• The fourth column is the Cumulative frequency column. Here we add the cumulative frequency of the previous row to the frequency of the current row. Since this is the first row, the cumulative frequency is the same as the frequency. However, in the second row, the frequency for the 35-44 interval (i.e., 1) is added to the frequency for the 45-54 interval (i.e., 2). Thus, the cumulative frequency is 1 + 2 = 3, meaning we have 3 participants in the 35 to 54 age group.

• The next column is the Percentage column. In this column, list the percentage of the frequency. To do this, divide the frequency by the total number of results and multiply by 100. In this case, the frequency of the first row is 1 and the total number of results is 10. The percentage would then be (1 ÷ 10) × 100 = 10.0.

• The final column is Cumulative percentage. In this column, divide the cumulative frequency by the total number of results and then, to make a percentage, multiply by 100. Note that the last number in this column should always equal 100.0. In this example, the cumulative frequency of the first row is 1 and the total number of results is 10, therefore the cumulative percentage of the first row is (1 ÷ 10) × 100 = 10.0.
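
The construction steps above can be sketched as follows. The data and the class limits (35 to 44 up to 85 to 94) come from Example 2; the variable names are chosen here for illustration.

```python
# Ages of the 10 chess tournament participants from Example 2.
ages = [36, 48, 54, 92, 57, 63, 66, 76, 66, 80]

width = 10        # each class holds 10 possible ages
lower_start = 35  # lower value of the first class
n = len(ages)

rows = []
cumulative = 0
for lower in range(lower_start, max(ages) + 1, width):
    upper = lower + width - 1
    # Frequency: number of ages between the lower and upper values (inclusive).
    f = sum(lower <= a <= upper for a in ages)
    # Cumulative frequency: previous cumulative frequency plus this frequency.
    cumulative += f
    rows.append((lower, upper, f, cumulative, 100 * f / n, 100 * cumulative / n))

print("Lower\tUpper\tf\tCum. f\t%\tCum. %")
for lower, upper, f, cum, pct, cum_pct in rows:
    print(f"{lower}\t{upper}\t{f}\t{cum}\t{pct:.1f}\t{cum_pct:.1f}")
```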

• Class Limit – Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.

• Class intervals – While arranging a large amount of data (in statistics), they are grouped into different classes to get an idea of the distribution, and the range of such class of data is called the Class Interval.

• Class Midpoint – it is the midpoint of the class interval, i.e. (lower limit + upper limit) / 2 is the class midpoint.

• Frequency of a class interval – is the number of observations that occur in a particular predefined interval. So, for example, if 20 people aged 5 to 9 appear in our study’s data, the frequency for the 5-9 interval is 20.

Classification is of two types according to the class intervals –

Exclusive Method
Inclusive Method.

(i) Exclusive Method: In this method, the upper limit of a class becomes the lower limit of the next class. It is called ‘Exclusive’ as we do not put any item that is equal to the upper limit of a class in the same class; we put it in the next class, i.e. the upper limits of classes are excluded from them. For example, a person of age 20 years will not be included in the class interval (10 – 20) but taken in the next class (20 – 30), since the class interval (10 – 20) includes only ages from 10 up to, but not including, 20.

(ii) Inclusive Method: In this method, the upper limit of any class interval is kept in the same class interval. In this method, the upper limit of a previous class is less by 1 from the lower limit of the next class interval. In short, this method allows a class interval to include both its lower and upper limits within it.
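
The difference between the two methods can be sketched with two small functions. Both helpers are hypothetical and assume whole-number values, classes of equal width, and a first class starting at `start`.

```python
def exclusive_class(value, start, width):
    """Exclusive method: the class runs from lower up to, but not including, upper."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

def inclusive_class(value, start, width):
    """Inclusive method: both the lower and the upper limit belong to the class."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width - 1)

# A person aged 20 goes into (20 - 30), not (10 - 20), under the exclusive method.
print(exclusive_class(20, start=10, width=10))
print(exclusive_class(19, start=10, width=10))

# Under the inclusive method the classes are 10 - 19, 20 - 29, and so on.
print(inclusive_class(19, start=10, width=10))
print(inclusive_class(20, start=10, width=10))
```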

• The endpoints of a class interval – are the lowest and highest values that a variable can take in that interval. For example, suppose the intervals are 0 to 4 years, 5 to 9 years, 10 to 14 years, 15 to 19 years, 20 to 24 years, and 25 years and over. The endpoints of the first interval are 0 and 4 if the variable is discrete, and 0 and 4.999 if the variable is continuous. The endpoints of the other class intervals would be determined in the same way.

• Class interval width – is the difference between the lower endpoint of an interval and the lower endpoint of the next interval. For example, if continuous intervals are 0 to 4, 5 to 9, etc., the width of the first five intervals is 5, and the last interval is open since no higher endpoint is assigned to it. The intervals could also be written as 0 to less than 5, 5 to less than 10, 10 to less than 15, 15 to less than 20, 20 to less than 25, and 25 and over.

• Class boundaries – Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit and the upper-class boundary is found by adding 0.5 units to the upper-class limit.
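
The midpoint and boundary rules above can be sketched as a small function. The `class_summary` helper and its `unit` parameter (the precision of the raw data, 1 for whole numbers) are assumptions made for illustration; the class limits are taken from Table 2.

```python
def class_summary(lower_limit, upper_limit, unit=1):
    """Return midpoint, lower boundary, upper boundary and width of an inclusive class."""
    midpoint = (lower_limit + upper_limit) / 2
    # Boundaries lie half a unit outside the class limits.
    lower_boundary = lower_limit - unit / 2
    upper_boundary = upper_limit + unit / 2
    width = upper_boundary - lower_boundary
    return midpoint, lower_boundary, upper_boundary, width

first = class_summary(35, 44)
second = class_summary(45, 54)
print(first)
print(second)
```

The upper boundary of the first class equals the lower boundary of the second, so there is no gap between adjacent classes.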

4. Diagrammatic presentations
Although tabulation is a very good technique to present the data, diagrams are an advanced technique to represent data. A layman cannot easily understand tabulated data, but with a single glance at a diagram one gets a complete picture of the data presented. According to M.J. Moroney, “diagrams register a meaningful impression almost before we think.”

The following are a few advantages of the diagrammatic presentation of data.

Simple and Easy to understand: The data presented in the form of diagrams is the simplest and the easiest to understand. The entire data can be easily understood even by having a single glance at the diagram.
Attractive and Impressive: Diagrammatic presentation makes the data more attractive and interesting. Diagrams tend to leave a lasting impact on the mind.
Helpful in Making Comparisons: The presentation of data in the form of diagrams helps in making comparisons between two or more groups or two or more periods.

Limitations of Diagrammatic Presentation

Diagrams do not present the small differences properly.
These can easily be misused.
Only artists can draw multi-dimensional diagrams.
In statistical analysis, diagrams are of no use.
Diagrams are just a supplement to tabulation.
Only a limited set of data can be presented in the form of a diagram.
Diagrammatic presentation of data is a more time-consuming process.
Diagrams present preliminary conclusions.
Diagrammatic presentation of data shows only an estimate of the actual behavior of the variables.

Guidelines for Diagrammatic presentation

The diagram should be properly drawn at the outset.
The pith and substance of the subject matter must be made clear under a broad heading that properly conveys the purpose of a diagram.
The size of the scale should neither be too big nor too small.
If it is too big, it may look ugly. If it is too small, it may not convey the meaning.
In each diagram, the size of the paper must be taken note of. It will help to determine the size of the diagram.
For clarifying certain ambiguities some notes should be added at the foot of the diagram. This shall provide the visual insight of the diagram.
Diagrams should be absolutely neat and clean. There should be no vagueness or overwriting on the diagram.
Simplicity means the diagram should appeal at first sight, conveying its meaning clearly and easily.
The scale must be presented along with the diagram.

Types of Diagrams

A. Line Diagrams
In these diagrams, only a line is drawn to represent one variable. These lines may be vertical or horizontal. The line graphs are usually drawn to represent the time series data related to the temperature, rainfall, population growth, birth rates, and death rates.

B. Simple Bar Diagram
It is also called a columnar diagram. Like line diagrams, these figures are also used where only a single dimension, i.e. length, can present the data. The procedure is almost the same; the only difference is that the lines are drawn with thickness, as bars.

C. Multiple Bar Diagrams
This diagram is used when we have to make a comparison between two or more variables. The number of variables may be 2, 3, 4, or more. In the case of 2 variables, pairs of bars are drawn. Similarly, in the case of 3 variables, we draw triple bars.

D. Sub-divided Bar Diagram
When different components are grouped in one set of variables or different variables of one component are put together, their representation is made by a sub-divided bar diagram. In this method, different variables are shown in a single bar with different rectangles.

E. Pie chart
A pie diagram is a circle of radius neither too large nor too small, whose area is divided into as many different sectors as there are components of the whole data. This is done by drawing straight lines from the center to the circumference of the circle. The area of the circular lamina represents the whole data and it is equivalent to 360 degrees at the center. The area of each sector is proportional to the value of the corresponding component of the data. The area of a sector is proportional to the angle at the centre. A pie diagram is very useful in drawing comparisons among the various components or between a part and the whole.

Construction

Select a suitable radius for the circle to be drawn. A radius of 3, 4, or 5 cm may be chosen for the given data set.
Draw a line from the centre of the circle to the arc as a radius.
Measure the angles from the arc of the circle for each category in ascending order clockwise, starting with the smallest angle.
Complete the diagram by adding the title, sub-title, and legend. A legend mark should be chosen for each variable/category and highlighted by distinct shades/colors.
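
Since the whole data corresponds to 360 degrees at the centre, the angle of each sector is (value ÷ total) × 360. A minimal sketch, using hypothetical category values and a helper name chosen here for illustration:

```python
def sector_angles(values):
    """Return the central angle in degrees for each component of the whole."""
    total = sum(values.values())
    return {name: 360 * v / total for name, v in values.items()}

# Hypothetical components of a whole (illustrative values only).
expenses = {"Food": 40, "Rent": 30, "Transport": 20, "Other": 10}
angles = sector_angles(expenses)

# List the sectors in ascending order of angle, as in the construction steps.
ordered = sorted(angles.items(), key=lambda item: item[1])
for name, angle in ordered:
    print(f"{name}: {angle:.1f} degrees")
```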

5. Graphic Presentation
A graph is a visual representation of data by a continuous curve on a squared (graph) paper. Like diagrams, graphs are also attractive, and eye-catching, giving a bird’s eye view of data and revealing their inner pattern. Graphic presentation enjoys numerous forms of expression ranging from the written word to the most abstract of drawings or statistical graphs.

Graphs are drawn on two perpendicular lines that intersect each other at a point called the origin. The horizontal line is called the X-axis and the vertical line is called the Y-axis. The four parts of the plane are called quadrants. It may be noted that X and Y are positive in the first quadrant, X is negative and Y is positive in the second quadrant, X and Y both are negative in the third quadrant, and X is positive and Y is negative in the fourth quadrant. Graphs are commonly used in the presentation of time series and frequency distributions.

Rules to be followed while making graphs are

It should have a suitable title.
A suitable unit of measurement should be used. A suitable scale is used to present data. The scale should be chosen so that the graph is reasonably small and visible at a glance.
Various sources of data are to be mentioned at the bottom.

11. Advantages of graphical presentation

It is an efficient method of showing large numbers of observations in a simple manner
A visual impression is more permanent than sets of figures or words.
Complex relationships can be demonstrated easily and quickly so that the whole situation is presented simultaneously.
By the use of color and other devices, one can emphasize certain places. For example, an alarming increase in pollution rate might be pictured in red to bring out the aspect of danger involved.
Technical qualification is not required to understand the details presented in graphs as they can be easily understood.
Time-saving for the analyst as data is more understandable
Helps to locate mean, median, mode
Helps in forecasting, extrapolation, and interpolation of data.

12. Disadvantages of graphical presentations

A graph can be used only to show large or crude variations in the data.
Lack of flexibility in the event a new combination of the data seems appropriate. This follows as a result of the first disadvantage and is one of the reasons why it may be advisable to present the original data in a table or text accompanying the graph.
Shows only a few characteristics of data
Distortion of the situation may result from the desire to oversimplify the material.
Cannot be used in support of some statement
The construction of a graph or chart may be difficult or costly. This should only apply, however, to large-scale drawings which are employed as posters to educate the lay public or similar groups.

13. Types of Graphs
Basically, they can be divided into two categories

Graphs of time series
Graphs of frequency distribution

1. Graphs of time series
Time-series graphs are also known as historigrams. In this case, the variable value is dependent on time, such as hours, minutes, or seconds. Such graphs are mostly used by economists, businessmen, and statisticians. Time is represented on the X-axis and the variable value on the Y-axis. The starting value of Y is zero. Various types of graphs are:

• Line graph – This graph makes possible the presentation of data with a high degree of accuracy. In fact, careful work and use of the proper coordinate paper make possible the exact reproduction of numerical data, a quality not given to all forms of graphic presentation. Since different types of lines may be used in tracing the data, two or more illustrations may be presented on the same graph. Time is represented on the X-axis and the variable value on the Y-axis. Sometimes the values of the data are large but the variation between them is small. In such conditions, the plotting of the graph does not begin from zero; instead, we draw a zigzag horizontal line above the zero (a false base line) for convenience.

• Net balance graph – We use a net balance graph when we have to show data related to income and expenditure or import and exports etc. Here we draw two lines individually like one line showing data related to imports and the other showing data related to export on the same graph. The difference between the two lines (shaded portion) shows the scenario.
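
The quantity shown by the shaded portion of a net balance graph is simply the difference between the two series at each point in time. A minimal sketch with hypothetical export and import figures (illustrative values, not real trade data):

```python
# Hypothetical yearly figures in the same unit (illustrative values only).
years = [2018, 2019, 2020, 2021]
exports = [120, 135, 110, 150]
imports = [100, 140, 125, 130]

# Net balance for each year: exports minus imports.
net_balance = [e - m for e, m in zip(exports, imports)]

for year, balance in zip(years, net_balance):
    label = "surplus" if balance >= 0 else "deficit"
    print(f"{year}: {balance:+d} ({label})")
```

On the graph, years with a positive difference appear as the export line lying above the import line, and years with a negative difference as the reverse.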

Constructing a Time Series Graph
To construct a time-series graph, we must look at both pieces of our paired data set. We start with a standard Cartesian coordinate system. The horizontal axis is used to plot the date or time increments, and the vertical axis is used to plot the values of the variable that we are measuring. By doing this, each point on the graph corresponds to a date and a measured value. The points on the graph are typically connected by lines in the order in which they occur.

2. Graphs of frequency distribution

• Histograms and bar charts are both visual displays of frequencies using columns plotted on a graph. The Y-axis (vertical axis) generally represents the frequency count, while the X-axis (horizontal axis) generally represents the variable being measured. A histogram is a type of graph in which each column represents a numeric variable, in particular one that is continuous and/or grouped. A histogram shows the distribution of all observations in a quantitative dataset. It is useful for describing the shape, centre, and spread to better understand the distribution of the dataset. It is defined as a pictorial representation of a grouped frequency distribution by means of adjacent rectangles, whose areas are proportional to the frequencies. Generally, it is used for data sets of more than 100 observations.

Features of a histogram:

The height of the column shows the frequency for a specific range of values.
Columns are usually of equal width; however, a histogram may show data using unequal ranges (intervals) and therefore have columns of unequal width.
The values represented by each column must be mutually exclusive and exhaustive. Therefore, there are no spaces between columns and each observation can only ever belong in one column.
It is important that there is no ambiguity in the labeling of the intervals on the x-axis for continuous or grouped data.
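The requirement that classes be mutually exclusive and exhaustive can be sketched in a few lines of Python. The function name `grouped_frequencies`, the half-open class convention (lower boundary included, upper boundary excluded) and the sample marks are assumptions made for illustration only.

```python
def grouped_frequencies(values, lower, width, n_classes):
    """Count values into adjacent classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < n_classes:
            # Exhaustive classes: every observation must fit somewhere.
            raise ValueError(f"{v} falls outside the class intervals")
        counts[index] += 1  # Mutually exclusive: exactly one class per value.
    return counts


# Hypothetical marks of 12 students, grouped into classes 0-10, 10-20, ..., 40-50.
marks = [12, 25, 37, 8, 41, 19, 33, 27, 45, 5, 22, 38]
width = 10
counts = grouped_frequencies(marks, lower=0, width=width, n_classes=5)

# Frequency density = class frequency / width of class.
densities = [c / width for c in counts]

for i, c in enumerate(counts):
    print(f"{i * width}-{(i + 1) * width}: {'#' * c}")
```

Because every mark lands in exactly one class, the class frequencies add up to the number of observations.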

• Frequency polygon – Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the same purpose as histograms but are especially helpful for comparing sets of data. To create a frequency polygon, start just as for histograms, by choosing a class interval. Then draw an X-axis representing the values of the scores in your data. Mark the middle of each class interval with a tick mark, and label it with the middle value represented by the class. Draw the Y-axis to indicate the frequency of each class. Place a point in the middle of each class interval at the height corresponding to its frequency. Finally, connect the points. You should include one class interval below the lowest value in your data and one above the highest value. The graph will then touch the X-axis on both sides. When several distributions are to be compared on the same graph paper, frequency polygons are better than Histograms.
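The construction steps for a frequency polygon can be sketched as follows. This is a minimal illustration, assuming equal class widths; the function name `frequency_polygon_points` and the sample frequencies are hypothetical.

```python
def frequency_polygon_points(lower, width, frequencies):
    """Return (mid-value, frequency) points, closed on the X-axis at both ends."""
    # Extra empty class below the lowest class, so the polygon touches the X-axis.
    points = [(lower - width / 2, 0)]
    for i, f in enumerate(frequencies):
        midpoint = lower + (i + 0.5) * width
        points.append((midpoint, f))
    # Extra empty class above the highest class.
    points.append((lower + (len(frequencies) + 0.5) * width, 0))
    return points


# Classes 0-10, 10-20, ..., 40-50 with assumed frequencies.
points = frequency_polygon_points(lower=0, width=10, frequencies=[2, 2, 3, 3, 2])
for x, y in points:
    print(x, y)
```

Joining these points in order with straight lines gives the polygon; the first and last points lie on the X-axis.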

• Frequency Curve is obtained by joining the points of frequency polygon by a freehand smoothed curve. It is used to remove the ruggedness of polygon and to present it in a good form or shape. This curve is used for a frequency distribution of a continuous distribution when the number of data points becomes very large. In order to plot the points on either the frequency polygon or curve, the mid values of the class intervals of the distribution are calculated. Then the frequencies with respect to the midpoints are plotted. However, in a frequency curve, the points are joined by a smooth curve, whereas in a frequency polygon the points are joined by straight lines. Apart from this major difference, a frequency polygon is a closed figure whereas the frequency curve is not.

• Cumulative frequency curve – Cumulative histograms, also known as ogives, are graphs that can be used to determine how many data values lie above or below a particular value in a data set. The cumulative frequency is calculated from a frequency table, by adding each frequency to the total of the frequencies of all data values before it in the data set. The last value for the cumulative frequency will always be equal to the total number of data values since all frequencies will already have been added to the previous total. The cumulative frequency is plotted on the y-axis against the data which is on the x-axis for un-grouped data. When dealing with grouped data, the Ogive is formed by plotting the cumulative frequency against the upper boundary of the class.

An ogive is used to study the growth rate of data, as it shows the accumulation of frequency and hence its growth rate. An ogive is drawn by plotting the beginning of the first interval at a Y-value of zero, plotting the end of every interval at the Y-value equal to the cumulative count for that interval, and connecting the points on the plot with straight lines. In this way, the end of the final interval will always be at the total number of data values, since we will have added up across all intervals.
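The plotting rule for a "less than" ogive on grouped data can be sketched as below. The function name `ogive_points` and the sample frequencies are assumptions for illustration; equal class widths are assumed.

```python
from itertools import accumulate


def ogive_points(lower, width, frequencies):
    """Return (upper class boundary, cumulative frequency) points for an ogive."""
    cumulative = list(accumulate(frequencies))
    # The beginning of the first interval is plotted at a Y-value of zero.
    points = [(lower, 0)]
    for i, cf in enumerate(cumulative):
        upper_boundary = lower + (i + 1) * width
        points.append((upper_boundary, cf))
    return points


# Classes 0-10, 10-20, ..., 40-50 with assumed frequencies.
points = ogive_points(lower=0, width=10, frequencies=[2, 2, 3, 3, 2])
print(points)
```

The last point always sits at the total number of observations, here 12.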

14. The Survey Technique
In this section, we use several words that are commonly found in surveying. Let us describe and define their meanings before we start.
• A survey is a technique in which a sample of prospective respondents is selected from a population. The sample is then studied with a view to drawing inferences from their responses to the statements in a questionnaire, or the questions in a series of interviews.

• Population is the term we use to describe the main group of people from which a sample is drawn. A population, therefore, may be an organization’s workforce, a management group, or a group of customers.

• A sample is a representative cross-section of people drawn from a population so that their responses may be studied. The sizes of the samples and the structures of the surveys are determined by the kind of data that needs to be collected and from whom.

Collection And Presentation Of Statistical Data MCQ Questions

Question 1.
Some of the sources of published data are
a. ILO
b. IBRD
c. WFCO
d. All of the above
Answer:
d. All of the above

Question 2.
When data is classified on the basis of attribute it is termed as
a. Geographical
b. Qualitative
c. Chronological
d. Both a & c
Answer:
b. Qualitative

Question 3.
In order to save time and money, psychologists collect their data by:
a. Door-to-door survey
b. The use of censuses.
c. Using the earlier research papers.
d. The use of samples
Answer:
d. The use of samples

Question 4.
A precise point that separates one class from another is
a. Class boundary
b. Class limit
c. Class interval.
d. Nothing to do with statistics.
Answer:
a. Class boundary

Question 5.
Continuous series of statistical data can be
a. Divided.
b. Can be calculated in fractions
c. Both a & b
d. Cannot be divided
Answer:
c. Both a & b

Question 6.
Collected data should be
a. Quantitative
b. Subjective
c. Objective
d. None of the above
Answer:
a. Quantitative

Question 7.
The total expenditure made by industry under different heads is presented by
a. Histogram
b. Line graph.
c. Bar graph.
d. None of the above.
Answer:
c. Bar graph.

Question 8.
Whenever we group data into classes it is recommended that we have
a. Less than 5 classes
b. Between 5 and 20 classes
c. At least 2 classes
d. Between 2 and 3 classes
Answer:
b. Between 5 and 20 classes

Question 9.
When a line connects points that are the cumulative percent of observation below the upper limit of each interval in a cumulative frequency distribution is known as
a. Mode
b. Histogram
c. Frequency polygon
d. Ogive
Answer:
d. Ogive

Question 10.
Measures of central tendency are
a. Inferential statistics that identify the best single value for representing a set of data
b. Descriptive statistics that identify the best single value for representing a set of data
c. Inferential statistics that identify the spread of the scores in a data set
d. Descriptive statistics that identify the spread of the scores in a data set.
Answer:
b. Descriptive statistics that identify the best single value for representing a set of data

Question 11.
The mean is
a. The statistical or arithmetic average.
b. The middlemost score.
c. The most frequently occurring score
d. The best representation for every set of data.
Answer:
a. The statistical or arithmetic average.

Question 12.
Given the following data set, what is the value of the median? (2, 4, 3, 6, 1, 8, 9, 2, 5, 7)
a. 2
b. 4.7
c. 4.5
d. 10
Answer:
c. 4.5
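As a quick check of this answer, the sketch below assumes the data set is read as the ten single-digit values 2, 4, 3, 6, 1, 8, 9, 2, 5, 7 and uses Python's standard `statistics` module.

```python
from statistics import mean, median

data = [2, 4, 3, 6, 1, 8, 9, 2, 5, 7]

# With an even number of values, the median is the average of the
# two middle values of the sorted data (here 4 and 5).
ordered = sorted(data)
middle_pair = (ordered[len(ordered) // 2 - 1], ordered[len(ordered) // 2])
median_value = median(data)
mean_value = mean(data)

print(ordered, middle_pair, median_value, mean_value)
```

The mean of the same values is 4.7, which is why that figure appears among the distractors.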

Question 13.
Which of the following is not a characteristic of the mean?
a. It is affected by extreme scores.
b. It minimizes the sum of squared deviations.
c. The sum of the deviations about the mean is 0
d. It is best used with ordinal data
Answer:
d. It is best used with ordinal data

Question 14.
For making any data useful it is required that it should be
a. Classified
b. Presented properly
c. Appropriate collection
d. All of the above
Answer:
d. All of the above

Question 15.
Sources of Semi-Government publication is
a. RBI
b. WHO
c. WTO
d. None of the above
Answer:
a. RBI

Question 16.
Data collected by research institutions is
a. Primary data
b. Secondary unpublished data
c. Secondary published data
d. All of the above
Answer:
b. Secondary unpublished data

Question 17.
Secondary data should not be
a. Accurate
b. Reliable
c. Suitable
d. None of the above
Answer:
d. None of the above

Question 18.
Given the following set of data, what is the range 12 23 34 54 21 8 9 67
a. 55
b. 59
c. 8
d. 56
Answer:
b. 59

Question 19.
Which of the following statements is true for frequency distribution?
a. The smaller the sample size, the closer the sample mean will be to the population mean.
b. The smaller the population size, the smaller the relationship will be between the sample mean and the population mean.
c. The larger the population size, the closer the population mean will be to the sample mean.
d. The larger the sample size, the closer the sample mean will be to the population mean.
Answer:
d. The larger the sample size, the closer the sample mean will be to the population mean.

Question 20.
Classification of data is compulsory as
a. It tells the features of data at a glance
b. It helps in meaningful comparison of data
c. Arranges the huge data
d. All of the above
Answer:
d. All of the above

Question 21.
When data is classified on the basis of time it is termed as
a. Quantitative
b. Chronological
c. Qualitative
d. All of the above
Answer:
b. Chronological

Question 22.
Trend and Pattern of data can only be understood if data is in
a. Descriptive manner
b. Tabular form
c. Both a & b
d. None of the above
Answer:
b. Tabular form

Question 23.
A score distribution is given for an inventory measuring physical fitness. The type of graph that will be used to display the information will be
a. Histogram
b. Line graph
c. Pie chart
d. Bar graph
Answer:
a. Histogram

Question 24.
Whenever we use mean, as a measure of central tendency, the precaution to be taken is
a. Skewed up data is there
b. Random data is there
c. Unorganized data is there
d. None of the above
Answer:
d. None of the above

Question 25.
A population is
a. Same as a sample
b. The selection of a random sample
c. The collection of all items of interest in a particular study
d. None of the above
Answer:
c. The collection of all items of interest in a particular study

Question 26.
The entities on which data are collected are
a. Variables
b. Data sets
c. Elements
d. None of the above
Answer:
c. Elements

Question 27.
Labels or names used to identify attributes of elements are
a. Quantitative data
b. Qualitative data
c. Simple data
d. None of the above
Answer:
b. Qualitative data

Question 28.
A characteristic of interest for the elements is
a. A variable
b. An element
c. A data set
d. None of the above
Answer:
a. A variable

Question 29.
Tabulation of data makes the presentation of facts and figures
a. More simplified
b. More understandable
c. Both a & b
d. It is lengthy and occupies space
Answer:
c. Both a & b

Question 30.
Stubs are
a. Heading for the vertical column
b. Heading of the horizontal column
c. A title of headnotes
d. None of the above
Answer:
b. Heading of the horizontal column

Question 31.
Some of the primary sources of primary data in India are
a. CSO
b. Census of India
c. Central Bank of India
d. Both a & b
Answer:
d. Both a & b

Question 32.
CSO is
a. Census of India
b. Central Statistical organization
c. Central Survey of India
d. Central statistics of India
Answer:
b. Central Statistical organization

Question 33.
For checking the reliability of data it should be seen that
a. The degree of accuracy required
b. The procedure of collection of data by the primary source
c. Should be adequate
d. Both a & b
Answer:
d. Both a & b

Question 34.
It is not an uncommon way of collecting primary data
a. Indirect personal interview
b. Telephone survey
c. Mailed questions
d. All of the above
Answer:
d. All of the above

Question 35.
Footnotes are used
a. to point at any specific data which was not expressed in the heading
b. table numbers
c. source note
d. a note at the foot of the document
Answer:
a. to point at any specific data which was not expressed in the heading

Question 36.
In frequency magnitude,
a. zero should be used to indicate the information that is not available
b. the abbreviation should be avoided
c. magnitude of values and the number of times the value has been repeated
d. headnote should be clear
Answer:
c. magnitude of values and the number of times the value has been repeated

Question 37.
For checking how many students scored above 80 % marks in a class what should be the source of data
a. Secondary data
b. Mailed questions
c. Direct investigation
d. Sample investigation
Answer:
c. Direct investigation

Question 38.
Indirect oral investigation method of collection of data is to be used when
a. The source is reluctant to give the information
b. When there is less time
c. When the finances are less
d. All of the above
Answer:
a. The source is reluctant to give the information

Question 39.
For getting periodical information, the best way to seek data is to collect it
a. Indirectly by oral investigation
b. Mailed questionnaire
c. Information received from local agencies
d. All of the above
Answer:
c. Information received from local agencies

Question 40.
Which amongst these is the most expensive way of collecting data?
a. Questionnaire through enumerator
b. Telephonic survey
c. Direct personal interview
d. Indirect oral investigation
Answer:
a. Questionnaire through the enumerator

Question 41.
The types of error in the sample investigation method of data collection are
a. Actual error
b. Random error
c. Sampling error
d. Both b & c
Answer:
d. Both b & c

Question 42.
Arithmetic operations are appropriate for
a. Qualitative data
b. Quantitative data
c. Both quantitative and qualitative data
d. Neither quantitative nor qualitative data
Answer:
b. Quantitative data

Question 43.
Zipcodes are an example of
a. Qualitative data
b. Quantitative data
c. Neither quantitative nor qualitative data
d. None of the choices are correct
Answer:
a. Qualitative data

Question 44.
A tabular summary of a set of data, which shows the appearance of data elements in several nonoverlapping classes, is termed
a. The class width
b. a frequency polygon
c. a frequency distribution
d. a histogram
Answer:
c. a frequency distribution

Question 45.
A tabular summary of a set of data showing classes of the data and the fraction of the items belonging to each class is called.
a. the class width
b. a relative frequency distribution
c. a cumulative relative frequency distribution.
d. an ogive
Answer:
b. a relative frequency distribution

Question 46.
To convert it into degrees, each percentage of data in a pie chart should be multiplied by
a. 3.6
b. 3.7
c. 3.5
d. No need of any multiplication
Answer:
a. 3.6

Question 47.
Which of the following, if missing, makes a graphic presentation incomplete?
a. Unit of measurement
b. Suitable scale
c. Suitable title
d. All of the above
Answer:
d. All of the above

Question 48.
A graph is a useful tool as
a. Only limited information can be achieved
b. Information provided is not useful for an expert
c. Not precisely correct
d. It does not require that the person who is using graphs should know mathematics
Answer:
d. It does not require that the person who is using graphs should know mathematics

Question 49.
International tourists are randomly selected on a beach in Singapore and asked how many days they spend in Singapore.
a. This method will give reliable results if many tourists are asked.
b. This method will overestimate the time tourists stay in Singapore
c. This method will underestimate the time tourists stay in Singapore
d. This method will work only in sunny weather.
Answer:
b. This method will overestimate the time tourists stay in Singapore

Question 50.
A graphical method of presenting qualitative data by frequency distribution is termed.
a. A frequency polygon
b. An ogive
c. A bar graph
d. None of the above
Answer:
c. A bar graph

Question 51.
The sum of frequencies for all classes will always equal
a. 1
b. the number of elements in a data set
c. the number of classes
d. a value between 0 to 1
Answer:
b. the number of elements in a data set

Question 52.
The information invariably required to be put in a good statistical table is/are
a. Descriptive thinking
b. Table Number
c. Head note
d. Both b & c
Answer:
d. Both b & c

Question 53.
If the upper limit of one class coincides with the lower limit of the next class, the classes are
a. Exclusive class intervals
b. Inclusive class intervals
c. Interval
d. nominal
Answer:
a. Exclusive class intervals

Question 54.
A numerical description of the outcome of an experiment is a random
a. description
b. outcome
c. number
d. variable
Answer:
d. variable

Question 55.
What is distinctive about quantitative content analysis in comparison to quantitative data analysis?
a. They enable easy calculation for those of us who are not too good with figures.
b. It allows for the constant re-assessment of categories and themes
c. It is easy for researchers to use
d. None of the above
Answer:
b. It allows for the constant reassessment of categories and themes

Question 56.
A pie chart is:
a. A chart demonstrating the increasing incidence of obesity in society.
b. Only used in catering management research
c. Any form of pictorial representation of data
d. All illustrations where the data are divided into proportional segments according to the share each has of the total value of the data
Answer:
d. All illustrations where the data are divided into proportional segments according to the share each has of the total value of the data

Question 57.
The general process of gathering, organizing, summarizing, analyzing, and interpreting data is called
a. Statistics
b. Descriptive statistics
c. Qualitative statistics
d. Measurement of statistics
Answer:
a. Statistics

Question 58.
Frequency distribution is
a. Tabular arrangement of data with corresponding frequency
b. Graphical arrangement of data with corresponding frequency
c. Tabular arrangement of data without corresponding frequency
d. Graphical arrangement of data without corresponding frequency
Answer:
a. Tabular arrangement of data with corresponding frequency

Question 59.
Diagrammatic representation of data is
a. Determination of a number of classes
b. Determining the magnitude of classes
c. To show data in geometrical figures
d. None of the above
Answer:
c. To show data in geometrical figures

Question 60.
A complex table represents
a. Only one factor or variable
b. Always two factors or variables
c. Two or more factors or variables
d. All of the above
Answer:
c. Two or more factors or variables
Hint
Complex tabulation is the tabulation of data that includes more than two characteristics. For example, the frequency or number of girls, boys and the total class owning different brands of mobile phones like Blackberry, Nokia, iPhone, etc.

Question 61.
One of the following statements of primary data is wrongly stated
a. These constitute first-hand information
b. These are original in nature
c. These are relatively less costly to collect
d. These are more reliable, accurate, and adequate
Answer:
c. These are relatively less costly to collect
Hint
Primary data
The data which is collected for the first time from the source is Primary data. Primary data is the data collected by the investigator himself.

Question 62.
Among the following sources of collecting primary data, one is not correctly placed-
a. Annual report of the Reserve Bank of India
b. Indirect oral investigation
c. Telephonic survey
d. A questionnaire sent through the enumerator
Answer:
a. Annual report of the Reserve Bank of India
Hint
Primary data is the data collected by the investigator himself by some of the below-mentioned methods

Interview
Observation
action research
case studies
life histories
questionnaires etc

Question 63.
The basic demerit of sample investigation is that it
a. is less costly
b. is less time consuming
c. is less reliable because it creates many sources or unanticipated errors
d. Possesses the merit of flexibility
Answer:
c. is less reliable because it creates many sources or unanticipated errors
Hint
Cons of sample

data may not be representative of the total population, particularly where the sample size is small.
often not suitable for producing benchmark data
as data are collected from a subset of units and inferences made about the whole population, the data are subject to ‘sampling’ error hence less reliable.
decreased number of units will reduce the detailed information available about sub-groups within a population.

Question 64.
The expression $\frac{\text{Class frequency}}{\text{Width of class}}$ is known as:
a. Frequency density
b. Mid-value of a class interval.
c. Class interval
d. None of the above
Answer:
a. Frequency density
Hint
Frequency density = $\frac{\text{Class frequency}}{\text{Width of class}}$

Question 65.
A pie chart is having the shape of-
a. A rectangle
b. A circle
c. A square
d. A bar
Answer:
b. A circle

Question 66.
The qualitative classification includes-
a. Analysis of time series
b. Analysis of date
c. Analysis of series
d. Analysis of attributes
Answer:
d. Analysis of attributes
Hint
Qualitative classification – Qualitative data are forms of information gathered in a non-numeric form. In qualitative classification, data are classified on the basis of some attributes or quality such as sex, literacy, religion, etc. In this type of classification, the attribute can not be measured rather it is classified on the basis of whether the attribute is present or absent.

Question 67.
Find the odd one out:
a. Data collected from Internet
b. Data collected from RBI Annual Report
c. Data collected by an investigator
d. Data collected from the IMF Fact-sheet.
Answer:
c. Data collected by an investigator
Hint
Options a, b and d are examples of secondary data.

Question 68.
Match the following:

1. Simple Bar Diagram	(i) One single bar is drawn
2. Multiple Bar Chart	(ii) More than one bar is used to represent two or more variables
3. Pie Chart	(iii)Circle Diagram
4. Components Bar Chart	(iv)Sub-divided Bar Chart

The correct option is:
a. 1 (i); 2 (ii); 3 (iii); 4 (iv)
b. 1 (iv); 2 (i); 3 (ii); 4(iii)
c. 1 (ii); 2 (iii); 3 (iv); 4 (i)
d. 1 (iii); 2 (iv); 3 (i); 4(ii)
Answer:
a. 1 (i); 2 (ii); 3 (iii); 4 (iv)

Question 69.
Given are Country-X’s exports (in Rs. crores) to different regions between April 2012 and February 2013:

Region	Europe	Asia	America	Africa
Exports	31,516	42,516	23,495	5,133

Which of the following regions has 18° in the pie chart?
a. Europe
b. Asia
c. America
d. Africa
Answer:
d. Africa
Hint
The angles of the regions in the pie chart will be
Europe = 31516/102660 × 360° = 110.52°
Asia = 42516/102660 × 360° = 149.09°
America = 23495/102660 × 360° = 82.39°
Africa = 5133/102660 × 360° = 18°

Question 70.
One of the following is a secondary source of data –
a. Collection of demographic data from your neighborhood
b. Data collected by an investigator from the shops selling coffee seeds
c. Output data related to the production of wheat from the World Bank Reports
d. Counting the number of persons visiting a shrine on a particular day.
Answer:
c. Output data related to the production of wheat from the World Bank Reports
Hint
Secondary sources of data include

Already published data, e.g. publications of government departments and agencies
Previous research
Official statistics
Mass media products
Diaries
Letters
Government reports
Web information
Historical data and information
Publications of international bodies like UNO, WTO

Question 71.
An ‘ogive’ can be used to estimate the value of –
a. Mean
b. Mode
c. Quartiles
d. Harmonic mean.
Answer:
c. Quartiles
Hint
Cumulative histograms, also known as ogives, are graphs that can be used to determine how many data values lie above or below a particular value in a data set. They are used to determine the median, quartiles, percentiles, etc.

Question 72.
Secondary data is collected by-
a. Government
b. Public
c. Online
d. All of the above
Answer:
c. Online
Hint
Secondary data is any information collected by someone other than its user. It is data that has already been collected and is readily available for use. It is used to gain initial insight into the research problem. It is classified in terms of its source – either internal or external.

Question 73.
Which is the correct sequence
(i) Collection
(ii) Presentation
(iii) Organisation
(iv) Interpretation
(v) Analysis
a. (i), (ii), (iii), (v), (iv)
b. (i), (ii), (iii), (iv), (v)
c. (i), (iii), (ii), (v), (iv)
d. (ii), (i), (iii), (v), (iv)
Answer:
c. (i), (iii), (ii), (v), (iv)
Hint
The sequence of statistical work on data is
Collection
Organization
Presentation
Analysis
Interpretation

Question 74.
By finding the mid-value of upper widths of adjacent rectangles of the histogram we can make: –
a. Line Graph
b. Histograph
c. Ogive
d. Pie – chart
Answer:
c. Ogive

Question 75.
In which of the following methods of data collection is more time required?
a. Census & Sample both
b. Data collection through secondary sources
c. Census investigation
d. Sample investigation
Answer:
c. Census investigation
Hint
A census investigation collects data from every unit of the population, so it takes the most time. Secondary data is any information collected by someone other than its user and is already available for use.

Question 76.
The total angle contained in a pie chart is:
a. 360°
b. 180°
c. 90°
d. 120°
Answer:
a. 360°
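Since the whole circle is 360°, the angle of each sector is its share of the total multiplied by 360°. The sketch below applies this to the export figures from Question 69; the function name `pie_angles` is a hypothetical helper introduced for illustration.

```python
def pie_angles(values):
    """Return the sector angle in degrees for each category."""
    total = sum(values.values())
    return {name: value / total * 360 for name, value in values.items()}


# Country-X exports (Rs. crores) from Question 69.
exports = {"Europe": 31516, "Asia": 42516, "America": 23495, "Africa": 5133}
angles = pie_angles(exports)

for region, angle in angles.items():
    print(f"{region}: {angle:.2f} degrees")
```

Equivalently, a share expressed as a percentage is multiplied by 3.6 to obtain degrees, because 360 / 100 = 3.6.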

Question 77.
If the middle point of a class interval is 60 and the lower limit is 45, then the upper limit would be:
a. 75
b. 55
c. 60
d. 70
Answer:
a. 75
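Hint
The answer follows from the definition of the mid-value of a class. A short worked sketch using the figures in the question:

$$\text{Mid-value} = \frac{\text{Lower limit} + \text{Upper limit}}{2} \;\Rightarrow\; \text{Upper limit} = 2 \times \text{Mid-value} - \text{Lower limit} = 2 \times 60 - 45 = 75$$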

Question 79.
Graph constructed on the basis of Cumulative frequencies arranged in ascending order is:
a. More than Ogive
b. Simple Ogive
c. Less than Ogive
d. Any Ogive
Answer:
c. Less than Ogive
Hint
Cumulative frequency curve – Cumulative histograms, also known as ogives, are graphs that can be used to determine how many data values lie above or below a particular value in a data set. The cumulative frequency is calculated from a frequency table, by adding each frequency to the total of the frequencies of all data values before it in the data set.

Question 80.
Which of the methods of collecting primary data is more expensive?
a. Online Surveys
b. Observation Methods
c. Mailed Questionnaire
d. Telephonic Interview.
Answer:
b. Observation Methods
Hint
Observation is a way of gathering data by watching behavior, events, or noting physical characteristics in their natural setting

Index Numbers And Time Series Analysis – CS Foundation Statistics Notes

December 29, 2022

A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. For example, measuring the value of retail sales each month of the year would comprise a time series. This is because sales revenue is well defined, and consistently measured at equally spaced intervals. Data collected irregularly or only once are not time series. Time series are best displayed in a scatter plot. The series value X is plotted on the vertical axis and time t on the horizontal axis.

1. Time series analysis
Time series analysis accounts for the fact that data points taken over time may have an internal structure. The purpose of time series analysis is

to study the past behavior of records
to forecast for the future

2. Components of time series
Any time series can contain some or all of the following components:

Trend (T)
Cyclical (C)
Seasonal (S)
Irregular (I)

3. Trend component
The trend is the long-term pattern of a time series. A trend can be positive or negative depending on whether the time series exhibits an increasing long-term pattern or a decreasing long term pattern. If a time series does not show an increasing or decreasing pattern then the series is stationary in the mean.

4. Periodic Variations
(a) Cyclical component
Any pattern showing an up and down movement around a given trend is identified as a cyclical pattern. The duration of a cycle depends on the type of business or industry being analyzed.

(b) Seasonal component
Seasonality occurs when the time series exhibits regular fluctuations during the same month (or months) every year, or during the same quarter every year. Regardless of the trend, we can observe that each year more ice cream is sold in summer and very little in winter. Sales in departmental stores are higher during festive seasons than on normal days. Retail sales peak during the month of December.

5. Irregular component
This component is unpredictable. Every time series has some unpredictable component that makes it a random variable. In prediction, the objective is to model all the components to the point that the only component that remains unexplained is the random component. Irregular fluctuations result from the occurrence of unforeseen events like floods, earthquakes, wars, famines, etc.

6. Purpose of Time Series
The purpose of time series analysis is to decompose the data of a time series into its components. The methods used for decomposition are:
1. Additive models
Data = Seasonal effect + Trend + Cyclical + Residual
For monthly data, an additive model assumes that the difference between the January and July values is approximately the same each year. In other words, the amplitude of the seasonal effect is the same each year.

2. Multiplicative models
In many time series involving quantities (e.g. money, wheat production, …), the absolute differences in the values are of less interest and importance than the percentage changes.
Data = Seasonal effect × Trend × Cyclical × Residual
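The difference between the two models can be sketched with a small made-up series. All numbers below are assumptions for illustration, and the cyclical and residual components are left out (taken as 0 in the additive model and 1 in the multiplicative model) to keep the comparison simple.

```python
# Hypothetical rising trend over four periods.
trend = [100, 110, 120, 130]

# Additive model: the seasonal effect is a fixed amount.
seasonal_add = [10, -10, 10, -10]
additive = [t + s for t, s in zip(trend, seasonal_add)]

# Multiplicative model: the seasonal effect is a fixed percentage (+/- 10%).
seasonal_mult = [1.1, 0.9, 1.1, 0.9]
multiplicative = [t * s for t, s in zip(trend, seasonal_mult)]

# Size of the seasonal swing away from the trend in each period.
swing_add = [abs(d - t) for d, t in zip(additive, trend)]
swing_mult = [round(abs(d - t), 6) for d, t in zip(multiplicative, trend)]

print(swing_add)   # constant amplitude
print(swing_mult)  # amplitude grows with the trend
```

In the additive series the swing stays at 10 units, while in the multiplicative series it grows with the level of the trend even though the percentage change is the same.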

7. Measurement of Trend
To measure the trend, the short-term variations should be removed and irregularities should be smoothed out. The following are the methods of measuring trends.

Graphic (or freehand curve) method
Semi-average method
Moving average method
Least squares method

1. Freehand method or graphic method
It is the simplest method to determine a trend. Simply, the freehand method is to create a trend line in accordance with what we see. First of all the data is plotted on a graph paper and the trend line is fitted by a line or a freehand curve by just inspecting and following the graph of the series. The curve needs to be smooth and with an almost equal number of points above and below it. By eye estimate, the sum of the vertical deviations of the given points above the trend line should approximately equal the sum of the vertical deviations of the given points below the trend line. Also, the sum of the squares of the vertical deviations of the given points from the trend line should be the minimum possible.

2. Semi average
This method is also simple and more objective than the freehand method. The data is divided into two equal halves, and the arithmetic mean of the Y values in each half is plotted against the centre of the corresponding time span. If the number of observations is even, the division into halves is straightforward; if the number of observations is odd, the middlemost item, i.e. the $\left(\frac{n+1}{2}\right)$th item, is dropped. The two points so obtained are joined by a straight line, which shows the trend.
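The semi-average procedure can be sketched as follows. The function name `semi_average_points`, the use of 0-based positions for time, and the sample sales figures are assumptions made for illustration.

```python
def semi_average_points(values):
    """Return the two (centre position, mean) points of the semi-average method.

    Positions are 0-based indices into `values`.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    first = values[:half]
    second = values[n - half:]  # skips the middle item when n is odd
    first_point = ((half - 1) / 2, sum(first) / half)
    second_point = ((n - half) + (half - 1) / 2, sum(second) / half)
    return first_point, second_point


# Seven hypothetical yearly sales figures; the 4th (middle) item is dropped.
sales = [10, 12, 14, 16, 18, 20, 22]
p1, p2 = semi_average_points(sales)

# Slope of the straight trend line joining the two points.
slope = (p2[1] - p1[1]) / (p2[0] - p1[0])
print(p1, p2, slope)
```

Here the first half averages 12 at position 1 and the second half averages 20 at position 5, so the trend line rises by 2 units per period.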

3. Moving average method
This method consists of taking the arithmetic mean of the values for a certain time span and placing the result against the middle of that span. The time span should be equal to the average period of fluctuation. If the span covers k periods, the moving averages obtained by averaging k values at a time are called moving averages of period (or extent) k. If k is odd, each moving average falls against the middle period of its span. If k is even, each moving average falls between two periods, so successive pairs of moving averages are averaged again (centred) to align them with the time periods.
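A minimal sketch of moving averages of period k is given below. The function names and the sample values are hypothetical; the centring step for an even period (averaging successive pairs of moving averages so they line up with actual time periods) is the usual convention and is stated here as an assumption.

```python
def moving_average(values, k):
    """Return the moving averages of period k (one per window of k values)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def centred_moving_average(values, k):
    """For an even period k, average successive pairs of moving averages."""
    ma = moving_average(values, k)
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]


values = [2, 4, 6, 8, 10, 12]

# Odd period: each average is placed against the middle period of its span
# (periods 2, 3, 4 and 5 of the six).
ma3 = moving_average(values, 3)

# Even period: the raw averages fall between periods, so they are centred
# (the results align with periods 3 and 4).
ma4 = moving_average(values, 4)
cma4 = centred_moving_average(values, 4)

print(ma3, ma4, cma4)
```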

4. Least squares method
Polynomials are one of the most commonly used types of curves in regression. The least-squares method is used to find the best linear relationship between two variables.

The least-squares line fits a straight line y = a + bx to the given set of data (x₁, y₁), (x₂, y₂), …, (x_n, y_n). The unknown coefficients a and b can be obtained by solving the normal equations:
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^{2}$

Here y = production, sales, etc.
x = time
Please note that a and b are unknown coefficients while all x_i and y_i are given.
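The coefficients of the line y = a + bx can be computed from the sums Σx, Σy, Σxy and Σx². The sketch below solves the two standard normal equations (Σy = na + bΣx and Σxy = aΣx + bΣx²) directly; the function name `least_squares_line` and the sample data are assumptions for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


# x = time (coded 1..5), y = hypothetical sales.
years = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
a, b = least_squares_line(years, sales)

# Trend value forecast for the next period (x = 6).
forecast = a + b * 6
print(a, b, forecast)
```

For this made-up data the points lie exactly on a line, so the fit recovers a = 1 and b = 2.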
B. Least-squares parabola
The least-squares parabola method uses a second-degree curve y = a + bx + cx² to approximate the given set of data (x₁, y₁), (x₂, y₂), …, (x_n, y_n), where n ≥ 3.
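For the parabola y = a + bx + cx², the least-squares coefficients satisfy three normal equations: Σy = na + bΣx + cΣx², Σxy = aΣx + bΣx² + cΣx³ and Σx²y = aΣx² + bΣx³ + cΣx⁴. The sketch below solves them with Cramer's rule; the function names, the choice of Cramer's rule and the sample data are assumptions for illustration rather than a method prescribed by these notes.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def least_squares_parabola(xs, ys):
    """Return (a, b, c) for the least-squares curve y = a + b*x + c*x**2."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three paired observations")
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    matrix = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [t0, t1, t2]
    d = det3(matrix)
    if d == 0:
        raise ValueError("need at least three distinct x values")
    coefficients = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coefficients.append(det3(replaced) / d)
    return tuple(coefficients)


# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]
a, b, c = least_squares_parabola(xs, ys)
print(a, b, c)
```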

8. Forecasting
Forecasting is a strategy used in different fields to predict the future based upon the past. It draws on different data sources to provide a financial expert or entrepreneur with the information needed to run a business or invest more effectively and successfully. Companies use it to determine how to allocate their budgets for an upcoming period of time.

Survey – Surveys provide a means of measuring characteristics, self-reported and observed behavior, attitudes or opinions, etc. of a society. These methods are often used for forecasting the sales or demand of a product. The main survey methods are:

– Complete enumeration method – in this method data is collected from all the individuals. This method is costly particularly when the survey population is large. A complete enumeration-based survey is often preferred for certain types of data, solely because it is expected that it will provide complete statistical coverage over space and time. Complete enumeration may be required as a statutory obligation, often for regulatory purposes.

– Sample survey method – this type of survey is used when the population is large. A well-designed sample-based survey can often provide good estimates of important parameters at a fraction of the cost. Sample surveys operate on selected subsets of the target population and, using a number of assumptions regarding the distribution of the population, provide estimates of the parameters under study. Sample-based surveys involve uncertainties as to the correctness of the various assumptions used.

Some of the sample survey methods are:

1. Test Marketing – it is done for the launch of a new product or if any existing product is being introduced in a new market. This technique is used during the product development or market introduction phase to determine how people respond to a product. It can be used at many different phases of development to see whether or not the public will buy the product, how the product may need to be adjusted to make it appealing to the public, and how members of the public interact with the product.

2. Expert opinion – Expert opinion is a relatively informal technique that can serve a variety of purposes. It may be used to assist in problem identification, in clarifying the issues relevant to a particular topic, and in the evaluation of products. In this method, a group of experts sit together and form an opinion about the viability and success of the product.

3. Delphi Technique – The Delphi technique is a group process used to survey and collect the opinions of experts on a particular subject. It is especially appropriate when it is not possible to convene the experts in one meeting. In this method the opinion of experts is generally taken by post.

Trend projection methods – As in most other analyses, in time series analysis it is assumed that the data consist of a systematic pattern. These methods are cheaper than survey methods because they take data from past records as the basis of analysis.

4. Time series data – we can fit mathematical trends in data for making forecasts. This analysis is more reliable for short-term forecasts.

5. Smoothing method – trend is calculated by smoothing out the fluctuation due to other components. Two main smoothing methods are a method of moving average and the method of exponential smoothing.

6. Lag technique, lead technique, or barometric technique – the forecast is made on the basis of events that have already occurred or are currently occurring. It is generally used for predicting business cycle situations.
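The notes name exponential smoothing without giving a formula. A minimal sketch, assuming the usual simple form in which each smoothed value is a weighted average of the latest observation and the previous smoothed value, could look like this. The smoothing constant `alpha`, the choice of the first observation as the starting value, and the demand figures are all assumptions made for the example.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing.

    smoothed[t] = alpha * values[t] + (1 - alpha) * smoothed[t - 1],
    starting from smoothed[0] = values[0] (an assumed initialisation).
    """
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


demand = [10, 20, 30]  # hypothetical demand figures
smoothed_demand = exponential_smoothing(demand, alpha=0.5)
print(smoothed_demand)
```

A larger `alpha` gives more weight to recent observations, while a smaller one smooths the series more heavily.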

9. Index numbers
In simple terms, an index (or index number) is a number showing the level of a variable relative to its level in a given base period. Edgeworth defined an index number as follows: ‘Index number shows by its variations the changes in a magnitude which is not susceptible either of accurate measurement in itself or of direct valuation in practice.’
Further, Spiegel defined an index number as ‘a statistical measure designed to show changes in a variable or a group of related variables with respect to time, geographical location or other characteristic.’
Index numbers
– are time series that focus on the relative change in a count or measurement over time.
– express the count or measurement as a percentage of the comparable count or measurement in a base period.

Important characteristic features of index numbers are:

These are expressed in percentages
These are specialized averages
They measure the relative change in the value of a variable or a group of related variables over a period of time

10. Uses of Index numbers

Index numbers are used for measuring changes in the price level.
Economists frequently use index numbers when making comparisons over time.
Using an index makes quick comparisons easy.
Index numbers are helpful in judging the changes in investment.
They measure the level of business and economic activities and are therefore helpful in finding the economic status of the country.
Throws Light on Economic Condition
Importance For The Government
Analysis of Industry
Comparison of Developed and Under Developed Countries

11. Types of Index Numbers

1. Price Index Numbers – A price index number indicates the relative price of a specific item or group of items. It is mostly used by statisticians, policymakers, etc. Usually, the index is assigned a value of 100 in some selected base period, and the values of the index for other periods indicate the average percentage change in prices compared with the base period. Price index numbers can be divided into wholesale price index numbers and retail price index numbers. The wholesale price index number reflects the general price level in the country, while the retail price index number shows the change in the retail prices of various commodities.

2. Quantity index numbers – As the name suggests, these indices measure changes in the volumes of commodities, such as goods produced or goods consumed. A quantity index is built up from information on quantities, such as the number or total weight of goods or the number of services.

3. Value index numbers – These compare changes in the monetary value of imports, exports, production, or consumption of commodities.

Changes in the price of an item are far easier to measure than changes in quantity (or value). The quantities (or values) of items sold in different outlets may vary enormously, depending on many factors including the size and location of the outlets. Prices, by comparison, will not vary so much from outlet to outlet. It is also much easier to find out the price of an item than the quantity that may have been sold during a month. Knowing price movements in a few outlets will be a good guide to the average movement overall. There will be a tendency for the price of similar items to change in a similar way. On the other hand, with some exceptions, quantities may vary widely and may be difficult to define.

12. Precautions in construction of Index Numbers

Be clear about the purpose for which the index number is to be used.
The base period selected should be normal as well as relatively recent.
For construction, either the arithmetic mean or the geometric mean is used.
Due importance should be given to the different variables.

13. Method of construction of price index numbers

Select one period as the base and separately calculate the movement between that period and each required period, either by calculating the difference between the prices of the two periods (base period and required period) or by calculating the ratio of the two prices.
Calculate the period-to-period movements and chain these.
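The two approaches above can be compared with a small Python sketch, using the ratio form of the movement. The function names and the price series are hypothetical, and a single item is used so that the link between the two methods is easy to see.

```python
def fixed_base_index(prices, base=0):
    """Index of every period relative to one fixed base period (base = 100)."""
    return [100 * p / prices[base] for p in prices]


def chained_index(prices):
    """Index built by chaining period-to-period price ratios (first period = 100)."""
    index = [100.0]
    for previous, current in zip(prices, prices[1:]):
        link = current / previous          # movement between adjacent periods
        index.append(index[-1] * link)     # chain it onto the running index
    return index


prices = [4, 6, 9]  # hypothetical prices of one item in three periods
fixed = fixed_base_index(prices)
chained = chained_index(prices)
print(fixed)
print(chained)
```

For a single item the two series coincide. With several items and changing weights they generally differ, which is one reason the choice of method matters.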

14. Calculation of the index number can be done in the following ways
Average of price relatives methods – by taking a suitable average of the price relatives of the different items. In this method, the price relative for each commodity is calculated and then their average is taken to obtain the index number.
1. Simple average of price relatives – One of the simplest types of index numbers is a price relative. It is the ratio of the price of a single commodity in a given period or point of time to its price in another period or point of time, called the reference period or base period.
Price relative in percentage (of period 1 with respect to period 0)
= $\frac{p_{1}}{p_{0}}$ × 100 ………………….. (14.1)

Price index number = (sum of the price relatives in percentage for all items) / (number of items)
A price relative may be expressed in percentage or without percentage.
Example
If the retail price of fine quality rice in the year 1980 was Rs. 3.75 and that for the year 1983 was Rs. 4.50, then find the price relative.
Solution: Here the base period is 1980. The price of rice in the base period was Rs. 3.75, and in the given period, 1983, the price of rice was Rs. 4.50. Using formula (14.1), the required price relative is
$P_{1980/1983}$ = $\frac{\text { Rs. } 4.50}{\text { Rs. } 3.75}$ × 100 = 120%
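The price relative and the simple average of price relatives can be checked with a short Python sketch. The function names are assumptions, the rice prices come from the example above, and the two-item basket is hypothetical.

```python
def price_relative(p1, p0):
    """Price relative in percentage: current price over base price, times 100."""
    return 100 * p1 / p0


def simple_average_of_relatives(base_prices, current_prices):
    """Unweighted arithmetic mean of the price relatives of all items."""
    relatives = [price_relative(p1, p0)
                 for p0, p1 in zip(base_prices, current_prices)]
    return sum(relatives) / len(relatives)


# Rice example from the text: Rs. 3.75 in 1980 and Rs. 4.50 in 1983.
rice_relative = price_relative(4.50, 3.75)

# Hypothetical basket of two items.
basket_index = simple_average_of_relatives([4, 10], [6, 12])
print(rice_relative, basket_index)
```

In the basket, the two price relatives are 150 and 120, so their simple average gives an index of 135.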

2. Weighted average of price relatives method
A weighted average is an average in which each quantity to be averaged is assigned a weight. These weights determine the relative importance of each quantity in the average.
Example
The values of a commodity and their weights are as follows:
Value: 10 8 5 4 3 2 1 0
Weights: 2 2 1 10 8 7 68 2
Steps to calculate the price index by the weighted average method:
1. Multiply each value by its weight.
20, 16, 5, 40, 24, 14, 68, and 0
2. Add up the products of value times weight to get the total value.
Sum = 187
3. Add the weights themselves to get the total weight.
Sum = 100
4. Divide the total value by the total weight.
187/100 = 1.87
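The four steps above translate directly into Python. The variable names are illustrative, and the values are read as 10, 8, 5, 4, 3, 2, 1, 0, which is the reading consistent with the products listed in step 1.

```python
values = [10, 8, 5, 4, 3, 2, 1, 0]
weights = [2, 2, 1, 10, 8, 7, 68, 2]

# Step 1: multiply each value by its weight.
products = [v * w for v, w in zip(values, weights)]

# Step 2: total of the weighted values.
total_value = sum(products)

# Step 3: total of the weights.
total_weight = sum(weights)

# Step 4: weighted average.
weighted_average = total_value / total_weight
print(products, total_value, total_weight, weighted_average)
```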

Aggregate methods – by taking ratios of the totals of the prices of different items. This is a simple method for constructing index numbers: the total of current year prices for the various commodities is divided by the corresponding base year price total, and the result is multiplied by 100. A simple aggregate index shows the change in the prices, quantities, or values of a group of related items. Each item in the group is treated as having equal weight for purposes of comparing group measurements over time.
Example
If the price of rice is Rs. 4 per kg in the year 1980, Rs. 6 in the year 1983 and Rs. 8 in the year 1984, then the price relatives are
$P_{1980/1983}$ = $\frac{\text { Rs. } 6}{\text { Rs. } 4}$ × 100 = 150
$P_{1980/1984}$ = $\frac{\text { Rs. } 8}{\text { Rs. } 4}$ × 100 = 200
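A simple aggregate price index, as described above, divides the total of current year prices by the total of base year prices and multiplies by 100. The sketch below uses a hypothetical three-item basket; the function name and the prices are assumptions made for illustration.

```python
def simple_aggregate_index(base_prices, current_prices):
    """Total of current prices over total of base prices, times 100."""
    return 100 * sum(current_prices) / sum(base_prices)


# Hypothetical prices of three commodities in the base and current years.
base_year = [4, 10, 6]
current_year = [6, 12, 7]
aggregate_index = simple_aggregate_index(base_year, current_year)
print(aggregate_index)
```

Because the prices are simply added, an item quoted in a larger unit (and therefore carrying a higher price) influences the index more than a cheaper one, even though no explicit weights are used.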

In the weighted aggregate method, the index number is calculated as the ratio of the weighted arithmetic mean of current year prices to that of base year prices.

15. Nature of weights

Laspeyres price index – This index measures price changes from a base year, using the base year quantities as weights.
Paasche’s price index – This index uses the end year (current year) quantities as weights.
Fisher Ideal Index – It is the geometric mean of the Laspeyres price index and Paasche’s price index.
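The three indices can be sketched in Python using their standard weighted aggregate forms: Laspeyres weights prices by base period quantities, Paasche by current period quantities, and Fisher takes the geometric mean of the two. The function names and the price and quantity data are hypothetical.

```python
from math import sqrt


def laspeyres(p0, p1, q0):
    """Laspeyres price index: base-period quantities q0 as weights."""
    numerator = sum(p * q for p, q in zip(p1, q0))
    denominator = sum(p * q for p, q in zip(p0, q0))
    return 100 * numerator / denominator


def paasche(p0, p1, q1):
    """Paasche price index: current-period quantities q1 as weights."""
    numerator = sum(p * q for p, q in zip(p1, q1))
    denominator = sum(p * q for p, q in zip(p0, q1))
    return 100 * numerator / denominator


def fisher(p0, p1, q0, q1):
    """Fisher ideal index: geometric mean of Laspeyres and Paasche."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical base (0) and current (1) prices and quantities of two items.
p0, p1 = [2, 5], [4, 6]
q0, q1 = [10, 4], [8, 5]

L = laspeyres(p0, p1, q0)
P = paasche(p0, p1, q1)
F = fisher(p0, p1, q0, q1)
print(L, P, F)
```

Being a geometric mean, the Fisher index always lies between the Laspeyres and Paasche values.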

16. Tests of the adequacy of Index numbers formulae

1. Unit test- It requires that the formula should be independent of the units in which or for which prices and quantities are quoted. This test is satisfied by all index number methods except the simple (unweighted) aggregative index method.

2. Time reversal test – This is a test of whether a given method will work both ways in time, forwards and backwards. In the words of Fisher, the test is that the formula for calculating the index number should be such that it will give the same ratio between one point of comparison and the other, no matter which of the two is taken as a base. In other words, when the data for any two years are treated by the same method, but with the bases reversed, the two index numbers secured should be reciprocals of each other so that their product is unity.

3. Factor Reversal Test – The factor reversal test requires that the product of a price index and a volume (quantity) index of the same type should be equal to the proportionate change in the current values. Symbolically, $P_{01} \times Q_{01} = \frac{\sum p_{1} q_{1}}{\sum p_{0} q_{0}}$ = Value Index. The factor reversal test is satisfied only by Fisher’s Ideal Index number.

4. Circular Test – It is concerned with the measurement of price changes over a period of years, when it is desirable to shift the base. If $P_{01}$ represents the price change of period 1 on period 0 as base, $P_{12}$ the price change of period 2 on period 1 as base, and $P_{20}$ the price change of period 0 on period 2 as base, then the following equation should be satisfied: $P_{01} \times P_{12} \times P_{20} = 1$.

5. Chain base index number – This type of index number is used when the base period is too far from the current year. In such situations a product that is important in the current year may have been of little or no importance in the base year, so a chain base index number is constructed.
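The time reversal and factor reversal tests can be checked numerically. The sketch below works with index ratios (not multiplied by 100) so that a product of 1 means the time reversal test is passed. All function names and data are hypothetical, and the Laspeyres, Paasche and Fisher formulas are the standard weighted aggregate forms.

```python
from math import sqrt


def dot(a, b):
    """Sum of pairwise products, e.g. total of price times quantity."""
    return sum(x * y for x, y in zip(a, b))


def laspeyres(p0, p1, q0):
    """Laspeyres ratio with base-period weights q0."""
    return dot(p1, q0) / dot(p0, q0)


def paasche(p0, p1, q1):
    """Paasche ratio with current-period weights q1."""
    return dot(p1, q1) / dot(p0, q1)


def fisher(p0, p1, q0, q1):
    """Fisher ratio: geometric mean of Laspeyres and Paasche."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical prices and quantities for periods 0 and 1.
p0, p1 = [2, 5], [4, 6]
q0, q1 = [10, 4], [8, 5]

# Time reversal: forward index times backward index should equal 1.
fisher_time = fisher(p0, p1, q0, q1) * fisher(p1, p0, q1, q0)
laspeyres_time = laspeyres(p0, p1, q0) * laspeyres(p1, p0, q1)

# Factor reversal: price index times quantity index should equal the
# value ratio. Swapping the roles of p and q gives the quantity index.
fisher_quantity = fisher(q0, q1, p0, p1)
fisher_factor = fisher(p0, p1, q0, q1) * fisher_quantity
value_ratio = dot(p1, q1) / dot(p0, q0)

print(fisher_time, laspeyres_time)
print(fisher_factor, value_ratio)
```

On this data the Fisher index passes both tests up to rounding, while the Laspeyres index does not satisfy time reversal.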

Index Numbers And Time Series Analysis MCQ Questions

Question 1.
Which of the following is a measure of dispersion?
a. Mean
b. Percentiles
c. Quartiles
d. Mean absolute deviation
Answer:
d. Mean absolute deviation

Question 2.
Which of the following is Not one of the uses of an index number?
a. To measure the economic well being of a country
b. Basis for comparing related series for administrative purpose
c. To measure the direction of movement of economic variables
d. To deflate series
Answer:
c. To measure the direction of movement of economic variables

Question 3.
An index number is used
a. To measure changes in a variable over time
b. To measure changes in demand
c. To measure changes in price
d. To measure changes in quantity
Answer:
a. To measure changes in a variable over time

Question 4.
The ratio of a new price to the base year price is called the
a. Price increase
b. Price relative
c. Price decrease
d. Price absolute
Answer:
b. Price relative

Question 5.
Which of the following component is not included in a time series?
a. Regular
b. Trend
c. Seasonal
d. Irregular
Answer:
a. Regular

Question 6.
A simple aggregate quantity index is used to
a. Measures the change in the price of a product
b. Measures the overall change in the quantity of a range of products
c. Measures the overall change in the price of a range of products
d. None of the above
Answer:
b. Measures the overall change in the quantity of a range of products

Question 7.
The price relative is a price index that is determined by
a. Price in period t / base period price × 100
b. Base period price / price in period t × 100
c. Price in period t + base period price × 100
d. None of the above
Answer:
a. Price in period t / base period price × 100

Question 8.
A simple aggregate price index
a. Compares relative quantities to relative prices
b. Compares absolute prices to absolute quantities
c. Ignores relative quantities
d. Considers relative quantities
Answer:
c. Ignores relative quantities

Question 9.
Out of the following, which is a reason for computing an index?
a. To project future sales
b. To estimate the trend in a time series
c. To check the base period
d. None of the above
Answer:
d. None of the above

Question 10.
A composite price index based on the prices of a group of items is known as the
a. Laspeyres index
b. Paasche index
c. Aggregate price index
d. Consumer price index
Answer:
c. Aggregate price index

Question 11.
This index measures the change from month to month in the cost of a representative basket of goods and services of the type bought by a typical household
a. Financial times index
b. Paasche price index
c. Laspeyres price index
d. Retail price index
Answer:
d. Retail price index

Question 12.
Forecasts are referred to as naive if they
a. Are based only on past values of the variable
b. Are short-term forecasts
c. Are long-term forecasts
d. None of the above
Answer:
a. Are based only on past values of the variable

Question 13.
A monthly price index that uses the price changes in consumer goods and services for measuring the change in consumer prices over time is known as the
a. Paasche index
b. Consumer price index
c. Producer price index
d. Laspeyres index
Answer:
b. Consumer price index

Question 14.
Time series analysis is based on the assumption that
a. Random error terms are normally distributed
b. There are dependable correlations between the variable to be forecast and other independent variables
c. Past patterns in the variable to be forecast will continue unchanged into the future
d. The data do not exhibit a trend
Answer:
c. Past patterns in the variable to be forecast will continue unchanged into the future

Question 15.
Though no longer the case, historically, the Dow Jones averages were aggregate price indexes showing the prices of stocks listed
a. On the American stock exchange
b. Over-the-counter
c. On the New York Stock Exchange
d. None of the above
Answer:
c. On the New York Stock Exchange

Question 16.
Which of the following is Not a characteristic of a simple moving average?
a. It smoothes random variations in the data
b. It has minimal data storage requirements
c. It weights each historical value equally
d. It smoothes real variations in the data.
Answer:
b. It has minimal data storage requirements

Question 17.
Which is not a characteristic of exponential smoothing?
a. Smoothes random variations in the data
b. Easily altered weighting scheme
c. Weights each historical value equally
d. None of the above
Answer:
c. Weights each historical value equally

Question 18.
A composite price index where the prices of items in the composite are weighted by their relative importance is known as the
a. Price relative
b. Weighted aggregate price index
c. Consumer price index
d. None of the above
Answer:
b. Weighted aggregate price index

Question 19.
A weighted aggregate price index where the weight for each item is its current period quantity is called
a. Aggregate index
b. Consumer price index
c. Index of industrial production
d. Paasche index
Answer:
d. Paasche index

Question 20.
A quantity index that is designed to measure changes in physical volume or production levels of industrial goods over time is known as
a. Index of industrial production and capacity utilization
b. Time index
c. Physical volume index
d. None of the above
Answer:
a. Index of industrial production and capacity utilization

Question 21.
Forecasts
a. Become more accurate with longer time horizons
b. Are rarely perfect
c. Are more accurate for individual items than for groups of items
d. All of the above
Answer:
b. Are rarely perfect

Question 22.
The three major types of forecasts used by business organizations are
a. Strategic, tactical and operational
b. Economic, technological, and demand
c. Exponential, tactical, seasonal
d. None of the above
Answer:
b. Economic, technological, and demand

Question 23.
Which of the following is not a step in the forecasting process?
a. Determine the use of the forecast
b. Eliminate any assumptions
c. Determine the time horizons
d. All of the above
Answer:
b. Eliminate any assumptions

Question 24.
The tendency of the trend to increase or decrease or stagnate over a long period of time is called
a. Periodic Variation
b. Cyclic Variation
c. Secular Trend
d. Random Variation
Answer:
c. Secular Trend
Hint
The tendency of the trend to increase or decrease or stagnate over a long period of time is called a secular trend. It is a market trend with some characteristic phenomenon that is not cyclical but exists over a long period.

Question 25.
The equation Y= a+bx is used to get the value of
a. Parabolic Trend
b. Exponential Trend
c. Linear Trend
d. None of the above
Answer:
c. Linear Trend
Hint
Linear trends show steady, straight-line increases or decreases where the trend-line can go up or down. Equation Y = a+bx is used to get the value of the linear trend.

Question 26.
The price index that uses current year quantities as weights is known as
a. Fisher ideal index
b. Paasche price index
c. Laspeyres price index
d. Raman price index
Answer:
b. Paasche price index
Hint
Paasche’s price index -this uses the end-year quantities as weights.

Question 27.
The test that requires that the product of Price Index & the corresponding quantity index number should be equal to the value index number is known as
a. Unit Test
b. Time Reversal Test
c. Factor Reversal Test
d. Circular Test
Answer:
c. Factor Reversal Test
Hint
Factor Reversal Test- The factor reversal test requires that multiplying a price index and a volume index of the same type should be equal to the proportionate change in the current values.

Question 28.
The total sum of values of a given year divided by the sum of the values of the base year is
a. Price index
b. Quantity index
c. Value index
d. None of the above
Answer:
c. Value index

Question 29.
The trend equation for the annual sale of a product is Y= 120+36x with the Year 1990 as the origin. The annual sales for the year 1992 will be
a. 156
b. 192
c. 120
d. None of the above
Answer:
b. 192
Hint
Origin = 1990
For the year 1992, x = 1992 – 1990 = 2
Y = 120 + 36x
Y = 120 + 36 × 2 = 192

Question 30.
The technique of estimating the probable value of phenomenon at a future date is called:
a. Interpolation
b. Extrapolation
c. Forecasting
d. Probability.
Answer:
c. Forecasting
Hint
Forecasting is the process of making predictions of the future based on past and present data and analysis of trends.

Question 31.
This test of the adequacy of index number requires that the formulae for calculating an index number should give consistent results in both directions. This test is known as:
a. Time reversal test
b. Factor reversal test
c. Circular test
d. Unit test.
Answer:
a. Time reversal test
Hint
Time reversal test – This is a test of whether a given method will work both ways in time, forwards and backwards. In the words of Fisher, the test is that the formula for calculating the index number should be such that it will give the same ratio between one point of comparison and the other, no matter which of the two is taken as a base. In other words, when the data for any two years are treated by the same method, but with the bases reversed, the two index numbers secured should be reciprocals of each other so that their product is unity.

Question 32.
An index number is a
a. The measure of dispersion
b. The measure of correlation
c. The measure of regression
d. A special type of average expressed in percentage or rate over a period of time.
Answer:
d. A special type of average expressed in percentage or rate over a period of time.
Hint
Important characteristic features of index numbers are:
These are expressed in percentages
These are specialized averages
They measure the relative change in the value of a variable or a group of related variables over a period of time

Question 33.
Test of adequacy requires that the formulae for calculating an index number should give consistent results in both the directions. This test is satisfied by
a. Fisher Ideal Index
b. Kelly’s Index
c. Bowley Index
d. Walsh Index.
Answer:
a. Fisher Ideal Index
Hint
Time reversal test – This is a test of whether a given method will work both ways in time, forwards and backwards. In the words of Fisher, the test is that the formula for calculating the index number should be such that it will give the same ratio between one point of comparison and the other, no matter which of the two is taken as a base. In other words, when the data for any two years are treated by the same method, but with the bases reversed, the two index numbers secured should be reciprocals of each other so that their product is unity.

Question 34.
Which of the following is/are weighted aggregative index numbers?
a. Laspeyres
b. Paasche
c. Simple Aggregative Method
d. Both (a) and (b)
Answer:
d. Both (a) and (b)
Hint
Laspeyres price index – This index concentrates on measuring price changes from a base year. It is a weighted aggregative index method. In the weighted aggregate method, the index number is calculated by the ratio of weighted arithmetic mean of current year prices to base price

Question 35.
Which of the following options best suits the Laspeyres and Paasche Index?
a. Both are ideal index numbers.
b. Both are simple aggregative
c. Both are the same
d. Both are weighted aggregate indexes only
Answer:
d. Both are weighted aggregate indexes only

Question 36.
A Limitation of Index Numbers is:
a. Tough to calculate
b. Requires software for computational purposes
c. They are Mathematical values
d. Index Number is based on a sample that may or may not be representative of the population.
Answer:
d. Index Number is based on a sample that may or may not be representative of the population.
Hint
Limitations of index numbers – An index number is based on a sample that may or may not be representative of the population.

Question 37.
Which of the following is a general form of Exponential trend?
a. y = a + bt
b. $Y_t = a \times b^{t}$
c. y = a – b
d. $Y_t = a + bt + ct^{2}$
Answer:
b. $Y_t = a \times b^{t}$

Question 38.
Test that requires the product of Price Index Number and Corresponding Quantity Index Number should be equal to the Value Index Number is:
a. Circular Test
b. Time reversal Test
c. Unit Test
d. Factor reversal Test
Answer:
d. Factor reversal Test
Hint
Factor Reversal Test- The factor reversal test requires that multiplying a price index and a volume index of the same type should be equal to the proportionate change in the current values.

Question 39.
Which of the following is forecasting on the basis of past data?
a. Trend projection
b. Index number
c. Both trend and Index number
d. Correlation
Answer:
a. Trend projection
Hint
Trend projection methods take data from past records as the basis of analysis and fit mathematical trends to time series data for making forecasts. Index numbers, by contrast, measure the relative change in the value of a variable or a group of related variables over a period of time; they are not a forecasting method in themselves.

Question 40.
A weighted aggregate Price Index, where the weight for each item is its base period quantity, is known as:
a. Laspeyres Index
b. Producer Price Index
c. Consumer price Index
d. Paasche Index
Answer:
a. Laspeyres Index
Hint
Laspeyres price index – This index concentrates on measuring price changes from a base year. It is a weighted aggregative index method. In the weighted aggregate method, the index number is calculated by the ratio of weighted arithmetic mean of current year prices to base price
