Short Q/A Data and Analysis - Students Free Notes

Express big data in your own words. Explain three Vs of big data with reference to email data.

Big data refers to extremely large and complex datasets that traditional data processing tools cannot handle efficiently. These datasets often contain valuable insights but require advanced tools and techniques for analysis. The three Vs of big data are: Volume: the vast amount of data generated, such as thousands of emails received daily. …

Why are summary statistics needed?

Summary statistics are essential as they provide a concise overview of a dataset, making it easier to interpret and analyze large volumes of data. They allow for the identification of key characteristics such as the central tendency (mean, median, mode), variability (range, standard deviation), and the overall distribution of data. Summary statistics simplify decision-making by …
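The measures named above can be computed in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the `values` list is a hypothetical sample (emails received per day) and is not taken from the notes.

```python
import statistics

# Hypothetical sample: number of emails received per day over ten days.
values = [12, 15, 15, 18, 20, 22, 15, 30, 25, 18]

summary = {
    # Central tendency
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    # Variability
    "range": max(values) - min(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Five numbers stand in for the whole list here, which is the sense in which summary statistics give a concise overview.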

Argue about the trends, outliers, and distribution of values in a dataset. Describe.

Trends in a dataset refer to the general direction in which the data is moving, such as an upward or downward pattern over time. Identifying trends helps in making predictions or understanding long-term behaviors. Outliers are data points that differ significantly from the rest of the data, and their presence may indicate anomalies or errors …
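One common way to flag outliers is the interquartile range (IQR) rule. This is a sketch under assumptions: the notes do not name a specific method, the 1.5 multiplier is a widely used convention rather than a fixed rule, and the `readings` list is hypothetical.

```python
import statistics

# Hypothetical readings; one value is far from the rest.
readings = [10, 12, 11, 13, 12, 14, 13, 95, 12, 11]

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, _q2, q3 = statistics.quantiles(readings, n=4)
iqr = q3 - q1

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers = [x for x in readings if x < lower_bound or x > upper_bound]
print("bounds:", lower_bound, upper_bound)
print("outliers:", outliers)
```

A flagged value is only a candidate: whether it is an error or a genuine anomaly still has to be judged from the context of the data.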

What is meant by sources of data? Give three sources of data excluding those mentioned in the book.

Sources of data refer to the origins or mediums from which data is collected for analysis. These can be direct or indirect sources, structured or unstructured. Three examples of data sources, excluding those in the book, are: Social media platforms: User-generated content on platforms like Twitter or Facebook provides valuable data for sentiment analysis, customer …

Compare machine learning and deep learning in the context of formal and informal education.

Machine learning and deep learning are both subfields of artificial intelligence, but they differ in complexity and application. Machine learning focuses on algorithms that learn from data through statistical techniques and typically require less computational power than deep learning. Deep learning, a subset of machine learning, involves neural networks with many layers, requiring large datasets …

Database is useful in the field of data science. Defend this statement.

Databases are crucial in data science as they provide structured storage for vast amounts of data, which is essential for analysis and modeling. Databases enable easy access, retrieval, and management of data, ensuring consistency and reducing redundancy. Data scientists rely on databases to store cleaned, processed data, and to perform queries to extract relevant information. …
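To illustrate storing data and querying it, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns, and the rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of cleaned sales records.
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, "
    "product TEXT NOT NULL, "
    "quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO sales (product, quantity) VALUES (?, ?)",
    [("pen", 10), ("notebook", 4), ("pen", 7), ("eraser", 3)],
)

# Query: total quantity per product, largest first.
rows = conn.execute(
    "SELECT product, SUM(quantity) AS total "
    "FROM sales GROUP BY product ORDER BY total DESC"
).fetchall()

for product, total in rows:
    print(product, total)

conn.close()
```

The query returns only the aggregated information that is needed, rather than every stored record.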

Can you relate how data science is helpful in solving business problems?

Data science is instrumental in solving business problems by leveraging statistical techniques, machine learning, and predictive modeling to extract insights from data. Businesses can make data-driven decisions, predict trends, optimize processes, and personalize customer experiences. For example, a company can use data science to analyze customer purchasing patterns and forecast demand, enabling better inventory management. …
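As a rough illustration of demand forecasting, the sketch below uses a simple moving average, which is one of the most basic forecasting techniques and is an assumption here, not a method named in the notes. The function name `moving_average_forecast` and the `monthly_units` figures are hypothetical.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical units sold per month for one product.
monthly_units = [120, 135, 128, 140, 150, 145]

forecast = moving_average_forecast(monthly_units, window=3)
print(f"Forecast for next month: {forecast:.1f} units")
```

A real forecast would usually account for seasonality and trend as well; this only shows how past purchasing data can feed an inventory decision.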

Define data analytics and data science. Are they similar or different? Give a reason.

Data analytics involves examining datasets to draw conclusions about the information they contain. It focuses on interpreting data, identifying patterns, and summarizing findings. Data science, however, is broader and involves using scientific methods, algorithms, and systems to analyze large datasets and extract meaningful insights. While data analytics can be considered a subset of data science, …