Posts

Showing posts from May, 2021

Comma separated values or csv: what is csv

Introduction: Hi! Today we are going to talk about a different, simple topic that I ran into recently in an interview: CSV. As data scientists, we use pandas daily to read CSV files with pd.read_csv(). That call turns the CSV into a dataframe, and we start manipulating the data from there. But a CSV is not a dataframe, and it doesn't need pandas to be read. In this article, we will discuss a few ways to read a CSV file and take a deeper dive into how CSV and similar files are actually stored.

What is CSV? A CSV (comma-separated values) file is a plain-text file in which each data item is separated from the next by a comma. CSV is one of the most widely used formats for tabular data. In a normal CSV, each row is stored on its own line; the values within a row are separated by commas, and the row does not end with a comma.

How can we read a CSV? A csv can be read using pandas ( read_csv
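To make the point that a CSV does not need pandas concrete, here is a minimal sketch using only the Python standard library. The sample data and the function names are made up for illustration; they are not taken from the post. It reads the same text twice, once by splitting lines on commas by hand and once with the built-in csv module.

```python
import csv
import io

# Hypothetical sample data standing in for the contents of a small CSV file.
sample = 'name,age\n"Doe, Jane",40\nRavi,27\n'


def read_rows_manually(text):
    """Split every non-empty line on commas.

    This is only safe when no field contains a comma or a quote.
    """
    return [line.split(",") for line in text.splitlines() if line]


def read_rows_with_csv_module(text):
    """Parse the text with the standard-library csv module.

    The csv module understands quoted fields, so a comma inside
    quotes stays part of the value.
    """
    return list(csv.reader(io.StringIO(text)))


manual_rows = read_rows_manually(sample)
module_rows = read_rows_with_csv_module(sample)

print(manual_rows[1])  # the quoted field is split into two pieces
print(module_rows[1])  # the quoted field survives as one value
```

The manual split is enough for the "normal" layout described above, but it breaks as soon as a value itself contains a comma, which is why the csv module is usually the safer choice.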

Introduction to MongoDB

Introduction: MongoDB is a NoSQL database. In this document, we will work through a DataCamp introductory course on MongoDB and give a brief introduction to MongoDB along the way. This is the first part of a series of posts in which I summarize what I learned from the DataCamp course.

What is NoSQL? NoSQL is usually read as "not only SQL" or "non-SQL"; i.e., non-relational databases. Non-relational databases such as MongoDB grew out of big data and of data models that do not fit into relational tables. Common models are graph (node-edge) systems, key-value stores, document stores (MongoDB falls into this group), and others. NoSQL databases are good at scaling up under huge loads of data, and they store data in non-tabular formats, unlike SQL and other relational database systems.

MongoDB basics: What is MongoDB? MongoDB is a free-to-use NoSQL database system that is scalable at its core and exposes data in a JSON-like format, which makes it easy to interact with, query, and plug into your applications. How does a mon
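As a rough illustration of what "JSON-like, non-tabular" data means, the sketch below models a tiny collection with plain Python dictionaries. It does not use MongoDB or any MongoDB driver; the field names and the find helper are hypothetical and only mimic the idea of querying documents by field value.

```python
import json

# Hypothetical documents; note that the two documents do not share
# the same set of fields, which a relational table would not allow.
collection = [
    {"name": "Asha", "skills": ["python", "sql"], "address": {"city": "Pune"}},
    {"name": "Ravi", "age": 27},
]


def find(documents, **criteria):
    """Return the documents whose top-level fields equal the given values.

    This is a toy stand-in for a database query, not a MongoDB API.
    """
    return [
        doc
        for doc in documents
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# A document maps directly onto JSON text and back.
as_json = json.dumps(collection[0], sort_keys=True)
print(as_json)

matches = find(collection, name="Ravi")
print(matches)
```

Nested values such as the skills list and the address sub-document are stored inside the document itself, rather than in separate tables joined by keys.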

undefined symbol: _Py_ZeroStruct

Introduction: One of the best ways to speed up your Python code is to turn it into Cython code. To turn a Python script into Cython, you save it with a .pyx extension and then compile it using a setup file. For example, say you have a file named helloworld.pyx. To compile it, the code inside your setup file will look like:

from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("helloworld.pyx"))

After that, run the setup file with the following command, which creates a .c file and a .so file:

$ python setup.py build_ext --inplace

The Problem: Even after compiling successfully, you may end up getting ImportError: ... undefined symbol: _Py_ZeroStruct when you try to import the compiled Cython module.

Why does it happen? And the solution: The source of this problem, according to Stack Overflow, is a mismatch between the Python version of the compiled Cython code and the python f
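Since the error is attributed to a Python version mismatch, a first diagnostic step is to check which interpreter is actually running. The sketch below, using only the standard library, prints the interpreter path, its version, and the file suffix it expects on compiled extension modules. The function names are hypothetical, and the file-name check is only a heuristic: a module without the tagged suffix is not necessarily broken.

```python
import sys
import sysconfig


def interpreter_summary():
    """Collect details that identify the interpreter running this script."""
    return {
        "executable": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
        # Suffix this interpreter puts on the extension modules it builds.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
    }


def built_for_this_interpreter(filename):
    """Heuristic: does the file name end with this interpreter's suffix?"""
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    return bool(suffix) and filename.endswith(suffix)


summary = interpreter_summary()
print(summary)
```

Running this script with the same python command used for setup.py build_ext, and again with the command used to import the module, shows whether the two steps use the same interpreter.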