Apache Spark

1. Apache Spark Overview

Apache Spark is an open-source, distributed computing system designed for fast computation on large-scale data processing tasks. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

2. Spark Master

The Spark Master is the main node in a Spark cluster that manages the cluster’s resources and schedules tasks. It is responsible for allocating resources to different Spark applications and managing their execution. The Master keeps track of the state of the worker nodes and the applications running on them.

  • Role: Manages resources in the cluster.
  • Components Managed: Workers, applications.

3. Driver

The Driver program runs on a node in the cluster and is the entry point of the Spark application. It contains the application’s main function and is responsible for converting the user’s code into jobs and tasks that are executed by the Spark cluster.

  • Role: Manages the execution of the Spark application.
  • Responsibilities:
    • Converts user code into tasks.
    • Manages the job execution process.
    • Collects and displays output.

4. Cluster Manager

The Cluster Manager is a system that manages the resources of the cluster. Spark can work with several different types of cluster managers, including:

  • Standalone: Spark’s built-in cluster manager. It is simple to set up and is useful for smaller clusters.
  • Apache YARN: Used for larger clusters in environments where Hadoop is deployed.
  • Apache Mesos: A general-purpose cluster manager that can manage multiple types of distributed systems.
  • Kubernetes: Used for running Spark on Kubernetes clusters, allowing containerized Spark applications to be managed at scale.
  • Role: Allocates resources to Spark applications.
  • Responsibilities:
    • Manages the distribution of CPU, memory, and other resources.
    • Works with the Spark Master to schedule tasks.
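
As an illustration, the cluster manager is selected through the master URL when the Spark application is created or submitted; a minimal PySpark sketch (the host names, ports, and app name below are placeholders):

from pyspark.sql import SparkSession

# Standalone cluster manager
spark = SparkSession.builder.master("spark://master-host:7077").appName("app").getOrCreate()

# YARN (when running on a Hadoop cluster)
# spark = SparkSession.builder.master("yarn").appName("app").getOrCreate()

# Kubernetes (placeholder API server address)
# spark = SparkSession.builder.master("k8s://https://k8s-apiserver:6443").appName("app").getOrCreate()

# Local mode for testing on a single machine
# spark = SparkSession.builder.master("local[*]").appName("app").getOrCreate()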

5. Spark Cluster Architecture

In a Spark cluster, the Driver communicates with the Cluster Manager to request resources (e.g., executors on worker nodes). The Cluster Manager allocates the resources and informs the Master node, which in turn assigns tasks to worker nodes.

  • Driver: Runs the main Spark application.
  • Master: Schedules resources across the cluster.
  • Workers: Execute the tasks assigned by the Master.
  • Cluster Manager: Manages the distribution of resources.

This architecture allows Spark to handle large-scale data processing efficiently by distributing the workload across multiple nodes in the cluster.

Spark Architecture

Apache Spark follows a master/slave architecture with two main daemons and a cluster manager:

  • Master Daemon (Master/Driver process)
  • Worker Daemon (Slave/Worker process)
  • Cluster Manager

When you submit a PySpark job, the code execution is split between the Driver and the Worker Executors. Here’s how it works:

1. Driver

  • The Driver is the main program that controls the entire Spark application.
  • It is responsible for:
    • Converting the user-defined transformations and actions into a logical plan.
    • Breaking the logical plan into stages and tasks.
    • Scheduling tasks to be executed on the Worker nodes.
    • Collecting results from the workers if needed.

Driver Execution:

  • Code that doesn’t itself operate on distributed data (e.g., creating RDDs/DataFrames, defining transformations, and triggering actions like collect, show, count) is executed in the Driver.
  • For example, commands like df.show(), df.collect(), or df.write.csv() are initially triggered in the Driver. The Driver then sends tasks to the Worker nodes to perform distributed computations.

2. Worker Executors

  • Executors are the processes running on the Worker nodes. They are responsible for executing the tasks that the Driver schedules.
  • Executors perform the actual data processing: reading data, performing transformations, and writing results.

Worker Execution:

  • All operations that involve transformations (e.g., map, filter, reduceByKey) on distributed datasets (RDDs or DataFrames) are executed on the Worker Executors.
  • The Driver sends tasks to the Executors, which operate on the partitions of the data. Each Executor processes its partition of the data independently.

Example Workflow:

Let’s consider an example to clarify:

from pyspark.sql import SparkSession

# Initialize Spark Session
spark = SparkSession.builder.appName("example").getOrCreate()

# DataFrame creation (executed by the Driver)
data = [("Alice", 34), ("Bob", 45), ("Catherine", 29)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Transformation (lazy, plan is created by the Driver but not executed)
df_filtered = df.filter(df["Age"] > 30)

# Action (Driver sends tasks to Executors to execute the filter and collect the results)
result = df_filtered.collect()  # Executed by Executors

# The results are collected back to the Driver
print(result)

https://sunscrapers.com/blog/building-a-scalable-apache-spark-cluster-beginner-guide/

https://medium.com/@patilmailbox4/install-apache-spark-on-ubuntu-ffa151e12e30

Linux NFS (server,client)

Step 1: Configure the NFS Server

sudo apt update
sudo apt install nfs-kernel-server -y
sudo mkdir -p /srv/nfs/share
sudo chown nobody:nogroup /srv/nfs/share
echo "/srv/nfs/share    *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -arv

Step 2: Mount the NFS Share on the Client

sudo apt update
sudo apt install nfs-common -y
sudo mkdir -p /mnt/nfs/share
sudo mount -t nfs <nfs-server-ip>:/srv/nfs/share /mnt/nfs/share

Step 3: Permanent Mount (Optional)

echo "<nfs-server-ip>:/srv/nfs/share /mnt/nfs/share nfs defaults 0 0" >> /etc/fstab
sudo mount -a
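
Step 4: Verify the Mount (Optional)

A quick sanity check, assuming nfs-common is installed on the client (it provides the showmount command):

showmount -e <nfs-server-ip>       # list the exports published by the server
df -h /mnt/nfs/share               # confirm the share is mounted
touch /mnt/nfs/share/test.txt      # confirm the mount is writable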

Pandas

vocabularies

Data Structure | Dimensions | Description
Series         | 1          | 1D labeled homogeneous array, size-immutable.
DataFrame      | 2          | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel          | 3          | General 3D labeled, size-mutable array (removed in recent pandas versions).


create data frame

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

#change columns names
df=pd.DataFrame([[1,2,3],[4,5,6]],columns=['a','b','c'],index=['A','B'])
df.columns = ['x','y','z']

#index rows instead of numbers
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])


#read csv
df = pd.read_csv('data.csv')

#read json 
df = pd.read_json('data.json')

#series
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])

basic properties and functions

1. axes: Returns a list of the row axis labels.
2. dtype: Returns the dtype of the object.
3. empty: Returns True if the series is empty.
4. ndim: Returns the number of dimensions of the underlying data (by definition 1 for a Series).
5. size: Returns the number of elements in the underlying data.
6. values: Returns the Series as an ndarray.
7. head(n): Returns the first n rows.
8. tail(n): Returns the last n rows.
9. columns: Gets or sets the column labels (DataFrame only).
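
A quick sketch of a few of these on a small Series and DataFrame built inline (the names s and df here are illustrative):

import pandas as pd

s = pd.Series([1, 7, 2], index=["x", "y", "z"])
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

print(s.axes)      # [Index(['x', 'y', 'z'], dtype='object')]
print(s.dtype)     # int64
print(s.empty)     # False
print(s.ndim)      # 1
print(s.size)      # 3
print(s.values)    # [1 7 2]
print(df.head(2))  # first two rows
print(df.tail(1))  # last row
print(df.columns)  # Index(['calories', 'duration'], dtype='object')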

read data frame

1. loc[]: label based
2. iloc[]: integer position based
3. ix[]: both label and integer based (deprecated and removed in recent pandas; use loc/iloc instead)
#by row index
df.loc[0]
print(df.loc[[0, 1]])
df =pd.DataFrame({'good':[1,2,3],'bad':[4,None,6]})
print(df.loc[1,"bad"])#nan object
print(df.iloc[0:2, 0:2])#first two rows, first two columns
#ix was removed in recent pandas; use loc/iloc instead
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Integer slicing
print(df.iloc[:4])     # first four rows
print(df.loc[:, 'A'])  # column 'A'

Iterators

#iterate by columns then inside Series(Values)
df=pd.DataFrame([[1,2,3],[4,5,6]],columns=['a','b','c'],index=['A','B'])
df.columns = ['x','y','z']
for col in df:
    print(col)
    print(df[col].dtypes)
    for val in df[col]:
        print(val)


#iterate by rows
for row_index, row in df.iterrows():
   print(row_index, row)

#iterate by tuples
for index,*values in df.itertuples():
    print(index)
    print(values[0])
    print(values[1])
    print(values[2])

sort

#index sort
import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(10,2), index=[1,4,6,2,3,5,9,8,0,7],
   columns=['col2','col1'])

sorted_df = unsorted_df.sort_index()
print(sorted_df)

#values sort
import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
sorted_df = unsorted_df.sort_values(by=['col1','col2'])

print(sorted_df)

edit data frame


import math
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], "B": [1, 2, 3, 4, 5]})
df['B'] = df['B'].apply(lambda x: math.pow(x, 2))
df["B"] = [[math.sqrt(x), x/2] for x in df["B"]]

merge

  • left − A DataFrame object.
  • right − Another DataFrame object.
  • on − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
  • left_on − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
  • right_on − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
  • left_index − If True, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
  • right_index − Same usage as left_index for the right DataFrame.
  • how − One of ‘left’, ‘right’, ‘outer’, ‘inner’. Defaults to inner.
  • sort − Sort the result DataFrame by the join keys in lexicographical order. Defaults to True, setting to False will improve the performance substantially in many cases.
import pandas as pd
left = pd.DataFrame({
   'id':[1,2,3,4,5],
   'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
   'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
   {'id':[1,2,3,4,5],
   'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
   'subject_id':['sub2','sub4','sub3','sub6','sub5']})
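
The left and right frames above can then be joined, for example on subject_id (the how values behave as described in the list above):

print(pd.merge(left, right, on='subject_id'))               # inner join (default)
print(pd.merge(left, right, on='subject_id', how='left'))   # keep all rows from left
print(pd.merge(left, right, on='subject_id', how='outer'))  # keep rows from both sides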

Concatenation

one = pd.DataFrame({
   'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
   'subject_id':['sub1','sub2','sub4','sub6','sub5'],
   'Marks_scored':[98,90,87,69,78]},
   index=[1,2,3,4,5])

two = pd.DataFrame({
   'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
   'subject_id':['sub2','sub4','sub3','sub6','sub5'],
   'Marks_scored':[89,80,79,97,88]},
   index=[1,2,3,4,5])
print(pd.concat([one, two], keys=['x', 'y'], ignore_index=True))  # note: keys has no effect when ignore_index=True

cut continuous values into classes

df = pd.DataFrame({'number': [1,2,3,4,5,6,7,8,9,10,11]})
df['bins'] = pd.cut(df['number'], (0, 5, 8, 11), 
                    labels=['low', 'medium', 'high'])

oop in python

magic functions

class tt:
    text: str
    def __init__(self, value):
        self.text = value
    def __str__(self):                    # str(obj) / print(obj)
        return self.text
    def __repr__(self):                   # repr(obj)
        return self.text
    def __add__(self, other):             # obj + other
        return self.text + other.text
    def __len__(self):                    # len(obj)
        return len(self.text)
    def __getattr__(self, item):          # obj.some_missing_attribute returns the attribute name
        return item
    def __getitem__(self, item):          # obj[item] returns the key/index itself
        return item
    def __call__(self, *args, **kwargs):  # obj()
        return self.text
    def __eq__(self, other):              # obj == other
        return self.text == other.text
    def __ne__(self, other):              # obj != other
        return self.text != other.text
    def __iter__(self):                   # for ch in obj:
        return iter(self.text)
class A:
    def __getitem__(self, item):
        if(isinstance(item, slice)):
            print(item.start)
            print(item.stop)
            print(item.step)

a = A()
a[1:3:4]

generic class

import decimal
from typing import TypeVar, Generic

T = TypeVar('T', int, float, complex, decimal.Decimal)
class Stack(Generic[T]):
    def __init__(self) -> None:
        # Create an empty list with items of type T
        self.items: list[T] = []

    def push(self, item: T) -> None:
        self.items.append(item)

    def pop(self) -> T:
        return self.items.pop()

    def empty(self) -> bool:
        return not self.items
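
Example usage of the generic stack (the values are illustrative):

stack: Stack[int] = Stack()
stack.push(1)
stack.push(2)
print(stack.pop())    # 2
print(stack.empty())  # False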

predefined static properties

  • __dict__ − Dictionary containing the class’s namespace.
  • __doc__ − Class documentation string or none, if undefined.
  • __name__ − Class name.
  • __module__ − Module name in which the class is defined. This attribute is “__main__” in interactive mode.
  • __bases__ − A possibly empty tuple containing the base classes, in the order of their occurrence in the base class list.
class Employee:
   def __init__(self, name="Bhavana", age=24):
      self.name = name
      self.age = age
   def displayEmployee(self):
      print ("Name : ", self.name, ", age: ", self.age)

print ("Employee.__doc__:", Employee.__doc__)
print ("Employee.__name__:", Employee.__name__)
print ("Employee.__module__:", Employee.__module__)
print ("Employee.__bases__:", Employee.__bases__)
print ("Employee.__dict__:", Employee.__dict__ )

abstraction

from abc import ABC, abstractmethod
class demo(ABC):
   @abstractmethod
   def method1(self):
      print ("abstract method")
      return
   def method2(self):
      print ("concrete method")

access modifier

class Employee:
   def __init__(self, name, age, salary):
      self.name = name # public variable
      self.__age = age # private variable
      self._salary = salary # protected variable
   def displayEmployee(self):
      print ("Name : ", self.name, ", age: ", self.__age, ", salary: ", self._salary)

e1=Employee("Bhavana", 24, 10000)

print (e1.name)      # public: accessible
print (e1._salary)   # protected by convention only: still accessible
print (e1.__age)     # AttributeError: name mangling stores it as e1._Employee__age

enum

from enum import Enum

class subjects(Enum):
   ENGLISH = "E"
   MATHS = "M"
   GEOGRAPHY = "G"
   SANSKRIT = "S"
   
obj = subjects.SANSKRIT
print (type(obj), obj.name, obj.value)#<enum 'subjects'> SANSKRIT S

from enum import Enum, unique

@unique
class subjects(Enum):
   ENGLISH = 1
   MATHS = 2
   GEOGRAPHY = 3
   SANSKRIT = 2#error value duplicated

reflections

class test:
   pass
   
obj = test()
print (type(obj))#<class '__main__.test'>

print (isinstance(10, int))#true
print (isinstance(2.56, float))#true
print (isinstance(2+3j, complex))#true
print (isinstance("Hello World", str))#true

def test():
   pass
   
print (callable("Hello"))
print (callable(abs))#true
print (callable(list.clear([1,2])))
print (callable(test))#true

class test:
   def __init__(self):
      self.name = "Manav"
      
obj = test()
print (getattr(obj, "name"))

class test:
   def __init__(self):
      self.name = "Manav"
      
obj = test()
setattr(obj, "age", 20)
setattr(obj, "name", "Madhav")
print (obj.name, obj.age)

class test:
   def __init__(self):
      self.name = "Manav"
      
obj = test()
print (hasattr(obj, "age"))
print (hasattr(obj, "name"))

dir(obj) # get the list of attributes belonging to the object

static methods

class A:
    value: int=14 #static variable you can access it by A.value
    @staticmethod
    def bar():#static method
        print("bar")

other way to call method

class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def mm(self):
        return self.x, self.y
fun=A.mm(A(1,2))
print(fun)  # (1, 2)

setter and getter

# employee.py

from datetime import date

class Employee:
    def __init__(self, name, birth_date, start_date):
        self.name = name
        self.birth_date = birth_date
        self.start_date = start_date

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value.upper()

    @property
    def birth_date(self):
        return self._birth_date

    @birth_date.setter
    def birth_date(self, value):
        self._birth_date = date.fromisoformat(value)

    @property
    def start_date(self):
        return self._start_date

    @start_date.setter
    def start_date(self, value):
        self._start_date = date.fromisoformat(value)
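
A quick usage sketch of the class above (the sample values are illustrative):

emp = Employee("John Doe", "1990-05-01", "2020-03-15")
print(emp.name)        # JOHN DOE (the setter upper-cases the value)
print(emp.birth_date)  # 1990-05-01 (stored as a datetime.date)
emp.name = "jane roe"
print(emp.name)        # JANE ROE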

collections and iterations

  • Set: Its unique feature is that items are either members or not, so duplicates are ignored:
    • Mutable set: the set collection
    • Immutable set: the frozenset collection
  • Sequence: Its unique feature is that items are provided with an index position:
    • Mutable sequence: the list collection
    • Immutable sequence: the tuple collection
  • Mapping: Its unique feature is that each item has a key that refers to a value:
    • Mutable mapping: the dict collection
    • Immutable mapping: interestingly, there is no built-in frozen mapping
  • [:]: The start and stop are implied. The expression S[:] will create a copy of sequence S.
  • [:stop]: This makes a new list from the beginning to just before the stop value.
  • [start:]: This makes a new list from the given start to the end of the sequence.
  • [start:stop]: This picks a sublist, starting from the start index and stopping just before the stop index. Python works with half-open intervals. The start is included, while the end is not included.
  • [::step]: The start and stop are implied and include the entire sequence. The step, generally not equal to one, means we'll take every step-th item starting from the beginning.
  • [start::step]: The start is given, but the stop is implied. The idea is that the start is an offset and the step applies from that offset to the end of the sequence.
  • [:stop:step]: This is used to prevent processing the last few items in a list. Since the step is given, processing begins with element zero.
  • [start:stop:step]: This will pick elements from a subset of the sequence. Items prior to start and at or after stop will not be used.
a = slice(1, 2, 3)#[start:stop:step]
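
A short sketch of these slice forms on a small list (the values are illustrative):

S = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(S[:])       # copy of the whole list
print(S[:4])      # [0, 1, 2, 3]
print(S[4:])      # [4, 5, 6, 7, 8, 9]
print(S[2:5])     # [2, 3, 4]
print(S[::2])     # [0, 2, 4, 6, 8]
print(S[1::2])    # [1, 3, 5, 7, 9]
print(S[:8:3])    # [0, 3, 6]
print(S[1:8:3])   # [1, 4, 7]
print(S[slice(1, 8, 3)])  # same as S[1:8:3]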

List

list1 = [1, 2, 3, 4, 5]
list2 = [6, 7, 8, 9, 10]
#operations
print(list1 + list2)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(list1 * 2)  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
#append
list1.append(6)  # [1, 2, 3, 4, 5, 6]
list1.extend([7, 3, 3])  # [1, 2, 3, 4, 5, 6, 7, 3, 3]
#remove
del list1[0]  # [2, 3, 4, 5, 6, 7, 3, 3]
list1.remove(3)  # [2, 4, 5, 6, 7, 3, 3]
list1.pop(0)  # [4, 5, 6, 7, 3, 3]
#insert
list1.insert(0, 1)  # [1, 4, 5, 6, 7, 3, 3]
#sort
list1.sort()  # [1, 3, 3, 4, 5, 6, 7]
list1.sort(reverse=True)  # [7, 6, 5, 4, 3, 3, 1]
list1.sort(key=lambda x: x % 2)  # [6, 4, 7, 5, 3, 3, 1]
#reverse
list1.reverse()  # [1, 3, 3, 5, 7, 4, 6]
#count
print(list1.count(3))  # 2
#clear
list1.clear()  # []
#copy
list1 = [1, 2, 3, 4, 5]
list2 = list1.copy()
#unpack the list
list3=[*list1]
print(*list3)  # 1 2 3 4 5
#store a bound method in a variable
import random, typing
ls = list(range(10))
random.shuffle(ls)
mm: typing.Callable = ls.append
mm(10)
print(ls)  # digits 0-9 in shuffled order, with 10 appended at the end

dict

Dictionary keys must be hashable (immutable) objects, such as numbers, strings, or tuples. Values can be objects of any type.

  • dict[key]: Extract or assign the value mapped to key. print(d1['b']) retrieves 4; d1['b'] = 'Z' assigns a new value to key 'b'.
  • dict1 | dict2: Union of two dictionaries, returning a new object (Python 3.9+). d3 = d1 | d2; print(d3) gives {'a': 2, 'b': 4, 'c': 30, 'a1': 20, 'b1': 40, 'c1': 60}.
  • dict1 |= dict2: Augmented dictionary union operator (Python 3.9+). d1 |= d2; print(d1) gives {'a': 2, 'b': 4, 'c': 30, 'a1': 20, 'b1': 40, 'c1': 60}.
d1=dict([('a', 100), ('b', 200)])
d2 = dict((('a', 'one'), ('b', 'two')))
d3=dict(a= 100, b=200)
d4 = dict(a='one', b='two')

update dictionary

marks = {"Savita":67, "Imtiaz":88, "Laxman":91, "David":49}
print ("marks dictionary before update: \n", marks)
marks1 = {"Sharad": 51, "Mushtaq": 61, "Laxman": 89}
marks.update(marks1)
print (marks)#{'Savita': 67, 'Imtiaz': 88, 'Laxman': 89, 'David': 49, 'Sharad': 51, 'Mushtaq': 61}

d1.update(k1=v1, k2=v2)

unpack dictionary

marks = {"Savita":67, "Imtiaz":88, "Laxman":91, "David":49}
marks1 = {"Sharad": 51, "Mushtaq": 61, "Laxman": 89}
newmarks = {**marks, **marks1}#{'Savita': 67, 'Imtiaz': 88, 'Laxman': 89, 'David': 49, 'Sharad': 51, 'Mushtaq': 61}
a,*b=[1,2,3]#1 [2, 3]

delete and remove

del dict['key']
val = dict.pop(key)
dict.clear()
1. dict.clear(): Removes all elements of dictionary dict.
2. dict.copy(): Returns a shallow copy of dictionary dict.
3. dict.fromkeys(seq, value): Creates a new dictionary with keys from seq and values set to value.
4. dict.get(key, default=None): For key, returns the value, or default if key is not in the dictionary.
5. dict.has_key(key): Returns True if the key is in the dictionary (Python 2 only; removed in Python 3, use the in operator instead).
6. dict.items(): Returns a view of dict's (key, value) pairs.
7. dict.keys(): Returns a view of dict's keys.
8. dict.pop(key): Removes the element with the specified key and returns its value.
9. dict.popitem(): Removes and returns the last inserted key-value pair.
10. dict.setdefault(key, default=None): Similar to get(), but sets dict[key] = default if key is not already in dict.
11. dict.update(dict2): Adds dictionary dict2's key-value pairs to dict.
12. dict.values(): Returns a view of dict's values.
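
A brief illustration of a few of these methods (the values are illustrative):

d = {"a": 1, "b": 2}
print(d.get("c", 0))     # 0, "c" is missing so the default is returned
d.setdefault("c", 3)     # inserts "c": 3 because the key was missing
print(d.pop("a"))        # 1, and "a" is removed
print(d.popitem())       # ('c', 3), the most recently inserted pair
print("b" in d)          # True, the Python 3 replacement for has_key()
print(list(d.items()))   # [('b', 2)]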

syntaxes

for else statement

arr = [1, 2, 3, 4, 5]
for i in arr:
    if i>3:
        break
    print(i)
else: #executes if the loop completes without a break
    print("done")
T1 = (10,20,30,40)
T2 = ('one', 'two', 'three', 'four')
L1, L2 = list(T1), list(T2)
L3 = tuple(y for x in [L1, L2] for y in x)#(10, 20, 30, 40, 'one', 'two', 'three', 'four')
mm=["a","b","c",1,2,3]
for m in filter(lambda x: isinstance(x, str), mm):
    print(m)
for m,v in vars(tt("a")).items():#vars return all attributes of object
    print(m,v)
ls=[1,2,3,4,5,6,7,8,9,10,11]
even= [i for i in ls if i%2==0]#only even numbers

builtin functions

numbers = [1.0, 1.5, 2.0, 2.5]
result_1 = map(lambda x: x.is_integer(), numbers)
result_2 = (x.is_integer() for x in numbers)
result_3 = map(float.is_integer, numbers)
# [True, False, True, False]

filter(lambda x: x>10, numbers)

len(numbers)

numericList = [12, 24, 36, 48, 60, 72, 84]
print(sorted(numericList, reverse=True))#84,72,60....

sum([1,2,3,4])#10
a,*b=[1,2,3]#1 [2, 3]

Iterable and Iterator

from typing import *
m=[1,2,3]
mm=iter(m)
print(next(mm))#1
print(next(m))#error
print(isinstance(iter(m), Iterator))#True
print(isinstance(m, Iterator))#False
print(isinstance(m, Iterable))#True
print(isinstance(iter(m), Iterable))#True

map object

ls=list(range(1,100,2))
ls2=[m for m in ls if 10 < m < 20]  # note: "m>10 & m<20" would be wrong, since & binds tighter than the comparisons
m1=map(lambda x:x-100+3j,ls2)
ls3=list(m1)

Python Basics

built-in types

Numeric: int, float, complex
String: str (text sequence type)
Sequence: list, tuple, range
Binary: bytes, bytearray, memoryview
Mapping: dict
Boolean: bool
Set: set, frozenset
None: NoneType
#boolean
print(True==1)  # True
print(False==0)  # True
print(True==2)  # False
print(False==2)  # False
#numeric int,float,complex
f=1.5#float
i=12#int
c=3+4j#complex
#sequence list,tuple,range,bytes,str
str_t="hello"
bytes_t=b"bytes"
list_t=[1,2,3]
tuple_t=(1,2,3,4)

#set and dict
set_t={3,2,3,1,4}
dict_t={"a":3,1:4}

Numerical Operations

from decimal import Decimal
from fractions import Fraction
#math for real numbers, cmath for complex numbers
import math, cmath

#fractions
f=Fraction(0.54)
print(f.numerator, f.denominator)#607985949695017 1125899906842624
f=Fraction(0.54).limit_denominator(1000)
print(f.numerator, f.denominator)#27 50
f=Fraction("22/7")
print(f.numerator, f.denominator)#22 7
# converting to float
print(float(f))#3.142857142857143
# converting to int
print(int(f))#3

#complex numbers
a=67j#complex number complex(0,67)
##converting to polar form (modulus and phase)
r=abs(a)#67.0, the modulus
print(cmath.polar(a))#(67.0, 1.5707963267948966)

#decimal
d=Decimal("0.54")
print(d)#0.54

#rounding
print(round(0.54,1))#0.5
print(round(0.54,0))#1.0
#floor and ceil
print(math.floor(0.54))#0
print(math.ceil(0.54))#1

#operations
print(0.54+0.46)#1.0
print(0.54-0.46)#roughly 0.08, but not exact due to floating-point representation
print(0.54*0.46)#0.2484
print(0.54/0.46)#1.173913043478261
print(0.54//0.46)#1.0
print(round(0.54%0.46,3))#0.08
print(0.54**0.46)#0.7913832183656556

(19/155)*(155/19) #0.9999999999999999
round((19/155)*(155/19)) #1
(19/155)*(155/19) == 1.0#False
math.isclose((19/155)*(155/19), 1)#True

value= 0x110 #272 in decimal, 0b100010000 in binary

import random
rand_val = random.randint(1, 100)
print(random.randrange(0, 100, 5))# 0, 5, 10...95
print(random.randint(1, 100))# 1, 2, 3...100

string

In Python, single-quoted strings and double-quoted strings are the same. PEP 8 does not make a recommendation for this: pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string.

#single line
m:str="hello world" 
#multi lines
m2:str="""hello world
my name is python""" 

m_digit:str="124"
print(m_digit.isnumeric())# True
print(m_digit.isalpha())# False

regular expressions

import re

string:str = 'hello 12 hi 89. Howdy 34'
regex:re.Pattern = re.compile(r'(\d+)')
result = regex.findall(string)#['12', '89', '34']

regex2:re.Pattern = re.compile(r'(\d+) hi')
result=regex2.search(string)
print(result.group(0),result.start(0),result.end(0))#12 hi 6 11
print(result.group(1),result.start(1),result.end(1))#12 6 8

string:str = 'hello 12 hi 89. Howdy 34'
result = re.sub(r'\d+', '', string)# remove all digits

formatted strings

f'{id:s}  : {location:s} : {max_temp:d} / {min_temp:d} / {precipitation:f}'
f'{id:3d}  : {location:19s} : {max_temp:3d} / {min_temp:3d} / {precipitation:5.2f}' #'IAD  : Dulles Intl Airport :   32 /  13 /  0.40'

value= 42
string:str = f'{value} {2**7+1}'# 42 129
string:str = f'{value=} {2**7+1=}'# value=42 2**7+1=129

string:str = f'{value:b} {0.2:2.0%}'# 101010 20%

m="hello %s %s %s" % ("world", "!" , 1)

string module

import string
value="Hello, World!"
print(value.translate(str.maketrans('','',string.punctuation+string.whitespace)))#HelloWorld

byte sequences (b'hello world')

value="محمد العبدلي"
print(value.encode('utf-8', 'ignore'))#b'\xd9\x85\xd8\xad\xd9\x85\xd8\xaf \xd8\xa7\xd9\x84\xd8\xb9\xd8\xa8\xd8\xaf\xd9\x84\xd9\x8a'
print(len(value.encode('utf-8', 'ignore')))#23

Tuples

m:tuple =(1,23,4,5,6,7,5,9,10)
if(m.__class__==tuple):
    print("True")
print(len(m))#9
print(m.count(5))#2
print(m.index(5))#3

v1,v2,v3,v4,v5,v6,v7,v8,v9 = m
print(v1,v2,v3,v4,v5,v6,v7,v8,v9)#1 23 4 5 6 7 5 9 10

Handling Exceptions

def sum_n(n: int) -> int:
    if(n < 0):
        raise Exception("n must be a positive integer")
    s = 0
    for i in range(1, n+1):
        s += i
    return s


try:
    total=sum_n(-1)
    print(total)
except Exception as e:
    print(e)

#nested exception handling while copying a file
#source_path and target_path are assumed to be pathlib.Path objects defined elsewhere
import shutil

try:
    target = shutil.copy(source_path, target_path)
except FileNotFoundError:
    try:
        target_path.parent.mkdir(exist_ok=True, parents=True)
        target = shutil.copy(source_path, target_path)
    except OSError as ex2:
        print(f"{target_path.parent} problem: {ex2}")
except OSError as ex:
    print(f"Copy {source_path} to {target_path} error {ex}")

Functions

def hex2rgb(hx_int):
    if isinstance(hx_int, str):
        if hx_int [0] == "#":
            hx_int = int(hx_int [1:], 16)
        else:
            hx_int = int(hx_int, 16)
    r, g, b = (hx_int >> 16) & 0xff, (hx_int >> 8) & 0xff, hx_int & 0xff
    return r, g, b

#or with type hints

from typing import Tuple, Union

def hex2rgb(hx_int: Union[int, str]) -> Tuple[int, int, int]:
    if isinstance(hx_int, str):
        if hx_int[0] == "#":
            hx_int = int(hx_int[1:], 16)
        else:
            hx_int = int(hx_int, 16)
    r, g, b = (hx_int >> 16)&0xff, (hx_int >> 8)&0xff, hx_int&0xff
    return r, g, b
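
For example:

print(hex2rgb("#ff8800"))  # (255, 136, 0)
print(hex2rgb(0x00ff00))   # (0, 255, 0)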

RGB = Tuple[int, int, int]
HSL = Tuple[float, float, float]
def rgb_to_hsl(color: RGB) -> HSL: ...
def hsl_complement(color: HSL) -> HSL: ...
def hsl_to_rgb(color: HSL) -> RGB: ...
from typing import Union
from decimal import Decimal
number=Union[int,float,Decimal,complex]
def add(a:number,b:number)->number:
    return a+b
add(14j,5.5)
import random
def dice_t(n: int, sides: int = 6) -> Tuple[int, ...]:
    return tuple(random.randint(1, sides) for _ in range(n))
  • * is used as a prefix for a special parameter that receives all the unmatched positional arguments. We often use *args to collect all of the positional arguments into a single parameter named args.
  • ** is used a prefix for a special parameter that receives all the unmatched named arguments. We often use **kwargs to collect the named values into a parameter named kwargs.
  • *, when used by itself in a parameter list, acts as a separator: parameters before it can be supplied positionally or by keyword, while the remaining parameters after it can only be provided by keyword.
def mm(*args):
    print(args)
mm(1,2,3,4)#(1, 2, 3, 4)

def mm(**args):
    print(args)
mm(a=1,b=2,c=3)# {'a': 1, 'b': 2, 'c': 3}

def mm(*, x, y):
    return x, y
print(mm(x=1, y=2))  # (1, 2)

iterable function

import typing

def fibo_iter() -> typing.Iterable[typing.Tuple[int, int]]:
    a = 1
    b = 1
    while True:
        yield (a, b)
        a, b = b, a + b

for i, f in fibo_iter():
    if i >= 10:
        break
    print(f, end=' ')

recursion

def factorial(n: int) -> int:
    if(n>0):
        return factorial(n-1)*n
    else:
        return 1
print(factorial(5))  # 120

lambda

lamb= lambda x: x**2

typing module

import typing
from typing import TypeAlias, NewType

v:typing.Union[int,float] = 1.0#union merges several types into one annotation
print(isinstance(v,typing.Union[float,set,dict])) #true (isinstance accepts Union in Python 3.10+)

#object is the superclass of all classes
m = 12
print(isinstance(m,object))#true
print(issubclass(int,object))#true

callable:typing.Callable= lambda m: m+12#note: this name shadows the built-in callable()
print(callable(12),isinstance(callable,typing.Callable))#24 true

Vector: TypeAlias = list[float]

UserId = NewType('UserId', int)
some_id = UserId(524313)

Decorators

def decorator(func):
    def wrapper(*args, **kwargs):
        print(args, kwargs)
        print('Before function')
        func(*args, **kwargs)
        print('After function')
    return wrapper

@decorator
def say_hello(name):
    print(f'Hello {name}')
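
Calling the decorated function runs the wrapper around the original function:

say_hello('World')
# ('World',) {}
# Before function
# Hello World
# After function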

type alias

#the type statement requires Python 3.12+
import typing
type tt = typing.Iterable[int]

with keyword

class MyContextManager:
    def __enter__(self):
        print("Entering the context.")
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        print("Exiting the context.")
        if exc_type:
            print(f"Exception: {exc_value}")
        return True  # Suppress exception

with MyContextManager():
    print("Inside the context.")
    raise ValueError("An error occurred!")

with open('example.txt', 'r') as file:
    content = file.read()
#----------------------
import threading

lock = threading.Lock()

# Without `with`
lock.acquire()
try:
    # Critical section of code
    print("Lock acquired manually.")
finally:
    lock.release()

# With `with`
with lock:
    # Critical section of code
    print("Lock acquired using 'with'.")