Big Data Business Intelligence for Criminal Intelligence Analysis Training Course

Course ID

bigdatabicriminal

Duration

35 hours (usually 5 days including breaks)

Requirements

  • Knowledge of law enforcement processes and data systems
  • Basic understanding of SQL/Oracle or relational databases
  • Basic understanding of statistics (at spreadsheet level)

Overview

Advances in technology and the increasing amount of information are transforming how law enforcement is conducted. The challenges that Big Data poses are as daunting as its promise. Storing data efficiently is one of these challenges; analyzing it effectively is another.

In this instructor-led, live training, participants will learn the mindset with which to approach Big Data technologies, assess their impact on existing processes and policies, and implement these technologies for the purpose of identifying criminal activity and preventing crime. Case studies from law enforcement organizations around the world will be examined to gain insights into their adoption approaches, challenges and results.

By the end of this training, participants will be able to:

  • Combine Big Data technology with traditional data gathering processes to piece together a story during an investigation
  • Implement industrial Big Data storage and processing solutions for data analysis
  • Prepare a proposal for the adoption of the most adequate tools and processes for enabling a data-driven approach to criminal investigation

Audience

  • Law enforcement specialists with a technical background

Format of the course

  • Part lecture, part discussion, exercises and hands-on practice


Course Outline

=====
Day 01
=====
Overview of Big Data Business Intelligence for Criminal Intelligence Analysis

  • Case Studies from Law Enforcement - Predictive Policing
  • Big Data adoption rate in law enforcement agencies and how they are aligning their future operations around Big Data predictive analytics
  • Emerging technology solutions such as gunshot sensors, surveillance video and social media
  • Using Big Data technology to mitigate information overload
  • Interfacing Big Data with Legacy data
  • Basic understanding of enabling technologies in predictive analytics
  • Data Integration & Dashboard visualization
  • Fraud management
  • Business Rules and Fraud detection
  • Threat detection and profiling
  • Cost-benefit analysis for Big Data implementation

Introduction to Big Data

  • Main characteristics of Big Data -- Volume, Variety, Velocity and Veracity.
  • MPP (Massively Parallel Processing) architecture
  • Data Warehouses – static schema, slowly evolving dataset
  • MPP Databases: Greenplum, Exadata, Teradata, Netezza, Vertica etc.
  • Hadoop-based solutions – no conditions on the structure of the dataset
  • Typical pattern: HDFS, MapReduce (crunch), retrieve from HDFS
  • Apache Spark for stream processing
  • Batch – suited for analytical/non-interactive workloads
  • Volume: CEP streaming data
  • Typical choices – CEP products (e.g. Infostreams, Apama, MarkLogic, etc.)
  • Less production ready – Storm/S4
  • NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database

NoSQL solutions

  • KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
  • KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB
  • KV Store (Hierarchical) - GT.M, Caché
  • KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
  • KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtremeScale, JBossCache, Velocity, Terracotta
  • Tuple Store - Gigaspaces, Coord, Apache River
  • Object Database - ZopeDB, db4o, Shoal
  • Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
  • Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

Varieties of Data: Introduction to Data Cleaning issues in Big Data

  • RDBMS – static structure/schema, does not promote an agile, exploratory environment
  • NoSQL – semi-structured, enough structure to store data without defining an exact schema beforehand
  • Data cleaning issues

Hadoop

  • When to select Hadoop?
  • STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration)
  • SEMI-STRUCTURED data – difficult to handle using traditional solutions (DW/DB)
  • Warehousing data = HUGE effort and static even after implementation
  • For variety & volume of data, crunched on commodity hardware – HADOOP
  • Commodity H/W needed to create a Hadoop Cluster

Introduction to MapReduce/HDFS

  • MapReduce – distributes computing over multiple servers
  • HDFS – make data available locally for the computing process (with redundancy)
  • Data – can be unstructured/schema-less (unlike RDBMS)
  • Developer responsibility to make sense of data
  • Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS
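
The map, shuffle and reduce steps above can be illustrated with a minimal, single-process Python sketch of a word count over incident-report text. This is not Hadoop code: on a real cluster the framework runs the map tasks on the nodes holding each HDFS block and performs the shuffle over the network, and, as noted above, production jobs are typically written in Java. The function names and sample reports are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every token in one input record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every emitted value by its key, as the framework
    # does between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    # Hypothetical driver: everything runs sequentially in one process here,
    # whereas Hadoop would spread the map calls across the cluster.
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    reports = ["stolen vehicle reported", "vehicle recovered", "stolen phone reported"]
    print(run_job(reports))
```

Keeping map and reduce as pure functions of their inputs is what lets the framework rerun them on another node when a machine fails.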

=====
Day 02
=====
Big Data Ecosystem -- Building Big Data ETL (Extract, Transform, Load) -- Which Big Data Tools to use and when?

  • Hadoop vs. Other NoSQL solutions
  • For interactive, random access to data
  • HBase (column-oriented database) on top of Hadoop
  • Random access to data but restrictions imposed (max 1 PB)
  • Not good for ad-hoc analytics, good for logging, counting, time-series
  • Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access)
  • Flume – Stream data (e.g. log data) into HDFS

Big Data Management System

  • Moving parts, compute nodes start/fail: ZooKeeper – for configuration/coordination/naming services
  • Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain
  • Deploy, configure, cluster management, upgrade, etc. (sysadmin): Ambari
  • In the cloud: Whirr

Predictive Analytics -- Fundamental Techniques and Machine Learning based Business Intelligence

  • Introduction to Machine Learning
  • Learning classification techniques
  • Bayesian Prediction -- preparing a training file
  • Support Vector Machine
  • KNN p-Tree Algebra & vertical mining
  • Neural Networks
  • Big Data large variable problem -- Random forest (RF)
  • Big Data Automation problem – Multi-model ensemble RF
  • Automation through Soft10-M
  • Text analytics tool – Treeminer
  • Agile learning
  • Agent based learning
  • Distributed learning
  • Introduction to open source tools for predictive analytics: R, Python, RapidMiner, Mahout
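
As a sketch of one of the classification techniques listed above, here is plain k-nearest-neighbours (KNN) in Python. It is the textbook algorithm, not the p-Tree/vertical-mining variant named in the outline. The two features (transactions per day, average amount in thousands) and the training examples are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled examples: ((transactions_per_day, avg_amount_k), label)
TRAINING: list[tuple[tuple[float, ...], str]] = [
    ((1, 0.2), "normal"),
    ((2, 0.3), "normal"),
    ((1, 0.1), "normal"),
    ((9, 4.0), "suspicious"),
    ((8, 5.0), "suspicious"),
    ((10, 4.5), "suspicious"),
]


def knn_predict(
    training: list[tuple[tuple[float, ...], str]],
    query: tuple[float, ...],
    k: int = 3,
) -> str:
    # Sort the examples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda example: math.dist(example[0], query))[:k]
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict(TRAINING, (9, 4.2)))
    print(knn_predict(TRAINING, (1.5, 0.2)))
```

In practice features on different scales should be normalised first, and an odd k avoids ties in a two-class problem.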

Predictive Analytics Ecosystem and its application in Criminal Intelligence Analysis

  • Technology and the investigative process
  • Insight analytics
  • Visualization analytics
  • Structured predictive analytics
  • Unstructured predictive analytics
  • Threat/fraudster/vendor profiling
  • Recommendation Engine
  • Pattern detection
  • Rule/Scenario discovery – failure, fraud, optimization
  • Root cause discovery
  • Sentiment analysis
  • CRM analytics
  • Network analytics
  • Text analytics for obtaining insights from transcripts, witness statements, internet chatter, etc.
  • Technology assisted review
  • Fraud analytics
  • Real-time analytics
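
For the network analytics item above, one of the simplest measures is degree centrality over a contact graph built from call records. The following is a minimal sketch, assuming the records are reduced to (caller, callee) pairs. The function names and sample data are hypothetical, and real investigations typically combine several centrality measures.

```python
from collections import defaultdict


def build_contact_graph(calls: list[tuple[str, str]]) -> dict[str, set[str]]:
    # Undirected graph: an edge means the two parties were in contact at least once.
    graph: dict[str, set[str]] = defaultdict(set)
    for caller, callee in calls:
        if caller != callee:  # ignore self-loops
            graph[caller].add(callee)
            graph[callee].add(caller)
    return graph


def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    # Fraction of all other nodes that each node is directly connected to.
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def most_connected(calls: list[tuple[str, str]], limit: int = 3) -> list[str]:
    # Rank by centrality (highest first), breaking ties alphabetically.
    scores = degree_centrality(build_contact_graph(calls))
    return sorted(scores, key=lambda node: (-scores[node], node))[:limit]


if __name__ == "__main__":
    calls = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "A")]
    print(most_connected(calls))
```

A node with high degree centrality is a candidate hub in the network; it is a starting point for analysis, not evidence on its own.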

=====
Day 03
=====
Real-Time and Scalable Analytics over Hadoop

  • Why common analytic algorithms fail in Hadoop/HDFS
  • Apache Hama – for Bulk Synchronous Parallel (BSP) distributed computing
  • Apache Spark – for cluster computing and real-time analytics
  • CMU GraphLab2 – graph-based asynchronous approach to distributed computing
  • KNN p-Tree algebra-based approach from Treeminer for reduced hardware cost of operation

Tools for eDiscovery and Forensics

  • eDiscovery over Big Data vs. Legacy data – a comparison of cost and performance
  • Predictive coding and Technology Assisted Review (TAR)
  • Live demo of vMiner for understanding how TAR enables faster discovery
  • Faster indexing through HDFS – Velocity of data
  • NLP (Natural Language Processing) – open source products and techniques
  • eDiscovery in foreign languages -- technology for foreign language processing

Big Data BI for Cyber Security – Getting a 360-degree view, speedy data collection and threat identification

  • Understanding the basics of security analytics -- attack surface, security misconfiguration, host defenses
  • Network infrastructure / large data pipe / response ETL for real-time analytics
  • Prescriptive vs. predictive – fixed rule-based vs. auto-discovery of threat rules from metadata

Gathering disparate data for Criminal Intelligence Analysis

  • Using IoT (Internet of Things) as sensors for capturing data
  • Using Satellite Imagery for Domestic Surveillance
  • Using surveillance and image data for criminal identification
  • Other data gathering technologies -- drones, body cameras, GPS tagging systems and thermal imaging technology
  • Combining automated data retrieval with data obtained from informants, interrogation, and research
  • Forecasting criminal activity

=====
Day 04
=====
Fraud prevention BI from Big Data in Fraud Analytics

  • Basic classification of Fraud Analytics -- rules-based vs predictive analytics
  • Supervised vs unsupervised Machine learning for Fraud pattern detection
  • Business-to-business fraud, medical claims fraud, insurance fraud, tax evasion and money laundering
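
To contrast the rules-based and data-driven approaches above, the sketch below applies a fixed threshold rule and a simple unsupervised outlier score (the modified z-score, based on the median and median absolute deviation) to the same transactions. This statistical check is a stand-in for the predictive side, not a machine-learning model. The 10,000 threshold, the sample amounts and the function names are hypothetical; 3.5 is a commonly used cutoff for the modified z-score.

```python
import statistics


def rule_based_flags(amounts: list[float], threshold: float = 10_000) -> list[int]:
    # Fixed business rule: flag any transaction at or above a reporting threshold.
    return [i for i, amount in enumerate(amounts) if amount >= threshold]


def outlier_flags(amounts: list[float], cutoff: float = 3.5) -> list[int]:
    # Data-driven check: modified z-score using the median and the median
    # absolute deviation (MAD), which extreme values distort far less than
    # they distort the mean and standard deviation.
    median = statistics.median(amounts)
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        return []
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - median) / mad > cutoff
    ]


if __name__ == "__main__":
    amounts = [120, 95, 130, 110, 9_500, 105, 98, 115, 102, 12_000]
    print(rule_based_flags(amounts))  # only the amount above the threshold
    print(outlier_flags(amounts))     # also catches 9,500, just under it
```

The fixed rule misses the 9,500 transaction sitting just below its threshold, while the data-driven score flags it because it is far from the account's usual behaviour.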

Social Media Analytics -- Intelligence gathering and analysis

  • How Social Media is used by criminals to organize, recruit and plan
  • Big Data ETL API for extracting social media data
  • Text, image, metadata and video
  • Sentiment analysis from social media feed
  • Contextual and non-contextual filtering of social media feed
  • Social Media Dashboard to integrate diverse social media
  • Automated profiling of social media profiles
  • Live demo of each analytic will be given through Treeminer Tool
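
The sentiment analysis item above can be illustrated with the simplest possible approach: counting words from positive and negative lexicons. This is a sketch only. The tiny word lists are hypothetical, there is no handling of negation or sarcasm, and it does not represent how the Treeminer tool works; real systems use curated lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"good", "great", "safe", "thanks", "love", "happy"}
NEGATIVE = {"bad", "angry", "hate", "threat", "attack", "kill"}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters and apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> float:
    # (positive hits - negative hits) / number of tokens, in the range [-1, 1].
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


def sentiment_label(text: str, margin: float = 0.0) -> str:
    score = sentiment_score(text)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["Great work, thanks to the officers", "I hate them, this is an attack"]:
        print(post, "->", sentiment_label(post))
```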

Big Data Analytics in image processing and video feeds

  • Image storage techniques in Big Data – storage solutions for petabyte-scale data
  • LTFS (Linear Tape File System) and LTO (Linear Tape Open)
  • GPFS-LTFS (General Parallel File System - Linear Tape File System) -- layered storage solution for big image data
  • Fundamentals of image analytics
  • Object recognition
  • Image segmentation
  • Motion tracking
  • 3-D image reconstruction

Biometrics, DNA and Next Generation Identification Programs

  • Beyond fingerprinting and facial recognition
  • Speech recognition, keystroke dynamics (analyzing a user's typing pattern) and CODIS (Combined DNA Index System)
  • Beyond DNA matching: using forensic DNA phenotyping to construct a face from DNA samples

Big Data dashboards for quick access to and display of diverse data

  • Integration of existing application platform with Big Data Dashboard
  • Big Data management
  • Case studies of Big Data dashboards: Tableau and Pentaho
  • Using Big Data apps to push location-based services in government
  • Tracking system and management

=====
Day 05
=====
How to justify Big Data BI implementation within an organization:

  • Defining the ROI (Return on Investment) for implementing Big Data
  • Case studies for saving Analyst Time in collection and preparation of Data – increasing productivity
  • Revenue gain from lower database licensing cost
  • Revenue gain from location-based services
  • Cost savings from fraud prevention
  • An integrated spreadsheet approach for calculating approximate expenses vs. revenue gains/savings from a Big Data implementation
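
The spreadsheet-style ROI calculation above reduces to a few lines of arithmetic. The sketch below uses the standard definition ROI = (total benefits - total costs) / total costs. Every line item, figure and function name is a hypothetical placeholder to be replaced with an organization's own estimates.

```python
def analyst_time_savings(
    analysts: int, hours_saved_per_week: float, hourly_cost: float, working_weeks: int
) -> float:
    # Value of analyst time no longer spent collecting and preparing data.
    return analysts * hours_saved_per_week * hourly_cost * working_weeks


def roi(benefits: dict[str, float], costs: dict[str, float]) -> float:
    # ROI = (total benefits - total costs) / total costs
    total_cost = sum(costs.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (sum(benefits.values()) - total_cost) / total_cost


if __name__ == "__main__":
    # Hypothetical annual figures, in one currency unit.
    benefits = {
        "analyst_time": analyst_time_savings(5, 6, 50, 46),  # 69,000
        "licensing_savings": 131_000,
        "fraud_prevented": 400_000,
    }
    costs = {"hardware": 200_000, "software_and_support": 120_000, "training_and_staff": 180_000}
    print(f"ROI: {roi(benefits, costs):.0%}")
```

Keeping each benefit and cost as a named line item mirrors the spreadsheet layout and makes it easy to test how sensitive the result is to any single estimate.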

Step-by-step procedure for replacing a legacy data system with a Big Data system

  • Big Data Migration Roadmap
  • What critical information is needed before architecting a Big Data system?
  • What are the different ways of calculating the Volume, Velocity, Variety and Veracity of data?
  • How to estimate data growth
  • Case studies
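
A first-order answer to "how to estimate data growth" is compound growth, V_n = V_0 * (1 + g)^n, followed by a conversion to raw cluster capacity. The sketch below assumes HDFS-style block replication (3 is the HDFS default replication factor) and a hypothetical 25% disk headroom for temporary and intermediate data. The growth rate, starting volume and function names are illustrative assumptions, not recommendations.

```python
def projected_volume(current_tb: float, annual_growth: float, years: int) -> float:
    # Compound growth: V_n = V_0 * (1 + g) ** n
    return current_tb * (1 + annual_growth) ** years


def required_raw_capacity(
    current_tb: float,
    annual_growth: float,
    years: int,
    replication: int = 3,
    headroom: float = 0.25,
) -> float:
    # Each block is stored `replication` times; `headroom` is the fraction of
    # raw disk kept free for temporary and intermediate job data.
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    logical_tb = projected_volume(current_tb, annual_growth, years)
    return logical_tb * replication / (1 - headroom)


if __name__ == "__main__":
    # Hypothetical: 100 TB today, growing 40% per year, planned for 3 years.
    print(f"{projected_volume(100, 0.4, 3):.1f} TB of data")
    print(f"{required_raw_capacity(100, 0.4, 3):.1f} TB of raw disk")
```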

Review of Big Data vendors and their products

  • Accenture
  • APTEAN (Formerly CDC Software)
  • Cisco Systems
  • Cloudera
  • Dell
  • EMC
  • GoodData Corporation
  • Guavus
  • Hitachi Data Systems
  • Hortonworks
  • HP
  • IBM
  • Informatica
  • Intel
  • Jaspersoft
  • Microsoft
  • MongoDB (Formerly 10Gen)
  • Mu Sigma
  • NetApp
  • Opera Solutions
  • Oracle
  • Pentaho
  • Platfora
  • QlikTech
  • Quantum
  • Rackspace
  • Revolution Analytics
  • Salesforce
  • SAP
  • SAS Institute
  • Sisense
  • Software AG/Terracotta
  • Soft10 Automation
  • Splunk
  • Sqrrl
  • Supermicro
  • Tableau Software
  • Teradata
  • Think Big Analytics
  • Tidemark Systems
  • Treeminer
  • VMware (Part of EMC)

Q/A session
