US20140214844A1 - Multiple classification models in a pipeline - Google Patents
- Publication number
- US20140214844A1; US13/756,450
- Authority
- US
- United States
- Prior art keywords
- classification
- new product
- classification model
- successive
- models
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F17/30598
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
Definitions
- Retailers often have databases and warehouses full of thousands upon thousands of products offered for sale, with new products being offered every day.
- the databases must be updated with these new products in an organized and usable manner.
- Each product and new product should be categorized within the database so that it can be found by customers for purchase or employees for stocking.
- the large number of products offered for sale by a merchant makes updating a merchant's product database difficult and costly with current methods and systems.
- FIG. 1 illustrates an example block diagram of a computing device
- FIG. 2 illustrates an example computer architecture that facilitates different implementations described herein
- FIG. 3 illustrates a flow chart of an example method according to one implementation
- FIG. 4 illustrates a flow chart of an example method according to one implementation
- FIG. 5 illustrates a flow chart of an example method according to one implementation.
- Implementations of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Implementations within the scope of the present disclosure may also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, implementations of the disclosure can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
- Computer storage media includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
- a “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices.
- a network or another communications connection can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
- computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system.
- RAM can also include solid state drives (SSDs, or PCIx-based real-time memory tiered storage such as FusionIO).
- Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
- the computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
- the disclosure may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like. It should be noted that any of the above-mentioned computing devices may be provided by or located within a brick and mortar location.
- the disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
- program modules may be located in both local and remote memory storage devices.
- cloud computing is defined as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned via virtualization and released with minimal management effort or service provider interaction, and then scaled accordingly.
- a cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, or any suitable characteristic now known to those of ordinary skill in the field, or later discovered), service models (e.g., Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)), and deployment models (e.g., private cloud, community cloud, public cloud, hybrid cloud, or any suitable service type model now known to those of ordinary skill in the field, or later discovered). Databases and servers described with respect to the present disclosure can be included in a cloud model.
- FIG. 1 is a block diagram illustrating an example computing device 100 .
- Computing device 100 may be used to perform various procedures, such as those discussed herein.
- Computing device 100 can function as a server, a client, or any other computing entity.
- Computing device 100 can perform various monitoring functions as discussed herein, and can execute one or more application programs, such as the application programs described herein.
- Computing device 100 can be any of a wide variety of computing devices, such as a desktop computer, a notebook computer, a server computer, a handheld computer, tablet computer and the like.
- Computing device 100 includes one or more processor(s) 102 , one or more memory device(s) 104 , one or more interface(s) 106 , one or more mass storage device(s) 108 , one or more Input/Output (I/O) device(s) 110 , and a display device 130 , all of which are coupled to a bus 112 .
- Processor(s) 102 include one or more processors or controllers that execute instructions stored in memory device(s) 104 and/or mass storage device(s) 108 .
- Processor(s) 102 may also include various types of computer-readable media, such as cache memory.
- Memory device(s) 104 include various computer-readable media, such as volatile memory (e.g., random access memory (RAM) 114 ) and/or nonvolatile memory (e.g., read-only memory (ROM) 116 ). Memory device(s) 104 may also include rewritable ROM, such as Flash memory.
- Mass storage device(s) 108 include various computer readable media, such as magnetic tapes, magnetic disks, optical disks, solid-state memory (e.g., Flash memory), and so forth. As shown in FIG. 1 , a particular mass storage device is a hard disk drive 124 . Various drives may also be included in mass storage device(s) 108 to enable reading from and/or writing to the various computer readable media. Mass storage device(s) 108 include removable media 126 and/or non-removable media.
- I/O device(s) 110 include various devices that allow data and/or other information to be input to or retrieved from computing device 100 .
- Example I/O device(s) 110 include cursor control devices, keyboards, keypads, microphones, monitors or other display devices, speakers, printers, network interface cards, modems, lenses, CCDs or other image capture devices, and the like.
- Display device 130 includes any type of device capable of displaying information to one or more users of computing device 100 .
- Examples of display device 130 include a monitor, display terminal, video projection device, and the like.
- Interface(s) 106 include various interfaces that allow computing device 100 to interact with other systems, devices, or computing environments.
- Example interface(s) 106 may include any number of different network interfaces 120 , such as interfaces to local area networks (LANs), wide area networks (WANs), wireless networks, and the Internet.
- Other interface(s) include user interface 118 and peripheral device interface 122 .
- the interface(s) 106 may also include one or more user interface elements 118 .
- the interface(s) 106 may also include one or more peripheral interfaces such as interfaces for printers, pointing devices (mice, track pad, etc.), keyboards, and the like.
- Bus 112 allows processor(s) 102 , memory device(s) 104 , interface(s) 106 , mass storage device(s) 108 , and I/O device(s) 110 to communicate with one another, as well as other devices or components coupled to bus 112 .
- Bus 112 represents one or more of several types of bus structures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, and so forth.
- programs and other executable program components are shown herein as discrete blocks, although it is understood that such programs and components may reside at various times in different storage components of computing device 100 , and are executed by processor(s) 102 .
- the systems and procedures described herein can be implemented in hardware, or a combination of hardware, software, and/or firmware.
- one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein.
- FIG. 2 illustrates an example of a computing environment 200 and a smart crowd source environment 201 suitable for implementing the methods disclosed herein.
- a server 202 a provides access to a database 204 a in data communication therewith, and may be located and accessed within a brick and mortar retail location.
- the database 204 a may store customer attribute information such as a user profile as well as a list of other user profiles of friends and associates associated with the user profile.
- the database 204 a may additionally store attributes of the user associated with the user profile.
- the server 202 a may provide access to the database 204 a to users associated with the user profiles and/or to others.
- the server 202 a may implement a web server for receiving requests for data stored in the database 204 a and formatting requested information into web pages.
- the web server may additionally be operable to receive information and store the information in the database 204 a.
- a smart crowd source environment is a group of users connected over a network that are assigned tasks to perform over the network.
- the smart crowd source may be in the employ of a merchant, or may be under contract with the merchant on a per-task basis.
- the work product of the smart crowd source is generally conveyed over the same network that supplied the tasks to be performed.
- users or members of a smart crowd source may be tasked with reviewing the classification of new product items and the hierarchy of products within a merchant's database.
- a server 202 b may be associated with a classification manager or other entity or party providing classification work.
- the server 202 b may be in data communication with a database 204 b .
- the database 204 b may store information regarding various products.
- information for a product may include a name, description, categorization, reviews, comments, price, past transaction data, and the like.
- the server 202 b may analyze this data as well as data retrieved from the database 204 a in order to perform methods as described herein.
- An operator or customer/user may access the server 202 b by means of a workstation 206 , which may be embodied as any general purpose computer, tablet computer, smart phone, or the like.
- the server 202 a and server 202 b may communicate with one another over a network 208 such as the Internet or some other local area network (LAN), wide area network (WAN), virtual private network (VPN), or other network.
- a user may access data and functionality provided by the servers 202 a , 202 b by means of a workstation 210 in data communication with the network 208 .
- the workstation 210 may be embodied as a general purpose computer, tablet computer, smart phone or the like.
- the workstation 210 may host a web browser for requesting web pages, displaying web pages, and receiving user interaction with web pages, and performing other functionality of a web browser.
- the workstation 210 , workstation 206 , servers 202 a - 202 b , and databases 204 a , 204 b may have some or all of the attributes of the computing device 100 .
- a classification model pipeline is intended to mean a plurality of classification models organized to optimize the classification of new product items that are to be added to a merchant database.
- the plurality of classification models may be run in a predetermined order or may be run concurrently.
- the classification model pipeline may require that new product items be processed by all of the classification models within the pipeline, or may allow the classification process to stop before all of the classification models are run if predetermined thresholds are met.
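The concurrent option mentioned above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the two model functions, their category names, and their confidence values are hypothetical stand-ins, and each model is assumed to return a `(category, confidence)` pair.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in models: each maps product text to (category, confidence).
def title_keyword_model(text):
    return ("tires", 0.9) if "tire" in text.lower() else ("unknown", 0.1)

def length_model(text):
    return ("accessories", 0.4) if len(text) < 40 else ("unknown", 0.2)

def run_concurrently(models, text):
    """Run every model on the same text at once; results keep the model order."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        return list(pool.map(lambda model: model(text), models))

results = run_concurrently(
    [title_keyword_model, length_model], "All-season tire, 17 inch"
)
```

Running the models in a predetermined order instead would replace the thread pool with a plain loop over `models`.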
- computer system shall be construed broadly to include a network as defined herein, as well as a single-unit work station (such as work station 206 or other work station) whether connected directly to a network via a communications connection or disconnected from a network, as well as a group of single-unit work stations which can share data or information through non-network means such as a flash drive or any suitable non-network means for sharing data now known or later discovered.
- FIG. 1 and FIG. 2 may be referenced secondarily during the discussion in order to provide hardware support for the implementation.
- the disclosure aims to disclose methods and systems to allow a new product item to be automatically and efficiently added to a product database.
- a product item may have a description and title associated with it that contains terms and values that can be quantified by at least one classification model such that the new product item can be categorized within a merchant's database.
- the title and description may be combined to supply quantifiable information that may be used to analyze and classify a product item so that it can properly be categorized within a database automatically, or alternatively with limited human involvement.
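As a rough illustration of combining a title and description into quantifiable information, the sketch below counts terms across both fields. The function name `product_features` and the tokenization rule are assumptions made for this example; the disclosure does not specify how the text is quantified.

```python
import re
from collections import Counter

def product_features(title, description):
    """Combine title and description into lowercase term counts."""
    combined = f"{title} {description}".lower()
    # Split on anything that is not a letter or digit (an illustrative choice).
    return Counter(re.findall(r"[a-z0-9]+", combined))

features = product_features(
    "Scale Model Tire Set", "Rubber tire set for 1:18 scale model cars"
)
```

Term counts of this kind are one common input to the classification models named later in the description.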
- the method 300 may be performed on a system that may include the database storage 204 a (or any suitable memory device disposed in communication with the network 208 ) receiving new product item information at 302 representing the new product item to be sold by a merchant.
- the product item information may be stored in memory located within computing environment 200 for later classification by the classification models within a pipeline.
- the product item information may be received into the computing environment in digital form from an electronic database in communication with the merchant's system.
- the new product item information may be manually input by a user connected electronically with the computing environment 200 .
- the new product item information may comprise a title, a description, parameters of use and performance, and any other suitable information associated with the product that may be of interest in a merchant environment for identifying, quantifying and categorizing the new product item.
- the system may build a first classification model within the classification model pipeline 305 for the new product item based on the product item information received at 302 .
- the classification model pipeline is shown as the dashed boundary line labeled 305 , and illustrates the plurality of classification models (at 304 a , 304 b , 304 c ) that make up the classification model pipeline for the illustrated implementation.
- a classification model may be used within the computing environment 200 to quantify properties of the new product item by performing an algorithm or series of algorithms against the text properties (titles, description terms, images) provided in the new product item information received at 302 in order to quantify and ultimately classify the new product item relative to existing product items already in a merchant's database.
- Examples of classification models are: Naïve Bayes, K-Nearest-Neighbors, SVM, logistic regression, multiclass perceptron, and the like. It should be understood that any classification model that is known or yet to be discovered is to be considered within the scope of this disclosure. It is to be contemplated that the first classification model may comprise a single algorithm or a plurality of algorithms as desired to classify the new product item. At 303 b , the results of the classification model may be stored in memory within computing environment 200 .
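To make one of the named model types concrete, here is a small multinomial Naïve Bayes classifier written from scratch. It is a sketch under stated assumptions: whitespace tokenization, add-one smoothing, the class name `TinyNaiveBayes`, and the four training examples are all illustrative choices, not details from the disclosure.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(text.lower().split())
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, text):
        """Return (category, posterior probability) for the most likely category."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # class prior
            for term in text.lower().split():
                score += math.log((counts[term] + 1) / denom)
            log_scores[label] = score
        # Normalize log scores into probabilities, shifting by the peak for stability.
        peak = max(log_scores.values())
        norm = sum(math.exp(s - peak) for s in log_scores.values())
        best = max(log_scores, key=log_scores.get)
        return best, math.exp(log_scores[best] - peak) / norm

model = TinyNaiveBayes().fit(
    ["car tire rubber", "truck tire tread", "cotton shirt blue", "wool shirt red"],
    ["automotive", "automotive", "apparel", "apparel"],
)
category, probability = model.predict("winter tire")
```

The returned probability is one possible quantity to compare against a pipeline threshold.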
- a classification model pipeline 305 is intended to comprise a plurality of classification models organized to optimize the classification of new product items that are to be added to a merchant database.
- the plurality of classification models may be run in a predetermined order, as illustrated in the figure, such that the result of the first classification model 304 a is processed by the successive classification models 304 b , 304 c to produce more accurate and refined classification results as the new product information is processed through each classification model in the classification model pipeline 305 .
- the classification model pipeline 305 may require that new product items be processed by all of the classification models within the pipeline, or may allow the classification process to stop before all of the classification models are run if predetermined thresholds are met.
- a threshold may be a minimum accuracy requirement, key word requirement, or field values requirement for fields needed within a merchant's database.
- a single threshold may be set for the entire classification model pipeline 305 such that the results of each classification model are checked against the same threshold.
- each classification model may have a corresponding threshold that corresponds to the capability of the classification model being used at each step in the pipeline.
- the threshold for the implementation illustrated in FIG. 3 is the same throughout the pipeline 305 such that the thresholds at 306 a , 306 b , 306 c are equivalent.
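The three kinds of threshold described above (minimum accuracy, key words, and required field values) could be represented as a single check. The sketch below assumes a model result is a dictionary with `confidence`, `keywords`, and `fields` entries; that shape, the `Threshold` class, and all values shown are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threshold:
    min_confidence: float = 0.0
    required_keywords: frozenset = frozenset()
    required_fields: frozenset = frozenset()

    def is_met(self, result):
        """True when confidence, key words, and non-empty fields all satisfy the threshold."""
        has_keywords = self.required_keywords <= set(result["keywords"])
        has_fields = all(result["fields"].get(name) for name in self.required_fields)
        return result["confidence"] >= self.min_confidence and has_keywords and has_fields

# One threshold shared by the whole pipeline ...
shared = Threshold(min_confidence=0.8)
# ... or one threshold per model, matched to each model's capability.
per_model = {
    "model_a": Threshold(min_confidence=0.7, required_keywords=frozenset({"tire"})),
    "model_b": Threshold(min_confidence=0.9, required_fields=frozenset({"category"})),
}

result = {"confidence": 0.85, "keywords": ["tire", "rubber"], "fields": {"category": ""}}
```

With this result, the shared threshold and `model_a`'s threshold are met, while `model_b`'s is not, because the confidence is below 0.9 and the `category` field is empty.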
- the results of the classification model of 304 a are compared against a predetermined pipeline threshold. If the threshold is met at 306 a , a classification for the new product item can be created at 308 from the results of the classification model built at 304 a . Alternatively, if the threshold is not met at 306 a , the results of the first classification model can be processed and refined by a successive classification model built at 304 b.
- the results of the classification model of 304 b are compared against a predetermined pipeline threshold. If the threshold is met at 306 b , a classification for the new product item may be created at 308 from the results of the classification model built at 304 b . Alternatively, if the threshold is not met at 306 b , the results of the successive classification model built at 304 b can be processed and refined by yet another successive classification model built at 304 c.
- the results of the classification model of 304 c are compared against a predetermined pipeline threshold. If the threshold is met at 306 c , a classification for the new product item can be created at 308 from the results of the classification model built at 304 c . Alternatively, if the threshold is not met at 306 c , the results of the successive classification model built at 304 c can be processed and refined by yet another successive classification model, or may be presented for smart crowd source review at 312 because the item is deemed too difficult for machine (classification model) classification.
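The sequential flow of FIG. 3 described above can be sketched as a loop that stops at the first model whose result meets the threshold and otherwise falls back to crowd review. Treating the threshold as a minimum confidence value is an assumption for this example, and the two model functions are hypothetical.

```python
def classify_with_pipeline(models, product_text, threshold):
    """Run models in order; stop at the first result that meets the threshold.

    Returns ("classified", category) or ("crowd_review", last_category).
    """
    previous = None
    for model in models:
        category, confidence = model(product_text, previous)
        if confidence >= threshold:
            return ("classified", category)
        previous = (category, confidence)  # handed to the next model for refinement
    return ("crowd_review", previous[0] if previous else None)

# Hypothetical models: the second refines the result of the first.
def coarse_model(text, previous):
    return ("automotive", 0.6)

def refining_model(text, previous):
    category, confidence = previous
    if "scale model" in text.lower():
        return ("toys/scale-model-parts", 0.95)
    return (category, confidence + 0.1)

pipeline = [coarse_model, refining_model]
outcome = classify_with_pipeline(pipeline, "Scale model tire set", threshold=0.9)
```

An item that no model classifies with enough confidence, such as plain "Tire set" at the same threshold, is returned with the `"crowd_review"` status instead.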
- first and successive classification models may be different, while in another implementation the first and successive classification models may be the same.
- the results of the first classification model and successive classification models may be combined to create a refined product classification for the new product item.
- the results of successive classification models may be used to complement the results of other classification models in an additive manner in order to emphasize or deemphasize certain aspects of the product information.
- the results of the first and successive classification models may be used in a subtractive manner to emphasize or deemphasize certain aspects of the product information for the new product item classification.
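One possible reading of the additive and subtractive combination described above is to add or subtract per-category scores. The sketch below assumes each model emits a dictionary of category scores; the function name, category names, and numbers are invented for illustration.

```python
from collections import defaultdict

def combine_scores(additive, subtractive=()):
    """Sum per-category scores, then subtract penalty scores; return the best category."""
    totals = defaultdict(float)
    for scores in additive:
        for category, score in scores.items():
            totals[category] += score
    for scores in subtractive:
        for category, score in scores.items():
            totals[category] -= score
    return max(totals, key=totals.get), dict(totals)

model_a = {"auto/tires": 0.7, "toys/scale-models": 0.3}
model_b = {"auto/tires": 0.2, "toys/scale-models": 0.5}
# Hypothetical model that deemphasizes full-size parts when scale-model terms appear.
penalty = {"auto/tires": 0.4}

additive_best, _ = combine_scores([model_a, model_b])
refined_best, _ = combine_scores([model_a, model_b], subtractive=[penalty])
```

Here the additive combination alone favors `auto/tires`, while subtracting the penalty shifts the result to `toys/scale-models`.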
- the new product item classification may be presented to a plurality of users for smart crowd source review.
- the smart crowd source review may be used to check the new product classification created at 308 for accuracy and relevancy.
- a new product item may be car tires for a scale model of a popular automobile for which a merchant also provides tires. If the classification models happened to miss text values in the new product item information denoting that the tires were for a scale model, the scale model tires may appear in the merchant's database as full-size tires for an actual automobile.
- a smart crowd user could readily spot such an anomaly and provide corrective information.
- any classification created entirely by the classification models within the pipeline 305 may be presented to a plurality of users for smart crowd source review as discussed previously.
- the smart crowd corrections are received by the system and may be added to the product classification and stored within the memory of the computing environment 200 .
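How received corrections are merged into the product classification is not specified. As one hedged possibility, the sketch below overrides the machine classification only when enough reviewers agree on a different category. The majority-vote rule, the `min_agreement` parameter, and the category names are assumptions of this example.

```python
from collections import Counter

def apply_crowd_review(machine_category, reviewer_categories, min_agreement=2):
    """Return the crowd's category when enough reviewers agree on a correction."""
    votes = Counter(reviewer_categories)
    if not votes:
        return machine_category
    top_category, count = votes.most_common(1)[0]
    if top_category != machine_category and count >= min_agreement:
        return top_category
    return machine_category

corrected = apply_crowd_review(
    "auto/tires",
    ["toys/scale-models", "toys/scale-models", "auto/tires"],
)
```

In the scale model tire example, two of three reviewers flag the item, so the stored classification becomes `toys/scale-models`.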
- the smart crowd users may be connected over a network, or may be located within a brick and mortar building owned by the merchant.
- the smart crowd users may be employees and representatives of the merchant, or may be outsourced to smart crowd communities.
- the new product item may be added to the merchant database and properly classified relative to existing products within the merchant database.
- a merchant can efficiently and cost effectively add new product items to their inventory by practicing the method 300 which takes advantage of a pipeline of classification models to accurately classify the product item.
- FIG. 1 and FIG. 2 may be referenced secondarily during the discussion in order to provide hardware support for the implementation.
- the disclosure aims to disclose methods and systems to allow a product to be automatically and efficiently added to a product database.
- a product item may have a description and title associated with it that contains terms and values that can be quantified by at least one classification model such that the new product item can be categorized within a merchant's database.
- the title and description may be combined to supply quantifiable information that may be used to analyze and classify a product item so that it can properly be categorized within a database automatically or with limited human involvement.
- the method 400 may be performed on a system that may include the database storage 204 a (or any suitable memory device disposed in communication with the network 208 ) receiving new product item information at 402 representing the new product item to be sold by a merchant.
- the product item information may be stored in memory located within computing environment 200 for later classification by the classification models within a pipeline.
- the product item information may be received into the computing environment in digital form from an electronic database in communication with the merchant's system, or may be manually input by a user connected electronically within the computing environment.
- the new product item information may comprise a title, a description, parameters of use and performance, and any other suitable information associated with the product that may be of interest in a merchant environment for identifying, quantifying and categorizing the new product item.
- the system may build a plurality of classification models within the classification model pipeline 405 for the new product item based on the product item information received at 402 .
- the classification model pipeline is shown as the dashed boundary line labeled 405 , and illustrates the plurality of classification models that make up the classification model pipeline for the illustrated implementation.
- a classification model may be used within the computing environment 200 to quantify properties of the new product item by performing an algorithm or series of algorithms against the properties (titles, description terms, images) provided in the new product item information received at 402 in order to quantify and ultimately classify the new product item relative to existing products already in a merchant's database.
- Examples of classification models are: Naïve Bayes, K-Nearest-Neighbors, SVM, logistic regression, multiclass perceptron, and like models. It should be understood that any classification model that is known or yet to be discovered is to be considered within the scope of this disclosure. It is to be contemplated that the first classification model may comprise a single algorithm or a plurality of algorithms as desired to classify the new product item.
- a classification model pipeline 405 is intended to mean a plurality of classification models organized to optimize the classification of new product items that are to be added to a merchant database.
- the plurality of classification models may be run in a predetermined order, as illustrated in the figure, such that the new product item information is processed by the first classification model 404 a and successive classification models 404 b , 404 c to produce a plurality of classifications that can be combined to form an accurate classification result as the new product information is processed by each classification model in the classification model pipeline 405 .
- a threshold may be a minimum accuracy requirement, key word requirement, or field values requirement for fields needed within a merchant's database.
- a single threshold may be set for the entire classification model pipeline 405 in an implementation such that the results of each classification model are checked against the same threshold.
- each classification model may have a corresponding threshold that corresponds to the capability of the classification model being used.
- the thresholds for the implementation illustrated in FIG. 4 are different for each of the classification models throughout the pipeline 405 .
- the results of the classification model of 404 a are compared against a predetermined threshold that specifically corresponds to the classification model built at 404 a . If the threshold is met at 406 a , a classification for the new product item can be created at 408 a from the results of the classification model built at 404 a .
- Alternatively, if the threshold is not met at 406 a , the results of the first classification model can be presented to a smart crowd source review at 416 .
- the results of the classification model of 404 b are compared against a predetermined threshold that specifically corresponds to the classification model built at 404 b . If the threshold is met at 406 b , a classification for the new product item can be created at 408 b from the results of the classification model built at 404 b . Alternatively, if the threshold is not met at 406 b , the results of the first classification model can be presented to a smart crowd source review at 416 .
- the results of the classification model of 404 c are compared against a predetermined threshold that specifically corresponds to the classification model built at 404 c . If the threshold is met at 406 c , a classification for the new product item can be created at 408 c from the results of the classification model built at 404 c . Alternatively, if the threshold is not met at 406 c , the results of the first classification model can be presented to a smart crowd source review at 416 .
- the results of the first classification model and successive classification models may be combined to create a refined product classification for the new product item.
- the results of successive classification models may be used complementary to the results of other classification models in an additive manner in order to emphasize or deemphasize certain aspects of the product information.
- the results of the first and successive classification models may be used in subtractive manner to emphasize or deemphasize certain aspects of the product information for the new product item classification.
- the new product item classification may be presented to a plurality of users for smart crowd source review.
- the smart crowd source review may be used to check the new product classification created at 410 for accuracy and relevancy.
- any classification created entirely by the classification models within the pipeline 405 may be presented to a plurality of users for smart crowd source review as discussed previously.
- the smart crowd corrections are received by the system and may be added to the product classification and stored within memory of the computing environment 200 .
- the smart crowd users may be connected over a network, or may be located within a brick and mortar building owned by the merchant.
- the smart crowd users may be employees and/or representatives of the merchant, or may be outsourced to smart crowd communities.
- the new product item may be added to the merchant database and properly classified relative to existing products within the merchant database.
- a merchant can efficiently and cost effectively add new product items to their inventory by practicing the method 400 which takes advantage of a pipeline of classification models to accurately classify the product item.
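- The flow of method 400 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of the disclosure: it assumes each classification model returns a (category, confidence) pair and that each per-model threshold is a minimum confidence. The names `Stage`, `run_parallel_pipeline`, and `keyword_model`, and the toy keyword models themselves, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a model maps product text to (category, confidence).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Stage:
    name: str
    model: Model
    threshold: float  # assumed to be a minimum confidence for this model


def run_parallel_pipeline(stages: List[Stage], item_text: str):
    """Run every stage on the item and gate each result on its own threshold."""
    accepted: Dict[str, str] = {}
    for_review: List[Tuple[str, str, float]] = []
    for stage in stages:
        category, confidence = stage.model(item_text)
        if confidence >= stage.threshold:
            accepted[stage.name] = category  # classification created
        else:
            for_review.append((stage.name, category, confidence))  # crowd review
    return accepted, for_review


def keyword_model(keyword: str, category: str, hit=0.9, miss=0.2) -> Model:
    """Toy stand-in for a trained model: confident only if the keyword appears."""
    def model(text: str) -> Tuple[str, float]:
        return (category, hit) if keyword in text.lower() else (category, miss)
    return model


stages = [
    Stage("404a", keyword_model("drill", "Tools"), 0.8),
    Stage("404b", keyword_model("cordless", "Power Tools"), 0.7),
    Stage("404c", keyword_model("garden", "Outdoor"), 0.6),
]
accepted, for_review = run_parallel_pipeline(stages, "Cordless drill with two batteries")
```

Here the results that meet their thresholds are kept for combination into the product classification, while the result that falls short is queued for smart crowd source review.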
- FIG. 1 and FIG. 2 may be referenced secondarily during the discussion in order to provide hardware support for the implementation.
- the disclosure aims to disclose methods and systems to allow a product to be automatically and efficiently added to a product database by quantifying information corresponding to the new item with a plurality of classification models in a classification model pipeline.
- a product item may have a description and title associated with it that contains terms and values that can be quantified by at least one classification model such that the new product item can be categorized within a merchant's database.
- the title and description may be combined to supply quantifiable information that may be used to analyze and classify a product item so that it can properly be categorized within a database automatically or with limited human involvement.
- the method 500 may be performed on a system that may include the database storage 204 a (or any suitable memory device disposed in communication with the network 208 ) receiving a new product item information 502 representing the new product item to be sold by a merchant.
- the product item information may be stored in memory located within computing environment 200 for later classification by the classification models within a pipeline.
- the product item information may be received into the computing environment in digital form from an electronic database in communication with the merchant's system, or may be manually input by a user connected electronically within the computing environment.
- the new product item information may comprise a title, a description, parameters of use and performance, and any other suitable information associated with the product that may be of interest in a merchant environment for identifying, quantifying and categorizing the new product item.
- the system may build a first classification model within the classification model pipeline 505 for the new product item based on the product item information received at 502 .
- the classification model pipeline is shown as the dashed boundary line labeled 505 , and illustrates the coordination of a plurality of classification models ( 504 a , 504 b , 504 c ) that make up the classification model pipeline for the illustrated implementation.
- a classification model may be used within the computing environment 200 to quantify properties of the new product item by performing an algorithm or series of algorithms against the properties (titles, description terms, images) provided in the new product item information received at 502 in order to quantify and ultimately classify the new product item relative to existing product items already in a merchant's database.
- classification models are: Naïve Bayes, K-Nearest-Neighbors, SVM, logistic regression, and multiclass perceptron, or other like classification models. It should be understood that any classification model that is known or yet to be discovered is to be considered within the scope of this disclosure. It is to be contemplated that the first classification model may comprise a single algorithm or a plurality of algorithms as desired to classify the new product item. At 503 b , the classification model may be stored in memory within computing environment 200 .
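- To illustrate one of the listed model types, the following is a minimal sketch of a multinomial Naïve Bayes classifier over the combined title and description of a product item. It is an assumption-laden illustration, not the disclosed implementation: the whitespace tokenizer, the add-one smoothing, the class and function names, and the four training examples are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(title: str, description: str):
    """Combine title and description into one list of lowercase word tokens."""
    return (title + " " + description).lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (tokens, category) pairs
        for tokens, category in examples:
            self.class_counts[category] += 1
            self.word_counts[category].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        """Return (best category, posterior probability of that category)."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for category, n in self.class_counts.items():
            denom = sum(self.word_counts[category].values()) + len(self.vocab)
            score = math.log(n / total)
            for token in tokens:
                score += math.log((self.word_counts[category][token] + 1) / denom)
            log_scores[category] = score
        # Normalize log scores into probabilities in a numerically stable way.
        peak = max(log_scores.values())
        weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


# Hypothetical training data standing in for existing merchant products.
training = [
    (tokenize("Cordless Drill", "18V drill with battery"), "Tools"),
    (tokenize("Claw Hammer", "steel hammer with grip"), "Tools"),
    (tokenize("Garden Hose", "50 ft hose for garden"), "Outdoor"),
    (tokenize("Lawn Rake", "rake for garden leaves"), "Outdoor"),
]
model = NaiveBayes()
model.fit(training)
category, confidence = model.predict(tokenize("Hammer Drill", "corded drill"))
```

The returned probability is one possible quantity to compare against a minimum accuracy threshold; the disclosure does not prescribe how a threshold is computed.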
- a classification model pipeline 505 is intended to comprise a plurality of classification models organized to optimize the classification of new product items that are to be added to a merchant database.
- the plurality of classification models may be run in a predetermined order as illustrated in the figure such that the result of the first classification model 504 a is processed by the successive classification models 504 b , 504 c to produce more accurate and refined classification results as the new product information is processed through the entire classification model pipeline 505 .
- the classification model pipeline 505 may require that new product items be processed by all of the classification models within the pipeline, or may allow the classification process to stop the classification models in the pipeline and rely upon a smart crowd source to create the classification if predetermined thresholds are met.
- a threshold may be a minimum accuracy requirement, key word requirement, or field values requirement for fields needed within a merchant's database.
- each classification model may have a corresponding threshold that corresponds to the capability of the classification model being used.
- in the implementation illustrated in FIG. 5 , a separate threshold is used for each classification model built within the pipeline 505 .
- N is used to denote the number of successive classification models within the pipeline
- n is used to denote the corresponding threshold to be used.
- a classification for the new product item can be created at 508 from the results of the classification model(N) built at 504 a .
- the results of the first classification model(N) can be processed and refined by a successive classification model(N+1) built at 504 b.
- the results of the classification model(N+1) of 504 b are compared against a corresponding threshold(n+1). If the threshold(n+1) is met at 506 b , a classification for the new product item can be created at 508 from the results of the classification model(N+1) built at 504 b . Alternatively, if the threshold(n+1) is not met at 506 b , the results of the successive classification model(N+1) built at 504 b can be processed and refined by yet another successive classification model(N+2) built at 504 c.
- the results of the classification model(N+2) of 504 c are compared against a predetermined corresponding threshold(n+2). If the threshold(n+2) is met at 506 c a classification for the new product item can be created at 508 from the results of the classification model(N+2) built at 504 c . Alternatively, if the threshold(n+2) is not met at 506 c the results of the successive classification model(N+2) built at 504 c can be processed and refined by yet another successive classification model(N+J) where J represents any number of iterations. Alternatively, the classification results may be presented for smart crowd source review and classification at 512 because it is deemed too difficult for machine classification. In a classification model pipeline implementation the first and successive classification models may be different, while in another implementation the first and successive classification models may be the same.
- the results of the first classification model and successive classification models may be combined to create a refined product classification for the new product item.
- the results of successive classification models may be used complementary to the results of other classification models in an additive manner in order to emphasize or deemphasize certain aspects of the product information.
- the results of the first and successive classification models may be used in subtractive manner to emphasize or deemphasize certain aspects of the product information for the new product item classification.
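- One way to read the additive and subtractive combination described above is as a signed, weighted sum of per-category scores. The sketch below assumes exactly that; the function name `combine_scores`, the weights, and the example scores are hypothetical and are not specified by the disclosure.

```python
from collections import defaultdict


def combine_scores(results):
    """Combine per-model category scores with signed weights.

    results: list of (scores, weight) pairs, where scores maps a category to a
    score. A positive weight uses the model additively (emphasizing its
    categories); a negative weight uses it subtractively (deemphasizing them).
    """
    combined = defaultdict(float)
    for scores, weight in results:
        for category, score in scores.items():
            combined[category] += weight * score
    return dict(combined)


# Hypothetical outputs of a first and two successive classification models.
first = {"Tools": 0.6, "Toys": 0.4}
second = {"Tools": 0.7, "Toys": 0.3}
# Assumed here to flag evidence against the "Toys" reading of the item.
third = {"Toys": 0.5}

combined = combine_scores([(first, 1.0), (second, 1.0), (third, -1.0)])
best = max(combined, key=combined.get)
```

With these example numbers the additive terms reinforce "Tools" while the subtractive term lowers "Toys", so `best` is "Tools".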
- the new product item classification may be presented to a plurality of users for smart crowd source review.
- the smart crowd source review may be used to check the new product classification created by the classification models for accuracy and relevancy.
- the new product item may be added to the merchant database and properly classified relative to existing products within the merchant database.
- a merchant can efficiently and cost effectively add new product items to their inventory by practicing the method 500 which takes advantage of a pipeline of classification models to accurately classify the product item.
Description
- Retailers often have databases and warehouses full of thousands upon thousands of products offered for sale, with new products being offered every day. The databases must be updated with these new products in an organized and usable manner. Each product and new product should be categorized within the database so that it can be found by customers for purchase or employees for stocking. The large number of products offered for sale by a merchant makes updating a merchant's product database difficult and costly with current methods and systems.
- These problems apply even with the use of computers and current computing systems. The methods and systems disclosed herein provide more efficient and cost effective ways for merchants to keep product databases up to date with new product offerings.
- Non-limiting and non-exhaustive implementations of the present disclosure are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified. Advantages of the present disclosure will become better understood with regard to the following description and accompanying drawings where:
- FIG. 1 illustrates an example block diagram of a computing device;
- FIG. 2 illustrates an example computer architecture that facilitates different implementations described herein;
- FIG. 3 illustrates a flow chart of an example method according to one implementation;
- FIG. 4 illustrates a flow chart of an example method according to one implementation; and
- FIG. 5 illustrates a flow chart of an example method according to one implementation.
- The present disclosure extends to methods, systems, and computer program products for providing merchant database updates for new product items. In the following description of the present disclosure, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific implementations in which the disclosure may be practiced. It is understood that other implementations may be utilized and structural changes may be made without departing from the scope of the present disclosure.
- Implementations of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Implementations within the scope of the present disclosure may also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, implementations of the disclosure can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
- Computer storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
- A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
- Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system. RAM can also include solid state drives (SSDs or PCIx based real time memory tiered Storage, such as FusionIO). Thus, it should be understood that computer storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
- Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
- Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like. It should be noted that any of the above mentioned computing devices may be provided by or located within a brick and mortar location. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
- Implementations of the disclosure can also be used in cloud computing environments. In this description and the following claims, “cloud computing” is defined as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned via virtualization and released with minimal management effort or service provider interaction, and then scaled accordingly. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, or any suitable characteristic now known to those of ordinary skill in the field, or later discovered), service models (e.g., Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)), and deployment models (e.g., private cloud, community cloud, public cloud, hybrid cloud, or any suitable service type model now known to those of ordinary skill in the field, or later discovered). Databases and servers described with respect to the present disclosure can be included in a cloud model.
- Further, where appropriate, functions described herein can be performed in one or more of: hardware, software, firmware, digital components, or analog components. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein. Certain terms are used throughout the following description and Claims to refer to particular system components. As one skilled in the art will appreciate, components may be referred to by different names. This document does not intend to distinguish between components that differ in name, but not function.
- FIG. 1 is a block diagram illustrating an example computing device 100. Computing device 100 may be used to perform various procedures, such as those discussed herein. Computing device 100 can function as a server, a client, or any other computing entity. Computing device 100 can perform various monitoring functions as discussed herein, and can execute one or more application programs, such as the application programs described herein. Computing device 100 can be any of a wide variety of computing devices, such as a desktop computer, a notebook computer, a server computer, a handheld computer, a tablet computer, and the like.
- Computing device 100 includes one or more processor(s) 102, one or more memory device(s) 104, one or more interface(s) 106, one or more mass storage device(s) 108, one or more Input/Output (I/O) device(s) 110, and a display device 130, all of which are coupled to a bus 112. Processor(s) 102 include one or more processors or controllers that execute instructions stored in memory device(s) 104 and/or mass storage device(s) 108. Processor(s) 102 may also include various types of computer-readable media, such as cache memory.
- Memory device(s) 104 include various computer-readable media, such as volatile memory (e.g., random access memory (RAM) 114) and/or nonvolatile memory (e.g., read-only memory (ROM) 116). Memory device(s) 104 may also include rewritable ROM, such as Flash memory.
- Mass storage device(s) 108 include various computer readable media, such as magnetic tapes, magnetic disks, optical disks, solid-state memory (e.g., Flash memory), and so forth. As shown in FIG. 1, a particular mass storage device is a hard disk drive 124. Various drives may also be included in mass storage device(s) 108 to enable reading from and/or writing to the various computer readable media. Mass storage device(s) 108 include removable media 126 and/or non-removable media.
- I/O device(s) 110 include various devices that allow data and/or other information to be input to or retrieved from computing device 100. Example I/O device(s) 110 include cursor control devices, keyboards, keypads, microphones, monitors or other display devices, speakers, printers, network interface cards, modems, lenses, CCDs or other image capture devices, and the like.
- Display device 130 includes any type of device capable of displaying information to one or more users of computing device 100. Examples of display device 130 include a monitor, display terminal, video projection device, and the like.
- Interface(s) 106 include various interfaces that allow computing device 100 to interact with other systems, devices, or computing environments. Example interface(s) 106 may include any number of different network interfaces 120, such as interfaces to local area networks (LANs), wide area networks (WANs), wireless networks, and the Internet. Other interface(s) include user interface 118 and peripheral device interface 122. The interface(s) 106 may also include one or more user interface elements 118. The interface(s) 106 may also include one or more peripheral interfaces such as interfaces for printers, pointing devices (mice, track pad, etc.), keyboards, and the like.
- Bus 112 allows processor(s) 102, memory device(s) 104, interface(s) 106, mass storage device(s) 108, and I/O device(s) 110 to communicate with one another, as well as other devices or components coupled to bus 112. Bus 112 represents one or more of several types of bus structures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, and so forth.
- For purposes of illustration, programs and other executable program components are shown herein as discrete blocks, although it is understood that such programs and components may reside at various times in different storage components of computing device 100, and are executed by processor(s) 102. Alternatively, the systems and procedures described herein can be implemented in hardware, or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein.
- FIG. 2 illustrates an example of a computing environment 200 and a smart crowd source environment 201 suitable for implementing the methods disclosed herein. In some implementations, a server 202 a provides access to a database 204 a in data communication therewith, and may be located and accessed within a brick and mortar retail location. The database 204 a may store customer attribute information such as a user profile as well as a list of other user profiles of friends and associates associated with the user profile. The database 204 a may additionally store attributes of the user associated with the user profile. The server 202 a may provide access to the database 204 a to users associated with the user profiles and/or to others. For example, the server 202 a may implement a web server for receiving requests for data stored in the database 204 a and formatting requested information into web pages. The web server may additionally be operable to receive information and store the information in the database 204 a.
- As used herein, a smart crowd source environment is a group of users connected over a network that are assigned tasks to perform over the network. In an implementation the smart crowd source may be in the employ of a merchant, or may be under contract on a per task basis. The work product of the smart crowd source is generally conveyed over the same network that supplied the tasks to be performed. In the implementations that follow, users or members of a smart crowd source may be tasked with reviewing the classification of new product items and the hierarchy of products within a merchant's database.
- A
server 202 b may be associated with a classification manager or other entity or party providing classification work. Theserver 202 b may be in data communication with adatabase 204 b. Thedatabase 204 b may store information regarding various products. In particular, information for a product may include a name, description, categorization, reviews, comments, price, past transaction data, and the like. Theserver 202 b may analyze this data as well as data retrieved from thedatabase 204 a in order to perform methods as described herein. An operator or customer/user may access theserver 202 b by means of aworkstation 206, which may be embodied as any general purpose computer, tablet computer, smart phone, or the like. - The
server 202 a andserver 202 b may communicate with one another over anetwork 208 such as the Internet or some other local area network (LAN), wide area network (WAN), virtual private network (VPN), or other network. A user may access data and functionality provided by theservers workstation 210 in data communication with thenetwork 208. Theworkstation 210 may be embodied as a general purpose computer, tablet computer, smart phone or the like. For example, theworkstation 210 may host a web browser for requesting web pages, displaying web pages, and receiving user interaction with web pages, and performing other functionality of a web browser. Theworkstation 210,workstation 206, servers 202 a-202 b, anddatabases computing device 100. - As used herein, a classification model pipeline is intended to mean plurality of classification models organized to optimize the classification of new product items that are to be added to a merchant database. The plurality of classification models may be run in a predetermined order or may be run concurrently. The classification model pipeline may require that new product items be processed by all of the classification models within the pipeline, or may allow the classification process to stop before all of the classification models are run if predetermined thresholds are not met.
- It is to be further understood that the phrase “computer system,” as used herein, shall be construed broadly to include a network as defined herein, as well as a single-unit work station (such as
work station 206 or other work station) whether connected directly to a network via a communications connection or disconnected from a network, as well as a group of single-unit work stations which can share data or information through non-network means such as a flash drive or any suitable non-network means for sharing data now known or later discovered. - With reference primarily to
FIG. 3 , an implementation of amethod 300 for updating a merchant's database through semantic product classification will be discussed.FIG. 1 andFIG. 2 may be referenced secondarily during the discussion in order to provide hardware support for the implementation. The disclosure aims to disclose methods and systems to allow a new product item to be automatically and efficiently added to a product database. For example, a product item may have a description and title associated with it that contains terms and values that can be quantified by at least one classification model such that the new product item can be categorized within a merchant's database. In an implementation the title and description may be combined to supply quantifiable information that may be used to analyze and classify a product item so that it can properly be categorized within a database automatically, or alternatively with limited human involvement. - The
method 300 may be performed on a system that may include thedatabase storage 204 a (or any suitable memory device disposed in communication with the network 208) receiving a newproduct item information 302 representing the new product item to be sold by a merchant. The product item information may be stored in memory located withincomputing environment 200 for later classification by the classification models within a pipeline. The product item information may be received into the computing environment in digital form from an electronic database in communication with the merchant's system. Additionally, the new product item information may be manually input by a user connected electronically with thecomputing environment 200. The new product item information may comprise a title, a description, parameters of use and performance, and any other suitable information associated with the product that may be of interest in a merchant environment for identifying, quantifying and categorizing the new product item. - At 304 a, the system may build a first classification model within the
classification model pipeline 305 for the new product item based on the product item information received at 302. The classification model pipeline is shown as the dashed boundary line labeled 305, and illustrates the plurality of classification models (at 304 a, 304 b, 303 c) that makeup the classification model pipeline for the illustrated implementation. A classification model may be used within thecomputing environment 200 to quantify properties of the new product item by performing an algorithm or series of algorithms against the text properties (titles, description terms, images) provided in the new product item information received at 302 in order to quantify and ultimately classify the new product item relative to existing products items already in a merchant's database. Examples of classification models are: Naïve Bayes, K-Nearest-Neighbors, SVM, logistic regression, and multiclass perceptron, or the like. It should be understood that any classification model that is known or yet to be discovered is to be considered within the scope of this disclosure. It is to be contemplated that the first classification model may comprise a single algorithm or a plurality of algorithms as desired to classify the new product item. At 303 b, the results of the classification model may be stored in memory withincomputing environment 200. - A
classification model pipeline 305 is intended to comprise a plurality of classification models organized to optimize the classification of new product items that are to be added to a merchant database. The plurality of classification models may be run in a predetermined order, as illustrated in the figure, such that the result of the first classification model 304 a is processed by the successive classification models 304 b, 304 c to produce more accurate and refined classification results as the new product information is processed through the entire classification model pipeline 305. The classification model pipeline 305 may require that new product items be processed by all of the classification models within the pipeline, or may allow the classification process to stop before all of the classification models are run if predetermined thresholds are met.
- At 306 a, 306 b and 306 c, the classification model results of classification models 304 a, 304 b and 304 c are checked against a predetermined threshold.
- It should be noted that a single threshold may be set for the entire classification model pipeline 305 such that the results of each classification model are checked against the same threshold. Alternatively, in an implementation each classification model may have a corresponding threshold that corresponds to the capability of the classification model being used at each step in the pipeline. For discussion purposes, the threshold for the implementation illustrated in FIG. 3 is the same throughout the pipeline 305 such that the thresholds at 306 a, 306 b, 306 c are equivalent. For example, at 306 a the results of the classification model of 304 a are compared against a predetermined pipeline threshold. If the threshold is met at 306 a, a classification for the new product item can be created at 308 from the results of the classification model built at 304 a. Alternatively, if the threshold is not met at 306 a, the results of the first classification model can be processed and refined by a successive classification model built at 304 b.
- Continuing on, at 306 b the results of the classification model of 304 b are compared against a predetermined pipeline threshold. If the threshold is met at 306 b, a classification for the new product item may be created at 308 from the results of the classification model built at 304 b. Alternatively, if the threshold is not met at 306 b, the results of the successive classification model built at 304 b can be processed and refined by yet another successive classification model built at 304 c.
- For completeness in discussing
FIG. 3, at 306 c the results of the classification model of 304 c are compared against a predetermined pipeline threshold. If the threshold is met at 306 c, a classification for the new product item can be created at 308 from the results of the classification model built at 304 c. Alternatively, if the threshold is not met at 306 c, the results of the successive classification model built at 304 c can be processed and refined by yet another successive classification model, or may be presented for smart crowd source review at 312 because the item is deemed too difficult for machine (classification model) classification.
- It should be noted that in a classification model pipeline implementation, the first and successive classification models may be different, while in another implementation the first and successive classification models may be the same.
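The shared-threshold flow of FIG. 3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the `run_cascade` function, the `(label, confidence)` result shape, the two toy models, and the 0.8 threshold are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical result shape: (category label, confidence in [0, 1]).
Result = Tuple[str, float]
# Hypothetical model interface: receives the item text and the previous
# stage's result (None for the first stage) and returns a refined result.
Model = Callable[[str, Optional[Result]], Result]


def run_cascade(text: str, models: List[Model],
                threshold: float) -> Tuple[Optional[Result], bool]:
    """Run models in order and stop at the first result meeting the threshold.

    Returns (last result, needs_crowd_review).
    """
    result: Optional[Result] = None
    for model in models:
        result = model(text, result)
        if result[1] >= threshold:
            return result, False  # threshold met: classification can be created
    return result, True  # no stage met the threshold: route to crowd review


# Toy stand-ins for the models built at 304 a and 304 b (assumptions).
def title_model(text: str, prev: Optional[Result]) -> Result:
    return ("automobile tires", 0.5)


def refine_model(text: str, prev: Optional[Result]) -> Result:
    label, confidence = prev
    if "scale model" in text.lower():
        return ("scale model parts", 0.9)
    return (label, confidence + 0.25)


accepted = run_cascade("Scale model tire set", [title_model, refine_model], 0.8)
escalated = run_cascade("All-season tire", [title_model, refine_model], 0.8)
```

In this sketch the first item is classified by the second stage, while the second item never reaches the threshold and is flagged for crowd review.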
- At 308, the results of the first classification model and successive classification models may be combined to create a refined product classification for the new product item. In an implementation the results of successive classification models may be used to complement the results of other classification models in an additive manner in order to emphasize or deemphasize certain aspects of the product information. Alternatively, the results of the first and successive classification models may be used in a subtractive manner to emphasize or deemphasize certain aspects of the product information for the new product item classification.
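One way to picture the additive and subtractive combination is to treat each model's output as a set of per-category scores. The disclosure does not specify the arithmetic, so the sketch below is an assumption: the `combine` and `best_category` helpers and the score values are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: category name -> score from one model.
Scores = Dict[str, float]


def combine(first: Scores, successive: List[Scores],
            subtractive: bool = False) -> Scores:
    """Add (or subtract) successive models' scores to the first model's scores."""
    sign = -1.0 if subtractive else 1.0
    combined = dict(first)  # copy so the first model's scores are not mutated
    for scores in successive:
        for category, value in scores.items():
            combined[category] = combined.get(category, 0.0) + sign * value
    return combined


def best_category(scores: Scores) -> str:
    """Return the category with the highest combined score."""
    return max(scores, key=lambda category: scores[category])


first_scores = {"automobile tires": 0.5, "scale model parts": 0.25}

# Additive use: a later model emphasizes the "scale model" evidence.
added = combine(first_scores, [{"scale model parts": 0.5}])

# Subtractive use: a later model deemphasizes the full-size reading.
subtracted = combine(first_scores, [{"automobile tires": 0.5}], subtractive=True)
```

Both variants move the top category from "automobile tires" to "scale model parts" in this toy data; which variant fits better would depend on what the successive models actually measure.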
- At 312, the new product item classification may be presented to a plurality of users for smart crowd source review. The smart crowd source review may be used to check the new product classification created at 308 for accuracy and relevancy. For example, a new product item may be car tires for a scale model of a popular automobile for which the merchant also provides full-size tires. If the classification models missed text values in the new product item information denoting that the tires were for a scale model, the scale model tires may appear in the merchant's database as full-size tires for an actual automobile. A smart crowd user could readily spot such an anomaly and provide corrective information.
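The review step can be illustrated with a small sketch in which a correction is applied only when a strict majority of reviewers agree on it. The disclosure does not say how reviewer input is aggregated, so the majority rule, the `apply_crowd_review` function, and the labels below are assumptions.

```python
from collections import Counter
from typing import List, Optional


def apply_crowd_review(machine_label: str,
                       reviewer_labels: List[Optional[str]]) -> str:
    """Return the label after review.

    Each entry in reviewer_labels is a reviewer's corrected label, or None
    when the reviewer accepts the machine classification as is.
    """
    if not reviewer_labels:
        return machine_label
    votes = Counter(
        label if label is not None else machine_label
        for label in reviewer_labels
    )
    label, count = votes.most_common(1)[0]
    # Require a strict majority before overriding the machine label.
    return label if count > len(reviewer_labels) / 2 else machine_label


corrected = apply_crowd_review(
    "automobile tires",
    ["scale model parts", "scale model parts", None],
)
```

Here two of three reviewers flag the scale model tires, so the corrected label replaces the machine classification.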
- At 316, any classification created entirely by the classification models within the pipeline 305 may be presented to a plurality of users for smart crowd source review as discussed previously.
- At 318, the smart crowd corrections are received by the system and may be added to the product classification and stored within the memory of the computing environment 200. It should be noted that the smart crowd users may be connected over a network, or may be located within a brick and mortar building owned by the merchant. The smart crowd users may be employees and representatives of the merchant, or may be outsourced to smart crowd communities.
- At 320, the new product item may be added to the merchant database and properly classified relative to existing products within the merchant database. As can be realized from the discussion above, a merchant can efficiently and cost effectively add new product items to their inventory by practicing the method 300, which takes advantage of a pipeline of classification models to accurately classify the product item.
- With reference primarily to
FIG. 4, an implementation of a method 400 for updating a merchant's database through semantic product classification will be discussed. FIG. 1 and FIG. 2 may be referenced secondarily during the discussion in order to provide hardware support for the implementation. The disclosure aims to disclose methods and systems to allow a product to be automatically and efficiently added to a product database. For example, a product item may have a description and title associated with it that contain terms and values that can be quantified by at least one classification model such that the new product item can be categorized within a merchant's database. In an implementation the title and description may be combined to supply quantifiable information that may be used to analyze and classify a product item so that it can properly be categorized within a database automatically or with limited human involvement.
- The method 400 may be performed on a system that may include the database storage 204 a (or any suitable memory device disposed in communication with the network 208) receiving new product item information 402 representing the new product item to be sold by a merchant. The product item information may be stored in memory located within computing environment 200 for later classification by the classification models within a pipeline. The product item information may be received into the computing environment in digital form from an electronic database in communication with the merchant's system, or may be manually input by a user connected electronically within the computing environment. The new product item information may comprise a title, a description, parameters of use and performance, and any other suitable information associated with the product that may be of interest in a merchant environment for identifying, quantifying and categorizing the new product item.
- At 404 a, 404 b, 404 c the system may build a plurality of classification models within the classification model pipeline 405 for the new product item based on the product item information received at 402. The classification model pipeline is shown as the dashed boundary line labeled 405, and illustrates the plurality of classification models that make up the classification model pipeline for the illustrated implementation. A classification model may be used within the computing environment 200 to quantify properties of the new product item by performing an algorithm or series of algorithms against the properties (titles, description terms, images) provided in the new product item information received at 402 in order to quantify and ultimately classify the new product item relative to existing products already in a merchant's database. Examples of classification models are Naïve Bayes, K-Nearest-Neighbors, SVM, logistic regression, multiclass perceptron, and like models. It should be understood that any classification model that is known or yet to be discovered is to be considered within the scope of this disclosure. It is contemplated that the first classification model may comprise a single algorithm or a plurality of algorithms as desired to classify the new product item.
- A
classification model pipeline 405 is intended to mean a plurality of classification models organized to optimize the classification of new product items that are to be added to a merchant database. The plurality of classification models may be run in a predetermined order, as illustrated in the figure, such that the new product item information is processed by the first classification model 404 a and successive classification models 404 b, 404 c of the classification model pipeline 405.
- At 406 a, 406 b and 406 c, the classification model results of classification models 404 a, 404 b and 404 c are checked against a predetermined threshold.
- It should be noted that a single threshold may be set for the entire classification model pipeline 405 in an implementation such that the results of each classification model are checked against the same threshold. In an implementation each classification model may have a corresponding threshold that corresponds to the capability of the classification model being used. For discussion purposes, the thresholds for the implementation illustrated in FIG. 4 are different for each of the classification models throughout the pipeline 405. For example, at 406 a the results of the classification model of 404 a are compared against a predetermined threshold that specifically corresponds to the classification model built at 404 a. If the threshold is met at 406 a, a classification for the new product item can be created at 408 a from the results of the classification model built at 404 a. Alternatively, if the threshold is not met at 406 a, the results of the first classification model can be presented to a smart crowd source review at 416.
- Continuing on, at 406 b the results of the classification model of 404 b are compared against a predetermined threshold that specifically corresponds to the classification model built at 404 b. If the threshold is met at 406 b, a classification for the new product item can be created at 408 b from the results of the classification model built at 404 b. Alternatively, if the threshold is not met at 406 b, the results of the first classification model can be presented to a smart crowd source review at 416.
- For completeness in discussing
FIG. 4, at 406 c the results of the classification model of 404 c are compared against a predetermined threshold that specifically corresponds to the classification model built at 404 c. If the threshold is met at 406 c, a classification for the new product item can be created at 408 c from the results of the classification model built at 404 c. Alternatively, if the threshold is not met at 406 c, the results of the first classification model can be presented to a smart crowd source review at 416.
- At 410, the results of the first classification model and successive classification models may be combined to create a refined product classification for the new product item. In an implementation the results of successive classification models may be used to complement the results of other classification models in an additive manner in order to emphasize or deemphasize certain aspects of the product information. Alternatively, the results of the first and successive classification models may be used in a subtractive manner to emphasize or deemphasize certain aspects of the product information for the new product item classification.
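The per-model thresholds of FIG. 4 can be sketched as below. The sketch assumes that each model is checked on its own output and that a result missing its threshold is the one sent to review; the `evaluate_models` function, the stage names, and the numeric thresholds are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, float]        # (category label, confidence), an assumption
Model = Callable[[str], Result]   # hypothetical model interface


def evaluate_models(
    text: str, stages: List[Tuple[str, Model, float]]
) -> Tuple[Dict[str, Result], Dict[str, Result]]:
    """Check each model's result against that model's own threshold.

    Returns (accepted, for_review), both keyed by stage name.
    """
    accepted: Dict[str, Result] = {}
    for_review: Dict[str, Result] = {}
    for name, model, threshold in stages:
        result = model(text)
        if result[1] >= threshold:
            accepted[name] = result      # usable when combining results
        else:
            for_review[name] = result    # candidate for crowd source review
    return accepted, for_review


# Toy models with different capabilities, hence different thresholds.
stages = [
    ("404a", lambda text: ("tires", 0.9), 0.8),
    ("404b", lambda text: ("tires", 0.6), 0.5),
    ("404c", lambda text: ("toys", 0.4), 0.7),
]
accepted, for_review = evaluate_models("All-season tire", stages)
```

A weaker model can be given a lower bar, as with the second stage here, without lowering the bar for the others.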
- At 412, the new product item classification may be presented to a plurality of users for smart crowd source review. The smart crowd source review may be used to check the new product classification created at 410 for accuracy and relevancy.
- At 416, any classification created entirely by the classification models within the pipeline 405 may be presented to a plurality of users for smart crowd source review as discussed previously.
- At 418, the smart crowd corrections are received by the system and may be added to the product classification and stored within memory of the computing environment 200. It should be noted that the smart crowd users may be connected over a network, or may be located within a brick and mortar building owned by the merchant. The smart crowd users may be employees and/or representatives of the merchant, or may be outsourced to smart crowd communities.
- At 420, the new product item may be added to the merchant database and properly classified relative to existing products within the merchant database. As can be realized from the discussion above, a merchant can efficiently and cost effectively add new product items to their inventory by practicing the method 400, which takes advantage of a pipeline of classification models to accurately classify the product item.
- With reference primarily to
FIG. 5, an implementation of a method 500 for updating a merchant's database through semantic product classification will be discussed. FIG. 1 and FIG. 2 may be referenced secondarily during the discussion in order to provide hardware support for the implementation. The disclosure aims to disclose methods and systems to allow a product to be automatically and efficiently added to a product database by quantifying information corresponding to the new item with a plurality of classification models in a classification model pipeline. For example, a product item may have a description and title associated with it that contain terms and values that can be quantified by at least one classification model such that the new product item can be categorized within a merchant's database. In an implementation the title and description may be combined to supply quantifiable information that may be used to analyze and classify a product item so that it can properly be categorized within a database automatically or with limited human involvement.
- The method 500 may be performed on a system that may include the database storage 204 a (or any suitable memory device disposed in communication with the network 208) receiving new product item information 502 representing the new product item to be sold by a merchant. The product item information may be stored in memory located within computing environment 200 for later classification by the classification models within a pipeline. The product item information may be received into the computing environment in digital form from an electronic database in communication with the merchant's system, or may be manually input by a user connected electronically within the computing environment. The new product item information may comprise a title, a description, parameters of use and performance, and any other suitable information associated with the product that may be of interest in a merchant environment for identifying, quantifying and categorizing the new product item.
- At 504 a, the system may build a first classification model within the classification model pipeline 505 for the new product item based on the product item information received at 502. The classification model pipeline is shown as the dashed boundary line labeled 505, and illustrates the coordination of a plurality of classification models (504 a, 504 b, 504 c) that make up the classification model pipeline for the illustrated implementation. A classification model may be used within the computing environment 200 to quantify properties of the new product item by performing an algorithm or series of algorithms against the properties (titles, description terms, images) provided in the new product item information received at 502 in order to quantify and ultimately classify the new product item relative to existing product items already in a merchant's database. Examples of classification models are Naïve Bayes, K-Nearest-Neighbors, SVM, logistic regression, multiclass perceptron, and other like classification models. It should be understood that any classification model that is known or yet to be discovered is to be considered within the scope of this disclosure. It is contemplated that the first classification model may comprise a single algorithm or a plurality of algorithms as desired to classify the new product item. At 503 b, the classification model may be stored in memory within computing environment 200.
- A
classification model pipeline 505 is intended to comprise a plurality of classification models organized to optimize the classification of new product items that are to be added to a merchant database. The plurality of classification models may be run in a predetermined order, as illustrated in the figure, such that the result of the first classification model 504 a is processed by the successive classification models 504 b, 504 c to produce more accurate and refined classification results as the new product information is processed through the entire classification model pipeline 505. The classification model pipeline 505 may require that new product items be processed by all of the classification models within the pipeline, or may allow the classification process to stop running the classification models in the pipeline and rely upon a smart crowd source to create the classification if predetermined thresholds are met.
- At 506 a, 506 b and 506 c, the classification model results of classification models 504 a, 504 b and 504 c are checked against a predetermined threshold. In an implementation a threshold may be a minimum accuracy requirement, a key word requirement, or a field values requirement for fields needed within a merchant's database.
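The three kinds of threshold named above can each be written as a simple predicate over a model's result. The following is a minimal sketch under assumptions: the `ModelResult` fields and the three helper functions are hypothetical, since the disclosure does not define how a result is represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelResult:
    """Hypothetical shape of one classification model's output."""
    label: str
    accuracy: float                                   # estimated accuracy
    keywords: List[str] = field(default_factory=list)
    fields: Dict[str, str] = field(default_factory=dict)


def meets_min_accuracy(result: ModelResult, minimum: float) -> bool:
    """Minimum accuracy requirement."""
    return result.accuracy >= minimum


def has_keywords(result: ModelResult, required: List[str]) -> bool:
    """Key word requirement: every required key word was found."""
    return set(required) <= set(result.keywords)


def has_fields(result: ModelResult, required: List[str]) -> bool:
    """Field values requirement: every required field has a non-empty value."""
    return all(result.fields.get(name) for name in required)


result = ModelResult(
    label="tires",
    accuracy=0.75,
    keywords=["tire", "all-season"],
    fields={"brand": "Acme", "size": ""},
)
```

With this example result, the accuracy check passes at a minimum of 0.7, the key word check passes for "tire", and the field check fails when "size" is required because that value is empty.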
- In an implementation each classification model may have a corresponding threshold that corresponds to the capability of the classification model being used. For discussion purposes, the threshold for the implementation illustrated in
FIG. 5 is set individually for each classification model built within the pipeline 505. Additionally, it should be noted that there is no limit to the number of classification models that may be included in a classification pipeline. For example, at 506 a the results of the classification model(N) of 504 a are compared against a corresponding threshold(n). In the present implementation N is used to denote the position of a classification model among the successive classification models within the pipeline, and n is used to denote the corresponding threshold to be used. If the threshold(n) is met at 506 a, a classification for the new product item can be created at 508 from the results of the classification model(N) built at 504 a. Alternatively, if the threshold(n) is not met at 506 a, the results of the first classification model(N) can be processed and refined by a successive classification model(N+1) built at 504 b.
- Continuing on, at 506 b the results of the classification model(N+1) of 504 b are compared against a corresponding threshold(n+1). If the threshold(n+1) is met at 506 b, a classification for the new product item can be created at 508 from the results of the classification model(N+1) built at 504 b. Alternatively, if the threshold(n+1) is not met at 506 b, the results of the successive classification model(N+1) built at 504 b can be processed and refined by yet another successive classification model(N+2) built at 504 c.
- For completeness in discussing
FIG. 5, at 506 c the results of the classification model(N+2) of 504 c are compared against a predetermined corresponding threshold(n+2). If the threshold(n+2) is met at 506 c, a classification for the new product item can be created at 508 from the results of the classification model(N+2) built at 504 c. Alternatively, if the threshold(n+2) is not met at 506 c, the results of the successive classification model(N+2) built at 504 c can be processed and refined by yet another successive classification model(N+J), where J represents any number of iterations. Alternatively, the classification results may be presented for smart crowd source review and classification at 512 because the item is deemed too difficult for machine classification. In a classification model pipeline implementation the first and successive classification models may be different, while in another implementation the first and successive classification models may be the same.
- At 508, the results of the first classification model and successive classification models may be combined to create a refined product classification for the new product item. In an implementation the results of successive classification models may be used to complement the results of other classification models in an additive manner in order to emphasize or deemphasize certain aspects of the product information. Alternatively, the results of the first and successive classification models may be used in a subtractive manner to emphasize or deemphasize certain aspects of the product information for the new product item classification.
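The model(N) and threshold(n) pairing of FIG. 5 generalizes to any number of stages. The sketch below pairs each model with its own threshold and reports which stage produced the classification; the `Stage` dataclass, the `classify` function, and the toy stages are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Result = Tuple[str, float]  # (category label, confidence), an assumption


@dataclass
class Stage:
    """One classification model(N) paired with its threshold(n)."""
    model: Callable[[str, Optional[Result]], Result]
    threshold: float


def classify(text: str,
             stages: List[Stage]) -> Tuple[Optional[Result], Optional[int]]:
    """Run stages in order, each refining the previous stage's result.

    Returns (result, index of the stage whose threshold was met), or
    (last result, None) when no stage qualifies and the item should be
    handed to crowd source review.
    """
    result: Optional[Result] = None
    for index, stage in enumerate(stages):
        result = stage.model(text, result)
        if result[1] >= stage.threshold:
            return result, index
    return result, None


# Toy stages: a strict coarse model followed by a more lenient refiner.
coarse = Stage(model=lambda text, prev: ("tires", 0.5), threshold=0.9)
refiner = Stage(model=lambda text, prev: (prev[0], prev[1] + 0.25), threshold=0.7)

classified = classify("All-season tire", [coarse, refiner])
unresolved = classify("All-season tire", [coarse])
```

Because the stage list is ordinary data, adding model(N+J) is a matter of appending another `Stage`, which mirrors the statement that the pipeline has no fixed length.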
- At 512, the new product item classification may be presented to a plurality of users for smart crowd source review. The smart crowd source review may be used to check the new product classification created by the classification models for accuracy and relevancy.
- At 516, the new product item may be added to the merchant database and properly classified relative to existing products within the merchant database. As can be realized from the discussion above, a merchant can efficiently and cost effectively add new product items to their inventory by practicing the
method 500 which takes advantage of a pipeline of classification models to accurately classify the product item. - The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Further, it should be noted that any or all of the aforementioned alternate implementations may be used in any combination desired to form additional hybrid implementations of the disclosure.
- Further, although specific implementations of the disclosure have been described and illustrated, the disclosure is not to be limited to the specific forms or arrangements of parts so described and illustrated. The scope of the disclosure is to be defined by the claims appended hereto, any future claims submitted here and in different applications, and their equivalents.
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/756,450 US20140214844A1 (en) | 2013-01-31 | 2013-01-31 | Multiple classification models in a pipeline |
US14/847,944 US10915557B2 (en) | 2013-01-31 | 2015-09-08 | Product classification data transfer and management |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/756,450 US20140214844A1 (en) | 2013-01-31 | 2013-01-31 | Multiple classification models in a pipeline |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/756,467 Continuation-In-Part US20140214845A1 (en) | 2013-01-31 | 2013-01-31 | Product classification into product type families |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/756,443 Continuation-In-Part US20140214841A1 (en) | 2013-01-31 | 2013-01-31 | Semantic Product Classification |
US13/756,467 Continuation-In-Part US20140214845A1 (en) | 2013-01-31 | 2013-01-31 | Product classification into product type families |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140214844A1 true US20140214844A1 (en) | 2014-07-31 |
Family
ID=51224152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/756,450 Abandoned US20140214844A1 (en) | 2013-01-31 | 2013-01-31 | Multiple classification models in a pipeline |
Country Status (1)
Country | Link |
---|---|
US (1) | US20140214844A1 (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080027887A1 (en) * | 2006-07-27 | 2008-01-31 | The Government Of The Us, As Represented By The Secretary Of The Navy | System and method for fusing data from different information sources |
US20090012971A1 (en) * | 2007-01-26 | 2009-01-08 | Herbert Dennis Hunt | Similarity matching of products based on multiple classification schemes |
US20080319932A1 (en) * | 2007-06-21 | 2008-12-25 | Microsoft Corporation | Classification using a cascade approach |
US20090157571A1 (en) * | 2007-12-12 | 2009-06-18 | International Business Machines Corporation | Method and apparatus for model-shared subspace boosting for multi-label classification |
US20130110498A1 (en) * | 2011-10-28 | 2013-05-02 | Linkedln Corporation | Phrase-based data classification system |
US20140006319A1 (en) * | 2012-06-29 | 2014-01-02 | International Business Machines Corporation | Extension to the expert conversation builder |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10915557B2 (en) | 2013-01-31 | 2021-02-09 | Walmart Apollo, Llc | Product classification data transfer and management |
US20160063097A1 (en) * | 2014-08-27 | 2016-03-03 | Next It Corporation | Data Clustering System, Methods, and Techniques |
US10599953B2 (en) * | 2014-08-27 | 2020-03-24 | Verint Americas Inc. | Method and system for generating and correcting classification models |
US20200184276A1 (en) * | 2014-08-27 | 2020-06-11 | Verint Americas Inc. | Method and system for generating and correcting classification models |
US11537820B2 (en) * | 2014-08-27 | 2022-12-27 | Verint Americas Inc. | Method and system for generating and correcting classification models |
US9832274B1 (en) | 2017-04-27 | 2017-11-28 | Bluecore, Inc. | Directory update monitoring systems and methods |
CN110489550A (en) * | 2019-07-16 | 2019-11-22 | 招联消费金融有限公司 | File classification method, device and computer equipment based on combination neural net |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: WAL-MART STORES, INC., ARKANSAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GARERA, NIKESH LUCKY;RAMPALLI, NARASIMHAN;RAVIKANT, DINTYALA VENKATA SUBRAHMANYA;AND OTHERS;SIGNING DATES FROM 20130130 TO 20130327;REEL/FRAME:030097/0871 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: WAL-MART STORES, INC., ARKANSAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GARERA, NIKESH LUCKY;RAMPALLI, NARASIMHAN;RAVIKANT, DINTYALA VENKATA SUBRAHMANYA;AND OTHERS;SIGNING DATES FROM 20130130 TO 20130327;REEL/FRAME:037259/0376 |
|
AS | Assignment |
Owner name: WALMART APOLLO, LLC, ARKANSAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WAL-MART STORES, INC.;REEL/FRAME:045817/0115 Effective date: 20180131 |