Many people ask what big data is. It is a fashionable technical term and a symbol of the Internet era. It sounds high-end, and many companies are embarrassed to talk business with others unless they can say they have big data. Here I'll share some of my views and, using several of AliCloud's big data products, give a plain-language introduction to what big data actually is.
First, as the literal meaning of "big data" suggests, the volume of data has to be large; that is a necessary condition. Say I want to prepare a report for my boss based on some sales data. With thousands or even tens of thousands of records, I can easily do it in Excel, and the data fits on my own computer.
When the data grows to hundreds of thousands or millions of records, Excel can no longer cope. You need to put the data into a database, for example a common relational database such as MySQL, to store and compute on it. But what happens when the volume reaches tens of millions, hundreds of millions, or billions of records and beyond? Then you need many servers and higher-spec machines to store the data, compute on it, and produce the report, and that requires big data technology. So the fundamental condition of big data is that the amount of data is large enough.
Speaking of big data, we have to talk about the concept of distribution. Back to the report example: when the data is small, it fits in an Excel sheet stored on your own computer. When it is larger, you put it on one or a few independent servers, upgrade their configuration, and install MySQL to manage it. But when the data is very large, say dozens of TB arriving every day, or you have to extract data from dozens of TB every day to compute indicators for the boss, a single server certainly cannot handle it. And if the data is split across several relatively independent servers, the overall usefulness of the data suffers, for example its ordering. So is there a method or piece of software that connects many servers so that they work like a single server? Yes. That is a distributed system: storage and computation are spread across the servers, yet the whole behaves as if one machine were doing the work, pooling the resources of many servers. This approach, whether you call it software or architecture, is the foundation of AliCloud, and it has a resounding, dreamy name: Apsara (Feitian, "flying to the sky").
Below, centered on a company's data, I'll walk through the related concepts along one line: where the data comes from, what to do with it once it arrives, and where it goes.
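Before that walk-through, the core idea of pooling servers can be illustrated in a few lines. The following is a minimal, single-process sketch, not how Apsara is implemented: the "servers" are just Python lists, records are assigned to them by hashing their key, each server computes partial totals on its own share, and the partial results are merged into one answer. The function names and the sample sales data are hypothetical.

```python
import zlib
from collections import Counter

def partition(records, num_servers):
    """Assign each (key, value) record to a simulated server by hashing its key."""
    shards = [[] for _ in range(num_servers)]
    for key, value in records:
        # crc32 is a stable hash, so the same key always lands on the same server
        idx = zlib.crc32(key.encode("utf-8")) % num_servers
        shards[idx].append((key, value))
    return shards

def local_sum(shard):
    """Work one server does independently: partial totals per key."""
    totals = Counter()
    for key, value in shard:
        totals[key] += value
    return totals

def distributed_sum(records, num_servers=3):
    """Split the data, compute locally on each shard, then merge the partial results."""
    partials = [local_sum(s) for s in partition(records, num_servers)]  # parallel in a real cluster
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds counts rather than replacing them
    return dict(merged)

sales = [("beijing", 120), ("shanghai", 80), ("beijing", 30), ("hangzhou", 50), ("shanghai", 20)]
print(distributed_sum(sales))
```

To the caller, `distributed_sum` looks like a single computation even though the work was split across several shards, which is the "many servers acting as one" effect described above.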
First, where the data comes from and how it is collected
Turn all business into data. Take sending a parcel: in the beginning, everyone filled out a courier slip by hand with their information, such as name, phone number, and address. Once collected, this becomes the most primitive raw data. Or, when you are shopping in a mall and casually connect to its free Wi-Fi, sorry, your information is collected: which door you came in and left through, which store you stayed in and for how long, and so on. (I'd better not say more than that.) Or when we binge-watch a TV series: which page you visited, which show of what genre you watched and for how long, what comments you posted, and which phone you used are all collected as raw data.
With a large number of users, the data generated is bound to be huge. How do you collect it, and with what technology? On AliCloud there is, for example, Log Service, along with other products.
Official documentation: https://help.aliyun.com/product/28958.html?spm=5176.7618386.3.2.L5fXeB
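Collected raw data usually arrives as lines of text that must be turned into structured records before anything else can be done with them. The sketch below assumes a hypothetical log format, `timestamp user=... page=... duration=...`, for the drama-watching example; it does not use the Log Service API, and the `ViewEvent` fields are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ViewEvent:
    timestamp: str
    user: str
    page: str
    duration_s: int

def parse_line(line: str) -> Optional[ViewEvent]:
    """Turn one raw 'timestamp key=value ...' log line into a structured record.

    Returns None for malformed lines instead of raising, so one bad line
    does not stop collection."""
    parts = line.split()
    if not parts:
        return None
    fields = {}
    for token in parts[1:]:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            fields[key] = value
    try:
        return ViewEvent(parts[0], fields["user"], fields["page"], int(fields["duration"]))
    except (KeyError, ValueError):
        return None  # a required field is missing or duration is not an integer
```

For example, `parse_line("2017-06-01T10:00:00 user=42 page=/drama/123 duration=35")` yields a `ViewEvent` with `duration_s=35`, while a garbled line yields `None`.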
Second, what to do once the data arrives
Turn all data into business. "What to do" means: with such a large amount of data, how do we store it, how do we compute on it, and what features or products do we turn it into?
First, this much data can be stored and computed on Apsara. Storage and computation come in many forms. Suppose you have a farm with many warehouses, and a warehouse can hold all kinds of things, such as wheat. You can put the wheat in bins, pack it in sacks and stack them, or simply pour it loose into the warehouse. You can also screen the wheat, take statistics on it, and so on. Here the wheat is the data and the warehouse is the data warehouse; the farm's warehouse corresponds to AliCloud's big data tool ODPS, which is now called MaxCompute.
Official documentation: https://help.aliyun.com/document_detail/27800.html?spm=5176.7740343.6.539.HfFlWv
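In big data terms, screening the wheat and removing impurities is data cleaning and filtering. The sketch below shows the idea on in-memory records; it is not MaxCompute code. The record fields (`id`, `variety`, `quality`) and the 0.8 quality threshold are hypothetical.

```python
def clean(records, min_quality=0.8):
    """Drop incomplete records, remove duplicates by id, keep only high-quality ones."""
    required = ("id", "variety", "quality")
    seen_ids = set()
    cleaned = []
    for r in records:
        # "impurities": records missing any required field
        if any(r.get(field) is None for field in required):
            continue
        # duplicates: the same id was already seen in this batch
        if r["id"] in seen_ids:
            continue
        seen_ids.add(r["id"])
        # "high-quality wheat": keep records at or above the threshold
        if r["quality"] >= min_quality:
            cleaned.append(r)
    return cleaned

batch = [
    {"id": 1, "variety": "red", "quality": 0.9},
    {"id": 1, "variety": "red", "quality": 0.9},     # duplicate
    {"id": 2, "variety": None, "quality": 0.95},     # missing variety
    {"id": 3, "variety": "white", "quality": 0.6},   # below threshold
    {"id": 4, "variety": "white", "quality": 0.85},
]
print([r["id"] for r in clean(batch)])
```

On a real platform the same three rules would typically be expressed as SQL filters and a de-duplication step over a table rather than a Python loop.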
Users can then put massive data into ODPS for storage, computation, interaction with other data sources, and so on. Just as you can screen wheat, remove impurities, and pick out the high-quality grain, the big data equivalents are data cleaning and filtering.
That seems to cover the basic needs. But suppose wheat from many places has to be loaded into the warehouses, a lot of wheat in the warehouses has to be screened, and I also want to pick out the quality wheat and send it to research institutes for experiments. Which batch goes into the warehouse first? Which gets screened first, or are batches screened at the same time? How far along is the screening? Perhaps I want to wait until the first warehouse is fully screened before starting on the second. I need someone to schedule and direct all this and to ship the wheat elsewhere once screening is done, and I want the whole process to be transparent and intelligent. How can that be done?
We packaged ODPS and integrated other functions so that it can be operated visually and is practical and easy to use.
Official documentation: https://help.aliyun.com/document_detail/30256.html?spm=5176.7843912.6.539.SfmCgC
Users can operate ODPS through a visual tool, the Big Data Development Kit. Another important function is data synchronization, that is, transporting the wheat to other places. In the development kit you can configure synchronization to various databases such as RDS and ADS, schedule tasks to run at set times, define task dependencies and periods, set alerts, and so on. Importantly, it is currently free to use.
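Task dependencies like "screen warehouse 2 only after warehouse 1 is done" form a directed acyclic graph, and a scheduler has to run tasks in an order that respects it. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group hypothetical tasks into waves that could run in parallel. It only illustrates dependency ordering; it is not the Development Kit's scheduler, and it does not model periods or alerts.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
tasks = {
    "load_warehouse_1": set(),
    "load_warehouse_2": set(),
    "screen_warehouse_1": {"load_warehouse_1"},
    # finish screening warehouse 1 before starting on warehouse 2
    "screen_warehouse_2": {"load_warehouse_2", "screen_warehouse_1"},
    "sync_to_research": {"screen_warehouse_1", "screen_warehouse_2"},
}

def run_in_waves(graph):
    """Return tasks grouped into waves; tasks within a wave have no pending
    dependencies and could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every task in this wave succeeded
    return waves

for i, wave in enumerate(run_in_waves(tasks), start=1):
    print(i, wave)
```

Both loading tasks land in the first wave because they depend on nothing, while the final sync waits until both warehouses have been screened.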
Back to wheat storage. Suppose wheat keeps arriving: as soon as a handful is harvested from the field, it must be sent immediately through some kind of flow, such as a conveyor belt, to the research institute, with screening, identification, and other operations performed along the way. What should we do? Is there a real-time channel, a streaming conveyor belt with built-in screening? For harvesting (collecting) the wheat we already have one method: the Log Service mentioned above. But there is an important issue that cannot be ignored: if harvesting is very fast and the screening tool on the conveyor belt downstream cannot keep up, the wheat piles up. Is there a tool that can temporarily hold the steady stream of collected wheat, so that the streaming conveyor belt downstream takes only as much as it can screen? There is: DataHub, a real-time data channel. Logs collected through Log Service can be uploaded to it in real time and held there temporarily. And the real-time screener on the streaming conveyor belt is AliCloud Streaming Computing.
https://help.aliyun.com/video_detail/55154.html
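The role of a buffer between a fast harvester and a slower screener can be shown with an in-process analogy: a bounded `queue.Queue` between a producer thread and a consumer thread. This is only a sketch of the buffering idea under that assumption, not DataHub's API. When the queue is full, `put()` blocks, so the producer waits for the consumer instead of losing data. The batch fields and the quality threshold are hypothetical.

```python
import queue
import threading
import time

_END = object()  # sentinel marking the end of the stream

def run_pipeline(batches, buffer_size=4, screen_delay_s=0.0):
    """Harvest batches into a bounded buffer while a slower screener drains it.

    Returns the ids of batches that passed screening, in arrival order."""
    buffer = queue.Queue(maxsize=buffer_size)  # temporary holding area between the stages
    passed = []

    def harvester():
        for batch in batches:
            # blocks while the buffer is full, so nothing is dropped
            buffer.put(batch)
        buffer.put(_END)

    def screener():
        while True:
            batch = buffer.get()
            if batch is _END:
                break
            time.sleep(screen_delay_s)  # simulate a slower downstream stage
            if batch["quality"] >= 0.8:
                passed.append(batch["id"])

    producer = threading.Thread(target=harvester)
    consumer = threading.Thread(target=screener)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return passed
```

With one producer and one consumer the queue is first-in, first-out, so results come out in arrival order; the buffer size only limits how far the harvester can get ahead of the screener.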
DataHub works seamlessly with streaming computing: the streaming job can pull data from DataHub for real-time calculation and analysis.
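A typical real-time calculation is a windowed aggregate, for example counting each wheat variety per minute as events stream in. The sketch below implements a tumbling (fixed, non-overlapping) window as a Python generator; it is not AliCloud Streaming Computing code. It assumes events arrive as hypothetical `(timestamp_s, variety)` pairs in time order and emits each window's counts as soon as the next window begins.

```python
from collections import Counter

def tumbling_window_counts(events, window_s=60):
    """Consume (timestamp_s, variety) events in time order and yield
    (window_start, counts) each time a window closes.

    Assumes ordered input; windows with no events are skipped."""
    current_start = None
    counts = Counter()
    for ts, variety in events:
        start = ts - ts % window_s  # align the timestamp to its window
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, dict(counts)  # previous window is complete
            current_start, counts = start, Counter()
        counts[variety] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last, still-open window

stream = [(0, "red"), (10, "white"), (59, "red"), (60, "red"), (130, "white")]
for window_start, counts in tumbling_window_counts(stream):
    print(window_start, counts)
```

Because it is a generator, results are produced incrementally while events are still arriving, rather than after the whole data set has been loaded. Real stream processors also handle late and out-of-order events, which this sketch does not.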
Back to wheat storage once more. Suppose some wheat needs to be screened very quickly and various indicators computed, such as the proportion of each variety. Note that it has to be fast: the leadership may come to inspect at any moment and check on the spot. We can store the data in ADS (AnalyticDB), where hundreds of billions of records can be queried at will in the blink of an eye:
Official documentation: https://help.aliyun.com/product/26371.html?spm=5176.7618386.3.8.NgtbKi
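An indicator like "the proportion of each variety" is a single aggregate query. As a stand-in for an analytic database, the sketch below runs the same kind of `GROUP BY` query against an in-memory SQLite table using Python's standard `sqlite3` module. The table and column names are hypothetical, and SQLite is used only to make the example self-contained; it says nothing about ADS performance.

```python
import sqlite3

def variety_shares(rows):
    """rows: iterable of (variety, weight_kg). Returns [(variety, share_of_total_weight)]."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE wheat (id INTEGER PRIMARY KEY, variety TEXT, weight_kg REAL)")
        conn.executemany("INSERT INTO wheat (variety, weight_kg) VALUES (?, ?)", rows)
        query = """
            SELECT variety,
                   SUM(weight_kg) / (SELECT SUM(weight_kg) FROM wheat) AS share
            FROM wheat
            GROUP BY variety
            ORDER BY variety
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()

print(variety_shares([("red", 25.0), ("white", 150.0), ("red", 25.0)]))
```

The point of an analytic database is that queries of this shape stay fast even when the table holds billions of rows instead of three.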
Speaking of storage: what if the farm stores not only wheat but also fertilizer, pesticide, gasoline, and other less regularly shaped goods? For that there is OSS:
Official documentation: https://help.aliyun.com/product/31815.html?spm=5176.7618386.3.2.d755W7
OSS can store unstructured data such as audio, video, and images, and provides interfaces for fast access; log data can be stored there too. MaxCompute, by contrast, cannot store such data directly because it requires structured data, but MaxCompute 2.0 can connect to OSS and process unstructured data indirectly.
Third, where the data goes
Now the leader has come to inspect and wants to see the wheat's various indicators. If you haven't prepared an Excel sheet, what do you do? Use the AliCloud product Quick BI:
Official documentation: https://help.aliyun.com/document_detail/33813.html?spm=5176.doc53448.6.539.bPiG2B
Building reports on massive data in Excel? That scares even me.
And if you want to make an impressive PPT for the boss, an animation showing wheat production at each location on a map, or a Double 11-style big-screen dashboard, fortunately DataV can do it:
Official documentation: https://help.aliyun.com/document_detail/44253.html
After the inspection, the leader gave important instructions:
- Each region should work out the best harvest plan, taking into account factors such as the previous year's fertilizer and pesticide use, the geographic area planted, and sowing time.
- The wheat should be classified and clustered, so that when new wheat enters the warehouse it can be automatically identified as belonging to a particular category.
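The second instruction combines clustering (grouping wheat by similarity) with assigning new wheat to an existing group. A minimal sketch of that idea is plain k-means followed by nearest-centroid assignment, written here in pure Python. It is not AliCloud's machine learning platform, and the two features (grain length in mm, protein percentage) and the sample values are hypothetical.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: returns k centroids as tuples."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # start from k random samples
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = []
        for i, members in enumerate(clusters):
            if members:
                # new centroid = mean of each feature over the cluster's members
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's previous centroid
        if updated == centroids:
            break  # assignments have stabilised
        centroids = updated
    return centroids

# (grain_length_mm, protein_pct) for six historical samples
samples = [(6.0, 10.0), (6.2, 10.5), (5.9, 9.8), (8.0, 14.0), (8.1, 14.2), (7.9, 13.8)]
centroids = kmeans(samples, k=2)
print("new wheat belongs to cluster", nearest((6.1, 10.1), centroids))
```

Clustering runs once over historical data; afterwards, identifying incoming wheat is just a cheap `nearest` lookup. Cluster indices are arbitrary labels, so a person still has to name what each cluster means.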
The leader's instructions must not be neglected, and machine learning algorithms can help you get them done!
Official documentation: https://help.aliyun.com/product/30347.html?spm=5176.7618386.3.2.sGxA27
In layman's terms, machine learning is the effort to give a machine, through algorithms and programs, the ability to learn the way humans do: after accumulating experience, it grows and can tell right from wrong. As a professional discipline, though, it is more than simply making a machine "learn". Technically, it is an interdisciplinary field drawing on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to keep improving their own performance.
Finally, another example. When you buy clothes on Taobao, after you buy a top, recommendations for shoes, pants, or clothing in similar styles appear below it, and if you buy clothes often, you will often be recommended closely related items. Similarly, when you scroll through Weibo, it recommends videos of the same type based on which videos you often click. This is done by algorithms, specifically recommendation algorithms, which are part of machine learning. How can you use recommendations yourself?
Official documentation: https://help.aliyun.com/product/30367.html?spm=5176.7618386.3.2.sgyFWM
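One of the simplest recommendation approaches behind "bought a top, here are some pants" is item co-occurrence: count how often two items appear in the same purchase history, then recommend the items most often bought together with the current one. The sketch below implements that idea in plain Python. It is not the algorithm used by AliCloud's recommendation engine, and the shopping baskets are hypothetical.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(baskets):
    """For every item, count how many baskets also contained each other item."""
    co = {}
    for basket in baskets:
        # set() ignores repeat purchases of the same item within one basket
        for a, b in permutations(set(basket), 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def recommend(co, item, top_n=3):
    """Items most often bought together with `item`; ties broken alphabetically."""
    if item not in co:
        return []  # unseen item: nothing to recommend from co-occurrence alone
    ranked = sorted(co[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

baskets = [
    ["top", "pants"],
    ["top", "pants", "shoes"],
    ["top", "shoes"],
    ["top", "pants"],
    ["hat", "shoes"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "top"))
```

Counting pairs grows roughly with the square of basket size and with the number of users, which is why, at Weibo or Taobao scale, even this simple approach needs distributed storage and computation.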
Take Weibo: it has a huge number of users, and to make recommendations for each of them, a recommendation algorithm has to compute over a massive amount of data behind the scenes, that is, big data. So machine learning and recommendation algorithms are built on big data technology, and AliCloud's machine learning and recommendation engine rely on the massive storage and computing power of MaxCompute. More broadly, these machine learning disciplines have existed for a long time, but without strong big data technology to support them, they developed slowly. In recent years, with the development of big data technology and breakthroughs in server memory and CPUs, they have come into wide use and have driven the development of artificial intelligence.
Summary: big data technology is not unique to AliCloud, but AliCloud has turned it into a general-purpose service and platform for users. Today, organizations of all kinds are moving their data to the cloud and placing their trust in it. Security and stability come first: if the cloud is safe and sound, the sky stays clear.
If I had to sum up cloud computing in one sentence, it would be: under the colorful clouds, everything is connected.
If you are interested in big data technology, you can add the author on WeChat: wx4085116. The author has since left Alibaba, and this blog does not represent Alibaba's position. The author now runs a big data training course; if you are interested, add me.