convore.json/groups/django-community/django-and-spawning-high-cpu-high-memory-operations/messages.json
							[{"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494447.598135, "message": "One of the possibilities a colleague suggested was to use XMLRPC to call a standalone python process running the calculations, but I would like to avoid the serialization to XML if I can.", "group_id": 81, "id": 191026}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494547.7321689, "message": "@estebistec. That was my first idea but was not well received because we really don't need to handle many requests", "group_id": 81, "id": 191050}, {"user_id": 13467, "stars": [], "topic_id": 8447, "date_created": 1298494775.7555311, "message": "A simpler architecture might involve something like: 1) from the web app, write files to a directory with the jobs (eg. a job spool); 2) from a persistent script or cronjob, pick up the files and do processing, hit the web app with an HTTP POST when the job is done", "group_id": 81, "id": 191093}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494832.2694671, "message": "@lmorchard A cron script like what django-mailer uses was my second thought too", "group_id": 81, "id": 191108}, {"user_id": 13467, "stars": [], "topic_id": 8447, "date_created": 1298494941.602097, "message": "Or Gearman, at least", "group_id": 81, "id": 191147}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494244.228374, "message": "I am using Django to expose an HTTP resource oriented API for scientific calculations.", "group_id": 81, "id": 190991}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495580.8682539, "message": "been there", "group_id": 81, "id": 191256}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494620.288512, "message": "Here is the link I showed them that scared them and made them ask me for something more simple: http://openquake.org/documentation/architecture/", "group_id": 81, "id": 191067}, {"user_id": 12312, "stars": [], 
"topic_id": 8447, "date_created": 1298494845.5776441, "message": "i.e. ./manage.py runcalculations", "group_id": 81, "id": 191114}, {"user_id": 13467, "stars": [], "topic_id": 8447, "date_created": 1298494881.6323371, "message": "Yeah, basically the same idea, though maybe swapping a folder of files for a DB table of model rows representing jobs.", "group_id": 81, "id": 191129}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495830.7249401, "message": ":) thanks a lot  for your input @convoronauts", "group_id": 81, "id": 191293}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494470.516603, "message": "To be more specific, the calculations would be on raster and vector geospatial data already downloaded to the server where Django lives.", "group_id": 81, "id": 191034}, {"user_id": 13467, "stars": [{"date_created": 1298495174.0718379, "user_id": 15292}], "topic_id": 8447, "date_created": 1298494800.6576011, "message": "But, keep in mind that that \"simpler\" architecture will get annoying fast, and you'll want something like Celery before you go to far with it.", "group_id": 81, "id": 191100}, {"user_id": 12688, "stars": [], "topic_id": 8447, "date_created": 1298494965.593591, "message": "You could always keep your data in the cloud (s3 / simpledb) and use something like picloud to do data processing.  
probably not a very helpful answer :)", "group_id": 81, "id": 191152}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495555.5176909, "message": "yeah, in that case I'd go straight for celery (sans mq)", "group_id": 81, "id": 191251}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495022.7114489, "message": "But unfortunately I might not be able to sell it to my colleagues before that", "group_id": 81, "id": 191160}, {"user_id": 13467, "stars": [], "topic_id": 8447, "date_created": 1298495105.8705349, "message": "Personally, I'd say there's nothing wrong with a cron script for now, as long as you recognize the technical debt you may need to pay with a migration to Celery or something like it in the future :)", "group_id": 81, "id": 191177}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494357.811265, "message": "I don't want to run those calculations inside the Python interpreter inside Apache's mod_wsgi. How can I start such a process from Django?", "group_id": 81, "id": 191009}, {"user_id": 15292, "stars": [{"date_created": 1298495426.7788501, "user_id": 13467}, {"date_created": 1298497516.145422, "user_id": 1736}], "topic_id": 8447, "date_created": 1298494508.7091801, "message": "Spawn tasks to celery? http://celeryproject.org/", "group_id": 81, "id": 191041}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494575.797142, "message": "Making an architecture with RabbitMQ, a task queue, etc. seems like overkill.", "group_id": 81, "id": 191058}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494781.7731941, "message": "While one calculation may need as much as a gigabyte to run and max out the CPU we will probably just have one or two going on at the same time. 
And something less than 5 requests per minute.", "group_id": 81, "id": 191094}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494801.944555, "message": "(a calculation can also last several minutes)", "group_id": 81, "id": 191101}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494999.6986239, "message": "My intention is to get celery in the mix as soon as we need to serve many more requests and offload workers to other machines", "group_id": 81, "id": 191155}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495054.7265191, "message": "BTW I forgot to link my API http://ingenieroariel.com/static/riab )", "group_id": 81, "id": 191167}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495366.57393, "message": "it was the best possible way for me to go. In an earlier project, I set it up, but it wasn't a good fit at all.  Just be clear as to what it is you need to achieve.", "group_id": 81, "id": 191233}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495644.0106571, "message": "/me is going to point his colleagues at this thread", "group_id": 81, "id": 191264}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495693.8874209, "message": "(unfortunately they are all asleep right now on the other side of the world)", "group_id": 81, "id": 191270}, {"user_id": 214, "stars": [{"date_created": 1298495214.395952, "user_id": 13467}, {"date_created": 1298495221.2166369, "user_id": 12312}], "topic_id": 8447, "date_created": 1298495180.630172, "message": "You can use celery with a database backend rather than RabbitMQ. 
I think this approach is the best balance of \"easy to get going\" with \"not making your life difficult later.\" See http://ask.github.com/celery/tutorials/otherqueues.html", "group_id": 81, "id": 191199}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495501.2332349, "message": "so, you're literally just looking to detach (poor choice of word I guess) from the web process - async task in celery-speak?", "group_id": 81, "id": 191245}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495518.1128449, "message": "Breaking the geotiffs into small tiles to be processed independently will surely be done some time in the future though", "group_id": 81, "id": 191248}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495525.030334, "message": "Yes @theomn", "group_id": 81, "id": 191249}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495569.3707571, "message": "exactly that, I worry about running code with custom built c++ extensions in the mod_wsgi interpreter", "group_id": 81, "id": 191252}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495585.2633419, "message": "because threading issues have been a big problem when using Geos in GeoDjango", "group_id": 81, "id": 191258}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298494880.0040591, "message": "but I got a blank stare, asking me why I had to do polling instead of having it event driven", "group_id": 81, "id": 191127}, {"user_id": 13467, "stars": [], "topic_id": 8447, "date_created": 1298494908.9429541, "message": "Event driven leads you to something like Celery, IMO", "group_id": 81, "id": 191139}, {"user_id": 13467, "stars": [], "topic_id": 8447, "date_created": 1298495160.9220469, "message": "But going into tech debt on queues now may mean you get the actual functionality sooner than later and sort out other more interesting details first", "group_id": 81, "id": 191188}, {"user_id": 
5981, "stars": [{"date_created": 1298495223.4676991, "user_id": 13467}], "topic_id": 8447, "date_created": 1298495160.822165, "message": "but if you write your cron as a management command, adding celery later will not be a huge undertaking", "group_id": 81, "id": 191187}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495239.7184081, "message": "absolutely agree, @carljm", "group_id": 81, "id": 191217}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495437.9146979, "message": "Yeah, right now I don't need to optimize (i.e. it's okay if the calculation takes a few minutes and a lot of RAM by using very big matrices)", "group_id": 81, "id": 191239}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495581.8940361, "message": "haha", "group_id": 81, "id": 191257}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495781.7753439, "message": "well, at least this will be here when they wake up tomorrow", "group_id": 81, "id": 191280}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495595.8623841, "message": "actually yes, I was running image analysis with opencv", "group_id": 81, "id": 191259}, {"user_id": 12312, "stars": [], "topic_id": 8447, "date_created": 1298495219.3353479, "message": "@carljm Getting RabbitMQ out of the picture for now may help me a lot (/me reads the link)", "group_id": 81, "id": 191212}, {"user_id": 5981, "stars": [], "topic_id": 8447, "date_created": 1298495308.8123341, "message": "as a side note, for me, celery was my \"in\" for breaking a large cpu bound operation into smaller parts that could be processed concurrently.", "group_id": 81, "id": 191229}, {"user_id": 2588, "stars": [], "topic_id": 8447, "date_created": 1298499988.6181359, "message": "One trade-off of using just Redis is that there is potential for data loss in some configurations if the server dies w/o recent operations having been persisted to disk. 
Sounds like the risk would be low for your current needs but it might be a good idea to keep logs of jobs requested and jobs completed as a precaution.", "group_id": 81, "id": 192135}, {"user_id": 2588, "stars": [], "topic_id": 8447, "date_created": 1298499620.332957, "message": "Something not yet mentioned is that Redis can also be used as the MQ for Celery. Of course, that still adds another dependency on top of Celery. That said, Redis itself can be used as a basic queue. The list type has a blocking pop operation, meaning you can push items onto a list from Django and your computation process would just call BLPOP in a loop, waiting if no jobs are available. Redis also has publish/subscribe channels.", "group_id": 81, "id": 192059}, {"user_id": 3748, "stars": [{"date_created": 1298509279.2050109, "user_id": 1081}], "topic_id": 8447, "date_created": 1298506160.8182981, "message": "IMHO celery is not complex, you can even use another backend instead of rabbitmq, ie. redis or even sqlite.", "group_id": 81, "id": 193696}, {"user_id": 12577, "stars": [], "topic_id": 8447, "date_created": 1298512025.503654, "message": "write the requests to files and have a cron job written in python pick them up and process them", "group_id": 81, "id": 194779}, {"user_id": 12577, "stars": [], "topic_id": 8447, "date_created": 1298512068.7840171, "message": "you could have lighter serialization format than xml (though I don't think of xml as that onerous tbh).", "group_id": 81, "id": 194783}, {"user_id": 12577, "stars": [], "topic_id": 8447, "date_created": 1298512038.128701, "message": "potentially alter the files to add the processing state and read that back", "group_id": 81, "id": 194782}, {"user_id": 1736, "stars": [{"date_created": 1298516789.5879049, "user_id": 12312}, {"date_created": 1298517210.152745, "user_id": 275}, {"date_created": 1298637373.6733789, "user_id": 257}], "topic_id": 8447, "date_created": 1298512284.9824059, "message": "@nicferrier Any time you 
think to yourself \"I should use cron to fire off Django code\" you should stop and install Celery.", "group_id": 81, "id": 194806}, {"user_id": 12577, "stars": [], "topic_id": 8447, "date_created": 1298559505.138993, "message": "1. I disagree, we do it and it's fine. 2. I didn't say that, I said fire off some python. doesn't have to be django.", "group_id": 81, "id": 198163}, {"user_id": 13467, "stars": [], "topic_id": 8447, "date_created": 1298559892.9981289, "message": "We talked a bit earlier in this thread about jobs as files processed via cron. It's an option, but if you find yourself needing to distribute the jobs to multiple machines or expand in other ways, you'll eventually want something like Celery.", "group_id": 81, "id": 198209}]